Quick Start
InferenceHub is an infrastructure provisioner that helps you create and manage your internal AI platform from the command line.
This guide walks you through using the inferencehub CLI as your platform provisioner.
Requirements
- Kubernetes cluster (see cloud provider support)
- kubectl and helm installed locally
- go installed (to build the CLI)
- LITELLM_MASTER_KEY environment variable set (must start with sk-)
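The local requirements above can be checked before you start. The following is a minimal sketch, not part of InferenceHub: the function name `check_requirements` is hypothetical, and it only inspects the current process environment and PATH (it does not look at the cluster or at any .env file).

```python
import os
import shutil

# Tools the requirements list says must be installed locally.
REQUIRED_TOOLS = ("kubectl", "helm", "go")


def check_requirements(env=None, which=shutil.which):
    """Return a list of problems; an empty list means every local check passed."""
    env = os.environ if env is None else env
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    key = env.get("LITELLM_MASTER_KEY", "")
    if not key:
        problems.append("LITELLM_MASTER_KEY is not set")
    elif not key.startswith("sk-"):
        problems.append("LITELLM_MASTER_KEY must start with sk-")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("problem:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.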
Step 1 — Install cluster prerequisites
Skip this step if your cluster already has Gateway API CRDs, cert-manager, Envoy Gateway, and a Gateway resource.
Otherwise, run once per cluster:
python3 scripts/setup-prerequisites.py \
  --cluster-name my-cluster \
  --domain inferencehub.ai \
  --environment staging \
  --tls-email admin@inferencehub.ai
This installs: Gateway API CRDs, cert-manager, Envoy Gateway, GatewayClass, Gateway, and optionally the AWS Load Balancer Controller.
See Prerequisites for full options and flags.
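To help decide whether this step can be skipped, you can check which CRDs the cluster already has. The sketch below is an illustration only: `missing_crds` and `list_crds` are hypothetical helpers, and the list covers a few well-known CRD names from the upstream Gateway API and cert-manager projects. It does not confirm that Envoy Gateway is running or that a Gateway resource exists.

```python
import subprocess

# CRD names published by the upstream Gateway API and cert-manager projects.
EXPECTED_CRDS = (
    "gatewayclasses.gateway.networking.k8s.io",
    "gateways.gateway.networking.k8s.io",
    "httproutes.gateway.networking.k8s.io",
    "certificates.cert-manager.io",
)


def missing_crds(kubectl_output, expected=EXPECTED_CRDS):
    """Given the output of `kubectl get crd -o name`, return expected CRDs that are absent."""
    # Each line looks like "customresourcedefinition.apiextensions.k8s.io/<crd-name>".
    installed = {
        line.strip().split("/", 1)[-1]
        for line in kubectl_output.splitlines()
        if line.strip()
    }
    return [name for name in expected if name not in installed]


def list_crds():
    """Ask the current kubectl context for its CRDs (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "crd", "-o", "name"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `missing_crds(list_crds())` against your cluster returns an empty list when all of the listed CRDs are present.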
Step 2 — Install the CLI locally
InferenceHub is written in Go. Build it and install the binary to $GOPATH/bin (make sure $GOPATH/bin is on your $PATH).
Verify installation:
Step 3 — Create a config file
Run all inferencehub commands from the project root (the directory that contains helm/ and scripts/).
Edit the generated inferencehub.yaml:
clusterName: my-cluster
domain: inferencehub.ai
environment: staging
namespace: inferencehub
cloudProvider: aws
gateway:
  name: inferencehub-gateway
  namespace: envoy-gateway-system
models:
  bedrock:
    - name: claude-sonnet
      model: anthropic.claude-3-5-sonnet-20241022-v2:0
      region: us-east-1
aws:
  litellmRoleArn: "arn:aws:iam::123456789012:role/litellm-bedrock-role"
observability:
  enabled: false
  langfuse:
    host: https://cloud.langfuse.com
    publicKey: "${LANGFUSE_PUBLIC_KEY}"
    secretKey: "${LANGFUSE_SECRET_KEY}"
# Optional: enable web search in OpenWebUI (deploys SearXNG in-cluster by default)
webSearch:
  enabled: true
See Configuration Reference for the full schema.
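The example config refers to environment variables with `${NAME}` placeholders. As a rough sketch, assuming those placeholders are filled from the environment, the helper below lists any that are referenced in the file but currently unset. `unset_placeholders` is a hypothetical name, not an InferenceHub function, and the CLI's real substitution rules may differ.

```python
import os
import re

# Matches ${NAME} placeholders such as ${LANGFUSE_PUBLIC_KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unset_placeholders(config_text, env=None):
    """Return placeholder names found in config_text that are missing or empty in env."""
    env = os.environ if env is None else env
    names = []
    for name in PLACEHOLDER.findall(config_text):
        if name not in names:  # keep first-seen order, drop duplicates
            names.append(name)
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    with open("inferencehub.yaml") as handle:
        for name in unset_placeholders(handle.read()):
            print("unset variable referenced in config:", name)
```

The scan is purely textual, so it also reports placeholders that appear in commented-out lines.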
Step 4 — Set environment variables
export LITELLM_MASTER_KEY="sk-your-secret-key"
# Optional — only needed if observability.enabled: true
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
The CLI auto-loads .env, .env.local, and ~/.inferencehub/.env.
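If a variable does not seem to take effect, it can help to see which of these files actually define it. The following sketch assumes .env and .env.local are looked up in the directory you run the CLI from; that location, the helper names, and the simple `NAME=value` parsing are assumptions, and the order in which the CLI applies the files is not covered here.

```python
from pathlib import Path


def candidate_env_files(project_root=".", home=None):
    """Return the three env file locations named above, in the order they are listed."""
    home = Path.home() if home is None else Path(home)
    root = Path(project_root)
    return [root / ".env", root / ".env.local", home / ".inferencehub" / ".env"]


def files_defining(name, paths):
    """Return the paths that contain a `NAME=` or `export NAME=` assignment."""
    found = []
    for path in paths:
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            stripped = line.strip()
            if stripped.startswith("export "):
                stripped = stripped[len("export "):].lstrip()
            if stripped.startswith(name + "="):
                found.append(path)
                break
    return found


if __name__ == "__main__":
    for path in files_defining("LITELLM_MASTER_KEY", candidate_env_files()):
        print("defined in", path)
```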
Step 5 — Point DNS before installing
Do this before running install
cert-manager immediately attempts an HTTP-01 ACME challenge when you install. If your domain doesn't resolve to the load balancer yet, the challenge fails and cert-manager enters a backoff loop that requires manual intervention.
After the prerequisites script completes, get the NLB hostname:
kubectl get gateway inferencehub-gateway -n envoy-gateway-system \
  -o jsonpath='{.status.addresses[0].value}'
Create a CNAME record in your DNS provider:
Verify propagation before continuing:
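One way to check propagation without extra tooling is sketched below. It resolves both your record and the load balancer hostname and reports whether they share at least one IP address. The helper names are hypothetical, and you pass in the record name you created and the hostname returned by the kubectl command above. The result reflects your local resolver, which may cache old answers for a while.

```python
import socket


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the set of IPs a hostname resolves to (empty if it does not resolve)."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def dns_points_at(domain, lb_hostname, resolver=socket.getaddrinfo):
    """True when the domain resolves and shares at least one IP with the load balancer."""
    domain_ips = resolve_ips(domain, resolver)
    lb_ips = resolve_ips(lb_hostname, resolver)
    return bool(domain_ips & lb_ips)
```

A `False` result means either the record has not propagated yet or it points somewhere else; in both cases it is safer to wait before running install.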