InferenceHub¶
Deploy and manage a self-hosted AI platform on Kubernetes.
What is InferenceHub?¶
InferenceHub standardizes the LLM infrastructure stack into a single, opinionated deployment — eliminating the need to manually wire together a chat UI(optional), API gateway, databases, caching, and observability. As of today this acts as an infrastructure layer provisioner that helps you to create and manage your Internal AI platform via cli.
InferenceHub is an infrastructure layer provisioner that helps you to create and manage your Internal AI platform via cli.
What's included¶
Application stack — versions are configurable via versions: in inferencehub.yaml:
| Component | Role |
|---|---|
| OpenWebUI | ChatGPT-style web interface |
| LiteLLM | OpenAI-compatible API gateway (2000+ providers) |
| PostgreSQL | Persistent storage for users, conversations, config |
| Redis | Session state (OpenWebUI) + API cache (LiteLLM) |
| SearXNG | Self-hosted web search engine (optional) |
Infrastructure — versions pinned by the prerequisites:
| Component | Version | Role |
|---|---|---|
| Envoy Gateway | v1.7.0 |
Kubernetes Gateway API implementation |
| cert-manager | v1.19.4 |
Automatic TLS via Let's Encrypt |
| AWS Load Balancer Controller | 3.1.0 |
NLB provisioning on AWS EKS (optional) |
| Langfuse | SaaS | LLM observability and cost tracking (optional) |
Cloud provider support¶
| Provider | Status | Notes |
|---|---|---|
| AWS EKS | Supported | TLS termination via Envoy Gateway, IRSA for Model access |
| GKE | Planned | Cloud Load Balancer, Workload Identity |
| AKS | Planned | Azure Load Balancer, Managed Identity |
| Local / kind | Best effort | No cloud-specific features; works for development |
Demo¶
License¶
Apache 2.0 — see LICENSE.
