LiteLLM is an open-source AI gateway and Python SDK that unifies access to more than 100 large language model (LLM) APIs. Its primary purpose is to give developers and enterprises a single, consistent interface to a wide array of LLM providers (including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Vertex AI, Cohere, Hugging Face, SageMaker, vLLM, NVIDIA NIM, and many others) using the familiar OpenAI API format. This eliminates the need to manage separate SDKs, authentication patterns, request formats, and error-handling conventions for each provider.
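As a minimal sketch of that unified interface (model names and environment variables below are illustrative, not prescriptive): the same `completion()` call shape works for any supported provider, with an optional provider prefix in the model string selecting the backend.

```python
# Illustrative sketch of LiteLLM's unified completion() interface.
# Model strings and env-var names are examples, not recommendations.
import os

messages = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

try:
    from litellm import completion  # pip install litellm

    if os.environ.get("OPENAI_API_KEY"):
        # OpenAI-format response object, regardless of provider:
        resp = completion(model="gpt-4o-mini", messages=messages)
        print(resp.choices[0].message.content)

    if os.environ.get("ANTHROPIC_API_KEY"):
        # Same call shape; the "anthropic/" prefix routes to Anthropic.
        resp = completion(
            model="anthropic/claude-3-5-sonnet-20240620", messages=messages
        )
        print(resp.choices[0].message.content)
except ImportError:
    pass  # litellm not installed; the call shape above is the point
```

Switching providers means changing one string, not rewriting request construction or response parsing.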
LiteLLM can be used in two main ways: as a Python SDK for direct integration into applications, or as a self-hosted proxy server (AI Gateway) that centralizes LLM access for teams and organizations. The proxy server acts as a drop-in replacement for the OpenAI API, allowing users to switch between providers without rewriting their code. This flexibility is particularly valuable for organizations seeking to optimize costs, performance, or compliance by leveraging multiple LLMs.
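A minimal proxy configuration might look like the following (the model aliases and key references here are illustrative). The proxy exposes OpenAI-compatible endpoints, so existing OpenAI clients only need their base URL changed to point at it.

```yaml
# config.yaml -- minimal, illustrative LiteLLM proxy configuration
model_list:
  - model_name: gpt-4o                 # public alias clients request
    litellm_params:
      model: openai/gpt-4o             # actual provider/model behind it
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
```

The server is started with `litellm --config config.yaml`, after which clients send standard OpenAI-format requests to the proxy (by default on port 4000) using the aliases defined under `model_name`.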
Key features of LiteLLM include unified API endpoints for chat completions, embeddings, image generation, audio transcription, batch processing, reranking, and more. The gateway supports advanced production-ready capabilities such as virtual API keys, spend tracking, guardrails for safety and compliance, load balancing across providers, and an admin dashboard for monitoring and management. LiteLLM is engineered for high performance, with benchmarks showing 8ms P95 latency at 1,000 requests per second, making it suitable for demanding enterprise workloads.
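The load-balancing capability can also be used client-side through `litellm.Router`, sketched below under assumed deployment names (the Azure deployment name is hypothetical, and a real Azure entry would also need its endpoint details such as `api_base`).

```python
# Illustrative sketch: load balancing two deployments behind one alias
# with litellm.Router. Deployment names here are assumptions.
import os

model_list = [
    {
        "model_name": "gpt-4o",  # alias the application requests
        "litellm_params": {"model": "openai/gpt-4o"},
    },
    {
        "model_name": "gpt-4o",  # second deployment under the same alias
        "litellm_params": {"model": "azure/my-gpt4o-deployment"},
    },
]

try:
    from litellm import Router

    router = Router(model_list=model_list)
    if os.environ.get("OPENAI_API_KEY"):
        # The router picks a deployment for the "gpt-4o" alias and can
        # fail over to the other if one provider errors out.
        resp = router.completion(
            model="gpt-4o",
            messages=[{"role": "user", "content": "ping"}],
        )
        print(resp.choices[0].message.content)
except Exception:
    pass  # litellm missing or credentials unset; model_list shows the shape
```

Registering multiple deployments under one alias is what lets applications keep a single model name while traffic is spread, or shifted, across providers.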
Beyond basic LLM access, LiteLLM supports agent-based workflows via the A2A (Agent2Agent) protocol, enabling invocation of agents from providers such as LangGraph, Vertex AI Agent Engine, Azure AI Foundry, Bedrock AgentCore, and Pydantic AI. This lets developers build more complex, multi-step AI applications. The repository also includes MCP (Model Context Protocol) tools, which facilitate integration with MCP servers and expose tool usage in the OpenAI format, further extending the platform's versatility.
LiteLLM’s compatibility extends to a wide range of endpoints and providers, as detailed in its documentation and supported models list. The project is actively adopted by major organizations such as Stripe, Netflix, Google, and others, demonstrating its reliability and scalability in real-world production environments. Deployment is streamlined with support for platforms like Render and Railway, and the project offers extensive documentation, community support via Discord and Slack, and an enterprise tier for organizations with advanced needs.
In summary, LiteLLM is a robust, enterprise-ready solution for managing LLM access across multiple providers: it abstracts away provider-specific APIs, delivers high performance and scalability, and provides the features needed for cost control, security, and operational management. Whether used as a Python SDK or as a centralized proxy server, LiteLLM lets developers and organizations build, deploy, and manage AI applications efficiently and flexibly in the rapidly evolving landscape of generative AI.