LiteRT-LM is Google's open-source inference framework designed for deploying Large Language Models (LLMs) on edge devices. Its primary purpose is to enable high-performance, production-ready GenAI experiences directly on devices like smartphones, wearables, and IoT devices, thereby reducing latency, enhancing privacy, and enabling offline functionality. The framework is built to be cross-platform, supporting a wide range of operating systems and hardware configurations.
The core functionality of LiteRT-LM centers on efficient model execution under tight resource constraints, achieved through several key features. First, it offers broad cross-platform support: Android, iOS, web browsers, desktop operating systems (macOS, Windows, Linux), and IoT hardware such as the Raspberry Pi. This versatility lets developers integrate LLMs into a diverse range of applications. Second, LiteRT-LM leverages hardware acceleration, using both GPUs and NPUs (Neural Processing Units) to maximize performance, an optimization crucial for real-time or near-real-time inference on edge devices.
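The accelerator-with-fallback idea can be illustrated with a small, framework-agnostic sketch. Note that `pick_backend` and the probed capability names below are hypothetical illustrations, not part of the LiteRT-LM API: the pattern is simply to prefer the fastest available accelerator and degrade gracefully to CPU.

```python
# Illustrative backend-selection pattern (NOT the LiteRT-LM API):
# prefer an NPU, then a GPU, then fall back to CPU execution.

def pick_backend(available: set[str]) -> str:
    """Return the preferred backend from a set of probed device capabilities."""
    for backend in ("npu", "gpu", "cpu"):  # preference order, fastest first
        if backend in available:
            return backend
    raise RuntimeError("no usable backend found")

# Example: a device that reports only GPU and CPU support.
print(pick_backend({"gpu", "cpu"}))  # → gpu
```

A real runtime would probe drivers and delegate availability at startup; the point is that the same model can run across heterogeneous devices because the framework, not the application, resolves the execution backend.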
Beyond basic LLM inference, LiteRT-LM provides advanced capabilities. It supports multimodality, processing vision and audio inputs in addition to text, which opens the door to applications that combine image or audio understanding with language. The framework also includes tool-use functionality, enabling function calling for agentic workflows: the LLM can invoke external tools and APIs, expanding its capabilities to more complex tasks. Finally, LiteRT-LM offers broad model support, with compatibility for popular LLMs such as Gemma, Llama, Phi-4, and Qwen, giving developers flexibility in choosing the best model for their needs.
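The function-calling loop at the heart of such agentic workflows can be sketched in a few lines. This is a minimal, self-contained illustration with a stubbed model, not LiteRT-LM's actual API: the `stub_model`, `TOOLS` registry, and `run_agent` names are all hypothetical, and a real deployment would send the prompt to the on-device LLM and parse its structured tool-call output.

```python
import json

# Hypothetical tool registry: name -> callable the "model" may invoke.
TOOLS = {
    "get_battery_level": lambda: {"percent": 87},
}

def stub_model(prompt: str) -> str:
    # Stand-in for the LLM. A real model would decide which tool (if any)
    # to call based on the prompt; this stub always emits one tool call.
    return json.dumps({"tool": "get_battery_level", "args": {}})

def run_agent(prompt: str) -> dict:
    # 1. Ask the model; 2. parse its structured tool call;
    # 3. execute the named tool and return the result.
    call = json.loads(stub_model(prompt))
    return TOOLS[call["tool"]](**call["args"])

print(run_agent("How charged is my watch?"))  # → {'percent': 87}
```

In practice the tool result would be fed back to the model for a final natural-language answer; the sketch stops at the tool invocation to keep the control flow visible.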
LiteRT-LM is production-ready and powers on-device GenAI experiences in several Google products, including Chrome, Chromebook Plus, and Pixel Watch. The framework is also available through the Google AI Edge Gallery app, which lets users run models on their own devices. The repository provides a command-line interface (CLI) for easy experimentation and deployment, so users can test and run models without writing code, and it offers language-specific APIs for Kotlin (Android), Python, and C++, with Swift support in development, to simplify integration into existing projects.
The repository ships comprehensive documentation: technical overviews, CLI guides, and language-specific tutorials, along with performance benchmarks, model-support information, and detailed getting-started instructions. A "Quick Start" guide lets users run models directly from the terminal using the `uv` package manager, avoiding complex setup procedures. Recent releases are highlighted as well, showcasing new features and improvements such as support for Gemma 4, enhancements to function calling, and the introduction of the LiteRT-LM CLI. The framework is actively developed and maintained, with regular updates that improve performance, expand model support, and add new features.
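A terminal workflow of that shape might look like the following. The command name, model identifier, and flags below are hypothetical placeholders, not verified LiteRT-LM commands; consult the repository's Quick Start guide for the actual invocation.

```shell
# Hypothetical sketch only -- the tool name and flags are placeholders.
# `uvx` runs a published tool in an ephemeral environment, so no manual
# virtualenv setup or pip install is needed beforehand.
uvx litert-lm-cli --model gemma-3n --prompt "Hello from the edge"
```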