LiteRT-LM is Google's open-source inference framework designed for deploying Large Language Models (LLMs) on edge devices. Its primary purpose is to enable high-performance, production-ready GenAI experiences directly on devices like smartphones, wearables, and IoT devices, thereby reducing latency, enhancing privacy, and enabling offline functionality. The framework is built to be cross-platform, supporting a wide range of operating systems and hardware configurations.
The core functionality of LiteRT-LM centers on efficient model execution under tight resource constraints, achieved through several key features. First, it offers broad cross-platform support: Android, iOS, web browsers, desktop operating systems (macOS, Windows, Linux), and IoT hardware such as the Raspberry Pi. This versatility lets developers integrate LLMs into a diverse range of applications. Second, LiteRT-LM leverages hardware acceleration, using both GPUs and NPUs (Neural Processing Units) to maximize performance, an optimization crucial for real-time or near-real-time inference on edge devices.
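The accelerator-with-fallback idea can be illustrated with a small, framework-agnostic sketch. Note that `pick_backend` and the probed capability names below are hypothetical illustrations, not part of the LiteRT-LM API: the pattern is simply to prefer the fastest available accelerator and degrade gracefully to CPU.

```python
# Illustrative backend-selection pattern (NOT the LiteRT-LM API):
# prefer an NPU, then a GPU, then fall back to CPU execution.

def pick_backend(available: set[str]) -> str:
    """Return the preferred backend from a set of probed device capabilities."""
    for backend in ("npu", "gpu", "cpu"):  # preference order, fastest first
        if backend in available:
            return backend
    raise RuntimeError("no usable backend found")

# Example: a device that reports only GPU and CPU support.
print(pick_backend({"gpu", "cpu"}))  # → gpu
```

A real runtime would probe drivers and delegate availability at startup; the point is that the same model can run across heterogeneous devices because the framework, not the application, resolves the execution backend.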
Beyond basic LLM inference, LiteRT-LM provides advanced capabilities. It supports multimodality, processing vision and audio inputs in addition to text, which opens the door to applications that combine image or audio understanding with language. The framework also includes tool-use functionality, enabling function calling for agentic workflows: the LLM can invoke external tools and APIs, expanding its capabilities to more complex tasks. Finally, LiteRT-LM offers broad model support, with compatibility for popular LLMs such as Gemma, Llama, Phi-4, and Qwen, giving developers flexibility in choosing the best model for their needs.
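The function-calling loop at the heart of such agentic workflows can be sketched in a few lines. This is a minimal, self-contained illustration with a stubbed model, not LiteRT-LM's actual API: the `stub_model`, `TOOLS` registry, and `run_agent` names are all hypothetical, and a real deployment would send the prompt to the on-device LLM and parse its structured tool-call output.

```python
import json

# Hypothetical tool registry: name -> callable the "model" may invoke.
TOOLS = {
    "get_battery_level": lambda: {"percent": 87},
}

def stub_model(prompt: str) -> str:
    # Stand-in for the LLM. A real model would decide which tool (if any)
    # to call based on the prompt; this stub always emits one tool call.
    return json.dumps({"tool": "get_battery_level", "args": {}})

def run_agent(prompt: str) -> dict:
    # 1. Ask the model; 2. parse its structured tool call;
    # 3. execute the named tool and return the result.
    call = json.loads(stub_model(prompt))
    return TOOLS[call["tool"]](**call["args"])

print(run_agent("How charged is my watch?"))  # → {'percent': 87}
```

In practice the tool result would be fed back to the model for a final natural-language answer; the sketch stops at the tool invocation to keep the control flow visible.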
LiteRT-LM is production-ready and powers on-device GenAI experiences in several Google products, including Chrome, Chromebook Plus, and Pixel Watch. The framework is also available through the Google AI Edge Gallery app, which lets users run models on their own devices. The repository provides a command-line interface (CLI) for easy experimentation and deployment, so users can test and run models without writing code, and it offers language-specific APIs for Kotlin (Android), Python, and C++, with Swift support in development, to simplify integration into existing projects.
The repository ships comprehensive documentation: technical overviews, CLI guides, and language-specific tutorials, along with performance benchmarks, model-support information, and detailed getting-started instructions. A "Quick Start" guide lets users run models directly from the terminal using the `uv` package manager, avoiding complex setup procedures. Recent releases are highlighted as well, showcasing new features and improvements such as support for Gemma 4, enhancements to function calling, and the introduction of the LiteRT-LM CLI. The framework is actively developed and maintained, with regular updates that improve performance, expand model support, and add new features.
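A terminal workflow of that shape might look like the following. The command name, model identifier, and flags below are hypothetical placeholders, not verified LiteRT-LM commands; consult the repository's Quick Start guide for the actual invocation.

```shell
# Hypothetical sketch only -- the tool name and flags are placeholders.
# `uvx` runs a published tool in an ephemeral environment, so no manual
# virtualenv setup or pip install is needed beforehand.
uvx litert-lm-cli --model gemma-3n --prompt "Hello from the edge"
```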