badlogic/pi

Description: CLI tool for managing vLLM deployments on GPU pods from Prime Intellect, Vast.ai, DataCrunch, etc.


Summary Information

Updated 3 minutes ago

Added to GitGenius on June 2nd, 2026

Created on July 31st, 2025

Open Issues & Pull Requests: 1 (+0)

Number of forks: 13

Total Stargazers: 89 (+0)

Total Subscribers: 1 (+0)

Issue Activity (beta)

Open issues: 1

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 16 days

Stale 30+ days: 1

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Explore full issue details

Repository Insights (GitGenius)

Median issue/PR response: 45.5 hours

Mean response time: 26.0 hours

90th percentile: 45.5 hours

Tracked items: 7

Most active contributors

Cedric20011203 - 8 events, 4 issues
gary149 - 2 events, 1 issue
luizribeiro - 2 events, 1 issue
miwgel - 1 event, 1 issue

Related by overlapping contributors

Detailed Description

The badlogic/pi repository is a command-line tool designed to simplify the deployment and management of vLLM instances on GPU pods from various cloud providers including Prime Intellect, Vast.ai, DataCrunch, and AWS. Written in JavaScript, the tool targets individuals who want to run large open-weight language models for coding assistant workflows without the complexity of traditional infrastructure management like Kubernetes or Docker.

The core functionality automates the entire setup process on clean Ubuntu GPU pods. When a user runs the setup command, the tool connects via SSH to a remote pod and automatically installs Python, CUDA drivers, vLLM, and all necessary dependencies. This zero-to-LLM-in-minutes approach replaces manual configuration that would otherwise take considerably longer. The tool then manages the lifecycle of multiple vLLM instances running concurrently on a single pod, with each model instance receiving its own port and GPU allocation.

A key feature is intelligent GPU management on multi-GPU systems. The tool uses round-robin assignment to distribute models across available GPUs and supports tensor parallelism for large models that need to span multiple GPUs simultaneously. Users can view GPU assignments and manage models across multiple pods from a single machine without constantly switching contexts. The tool exposes an OpenAI-compatible API endpoint for each running model, allowing drop-in replacement of OpenAI API clients with automatic tool and function calling support.
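The round-robin assignment described above can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function name `assign_gpus` and the `(name, gpus_needed)` input shape are hypothetical, and the sketch assumes a model that uses tensor parallelism simply takes the next consecutive GPUs in the rotation.

```python
def assign_gpus(models, gpu_count):
    """Assign GPU indices to models in round-robin order.

    models: list of (name, gpus_needed) pairs; gpus_needed > 1 stands in
            for a model that spans several GPUs via tensor parallelism.
    Returns a dict mapping each model name to a list of GPU indices.
    Hypothetical helper for illustration only, not part of the pi CLI.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")

    assignments = {}
    next_gpu = 0  # position of the rotation pointer
    for name, needed in models:
        if not 1 <= needed <= gpu_count:
            raise ValueError(f"{name} needs {needed} GPUs, pod has {gpu_count}")
        # Take the next `needed` GPUs, wrapping around at the end.
        assignments[name] = [(next_gpu + i) % gpu_count for i in range(needed)]
        next_gpu = (next_gpu + needed) % gpu_count
    return assignments


if __name__ == "__main__":
    plan = assign_gpus([("a", 1), ("b", 1), ("c", 2), ("d", 1)], gpu_count=4)
    for model, gpus in plan.items():
        print(model, gpus)
```

A plain rotation like this ignores how much memory each GPU has left, so a real scheduler would likely need additional checks before placing a model.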

The repository emphasizes practical considerations for cost-effective model deployment. The tool requires users to specify a persistent models path to avoid re-downloading large models on pod restarts, which can waste significant GPU time and money. For example, downloading a 140GB model like Llama-3.1-70B takes over 30 minutes, so persistent storage prevents this cost from recurring with each pod restart. The documentation provides provider-specific guidance for persistent storage paths across RunPod, Vast.ai, DataCrunch, Lambda Labs, and AWS.
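The download figure above can be turned into rough arithmetic. The sketch below works out the sustained throughput implied by fetching 140 GB in 30 minutes and the GPU rental cost of repeating that download. The hourly rate and restart count are placeholder assumptions, not figures from the repository, and since the source says "over 30 minutes" the results are best read as a lower bound on cost.

```python
def download_throughput_mb_s(size_gb, minutes):
    """Sustained throughput in MB/s needed to fetch size_gb in the given minutes."""
    return size_gb * 1000 / (minutes * 60)


def redownload_cost_usd(minutes, hourly_rate_usd, restarts):
    """GPU rental cost of re-downloading a model on every restart.

    Assumes the pod is billed at hourly_rate_usd while the download runs.
    """
    return minutes / 60 * hourly_rate_usd * restarts


if __name__ == "__main__":
    # 140 GB in 30 minutes, the figures quoted in the text above.
    print(round(download_throughput_mb_s(140, 30), 1), "MB/s")
    # Hypothetical: 2.00 USD per hour, 10 restarts without persistent storage.
    print(redownload_cost_usd(30, 2.0, 10), "USD")
```

At the quoted figures the implied rate is roughly 78 MB/s, so a slower link stretches both the wait and the bill.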

GitGenius activity data reports a median issue and pull request response time of 45.5 hours and a mean of 26.0 hours across 7 tracked items. The most active contributor is Cedric20011203 with 8 recorded events, followed by gary149 and luizribeiro with 2 events each. The repository shares contributors with related projects including huggingface/chat-ui, earendil-works/pi, and huggingface/huggingface.js, which suggests some overlap with the broader HuggingFace ecosystem.

The tool supports advanced deployment scenarios including running OpenAI's GPT-OSS models with MXFP4 quantization, managing context windows and output token budgets, and configuring GPU memory allocation through the gpu_fraction parameter. Users can specify custom vLLM arguments for models with special requirements, such as Qwen3-Coder 480B which needs expert parallelism for mixture-of-experts support. The documentation includes concrete examples for different GPU types, such as A100 80GB scenarios and H200 configurations, helping users understand memory allocation and concurrent request capacity.
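To make the memory allocation idea concrete, here is a small worked calculation. It assumes that gpu_fraction means the share of one GPU's total memory that a model instance may use, similar in spirit to vLLM's `--gpu-memory-utilization` option. The helper names and the 16 GB weight size are hypothetical values chosen for illustration.

```python
def gpu_budget_gb(total_gb, gpu_fraction):
    """Memory budget in GB for one model instance on a single GPU.

    Assumes gpu_fraction is the share of total GPU memory the instance may use.
    """
    if not 0 < gpu_fraction <= 1:
        raise ValueError("gpu_fraction must be in the range (0, 1]")
    return total_gb * gpu_fraction


def headroom_gb(total_gb, gpu_fraction, weights_gb):
    """Memory left for KV cache and activations once weights are loaded."""
    return gpu_budget_gb(total_gb, gpu_fraction) - weights_gb


if __name__ == "__main__":
    # A100 80GB shared by two instances, each given half the card.
    print(gpu_budget_gb(80, 0.5), "GB budget per instance")
    # Hypothetical model whose weights occupy 16 GB.
    print(headroom_gb(80, 0.5, 16), "GB left for KV cache and activations")
```

The remaining headroom is what limits context length and the number of concurrent requests, so a negative result means the model does not fit at that fraction.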

The tool explicitly defines its scope and limitations. It is not a pod provisioning manager, as users must create pods through their chosen provider's dashboard. It also does not aim to be a fully optimized enterprise deployment solution but rather targets individuals experimenting with large models who are constrained by local hardware. Privacy is a design consideration, with vLLM telemetry disabled by default. The tool requires Node.js 14 or higher to run on the user's local machine and a HuggingFace token for model downloads.
