triton-inference-server/server

Description: The Triton Inference Server provides an optimized cloud and edge inferencing solution.


Summary Information

Updated 3 hours ago

Added to GitGenius on June 3rd, 2026

Created on October 4th, 2018

Open Issues & Pull Requests: 904 (+0)

Number of forks: 1,816

Total Stargazers: 10,852 (+0)

Total Subscribers: 145 (+0)

Issue Activity (beta)

Open issues: 511

New in 7 days: 2

Closed in 7 days: 1

Avg open age: 564 days

Stale 30+ days: 494

Stale 90+ days: 464

Recent activity

Opened in 7 days: 2

Closed in 7 days: 1

Comments in 7 days: 1

Events in 7 days: 2

Top labels

question (121)
Enhancement (81)
bug (59)
investigating (41)
module: backends (25)
performance (20)
grpc (15)
module: platforms (13)

Most active issues this week

#8240 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 40692 - 1 event / 0 comments
#8884 Cuda SHM Module Cuda driver entry point API mismatch - 1 event / 1 comment


Repository Insights (GitGenius)

Median issue/PR response: 28.7 hours

Mean response time: 16.7 days

90th percentile: 35.9 days

Tracked items: 740

Most active contributors

rmccorm4 - 318 events, 131 issues
whoisj - 226 events, 76 issues
yinggeh - 186 events, 70 issues
Tabrizian - 164 events, 71 issues
the-david-oy - 152 events, 47 issues

Detailed Description

Triton Inference Server is an open source inference serving platform developed by NVIDIA that enables deployment of AI models across multiple frameworks and hardware platforms. The server supports deep learning frameworks including TensorRT, PyTorch, ONNX, and OpenVINO, as well as machine learning frameworks like RAPIDS FIL. It can run on NVIDIA GPUs, x86 and ARM CPUs, and AWS Inferentia, making it suitable for cloud, data center, edge, and embedded deployment scenarios.

The platform provides multiple execution engines called backends, with support for concurrent model execution and various batching strategies. Dynamic batching and sequence batching capabilities allow optimization for different query patterns, while implicit state management supports stateful models. The server enables model pipelining through ensemble models and Business Logic Scripting, allowing teams to create complex inference workflows. A Backend API allows developers to add custom backends and preprocessing or postprocessing operations, with support for Python-based backends.
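The idea behind dynamic batching can be illustrated with a small, purely conceptual sketch. It is not Triton's scheduler: the `Request` type, the `form_batches` function, and the greedy arrival-order policy are assumptions made for illustration, and a real scheduler also weighs factors such as queueing delay that are omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical queued inference request carrying one or more samples."""
    request_id: int
    num_samples: int


def form_batches(queue: List[Request], max_batch_size: int) -> List[List[Request]]:
    """Greedily group requests, in arrival order, into batches whose total
    sample count does not exceed max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    batches: List[List[Request]] = []
    current: List[Request] = []
    current_size = 0

    for req in queue:
        if req.num_samples > max_batch_size:
            raise ValueError(f"request {req.request_id} exceeds max_batch_size")
        # Close the current batch if adding this request would overflow it.
        if current and current_size + req.num_samples > max_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(req)
        current_size += req.num_samples

    if current:
        batches.append(current)
    return batches


def batch_sizes(batches: List[List[Request]]) -> List[int]:
    """Total number of samples in each batch."""
    return [sum(r.num_samples for r in batch) for batch in batches]
```

Grouping several small requests into one model execution is what lets a server trade a little queueing time for higher throughput; the sketch shows only the grouping step.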

Triton Inference Server exposes inference capabilities through HTTP/REST and gRPC protocols based on the community-developed KServe protocol. For in-process use cases, it provides C and Java APIs that allow direct linking into applications. The platform includes comprehensive metrics for monitoring GPU utilization, server throughput, and latency. Model configuration and repository management features enable explicit control over which models are available, with tools like the Model Analyzer supporting optimization through profiling.
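As a rough sketch of what a KServe-style HTTP/REST inference request looks like, the snippet below builds the URL path and JSON body for a single input tensor. The model name `my_model` and input name `INPUT0` are hypothetical placeholders, and the helper functions are illustrative rather than part of any Triton client library; the snippet only constructs the payload and does not contact a server.

```python
import json
from typing import Any, Dict, List


def infer_path(model_name: str) -> str:
    """URL path of the inference endpoint for a model (KServe v2 layout)."""
    return f"/v2/models/{model_name}/infer"


def build_infer_request(
    input_name: str, shape: List[int], datatype: str, data: List[float]
) -> Dict[str, Any]:
    """Build a request body with one input tensor given as a flat list."""
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(
            f"shape {shape} needs {expected} elements, got {len(data)}"
        )
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical model and tensor names, chosen only for illustration.
    body = build_infer_request("INPUT0", [1, 4], "FP32", [1.0, 2.0, 3.0, 4.0])
    print("POST", infer_path("my_model"))
    print(json.dumps(body, indent=2))
```

The body would be sent as a POST to the server's HTTP endpoint; the input names, shapes, and datatypes must match whatever the deployed model's configuration declares.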

The repository remains maintained, though much of its backlog is inactive: 494 of the 511 open issues have been stale for 30 or more days, and the average open issue is 564 days old. Across 740 tracked issues and pull requests, the median response time is 28.7 hours and the mean is 16.7 days (about 401 hours); the gap between the two reflects a long tail of slow responses, consistent with a 90th percentile of 35.9 days. The most frequently labeled issues are questions with 121 occurrences, followed by enhancement requests with 81 and bug reports with 59. Top contributors rmccorm4, whoisj, and yinggeh account for 318, 226, and 186 events respectively. The repository shares contributors with microsoft/vscode, microsoft/typescript, and rust-lang/rust.

The codebase is primarily written in Python and is distributed under the BSD 3-Clause license. The current release version is 2.70.0, corresponding to the 26.06 container release on NVIDIA GPU Cloud. The main branch tracks development progress toward the next release. Documentation covers building and deploying Triton through Docker containers or from source, preparing models for serving, and configuring the server for various use cases. The project includes tutorials and examples for popular models like ResNet, BERT, and DLRM, with deployment examples for Kubernetes and Helm on GCP, AWS, and NVIDIA FleetCommand. Triton Inference Server is part of NVIDIA AI Enterprise and offers enterprise support through NVIDIA global support channels.
