Description: Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
View kserve/kserve on GitHub ↗
The KServe GitHub repository (formerly known as KFServing) is an open-source project that provides a standard way to deploy machine learning (ML) models on Kubernetes. It offers capabilities for serving, managing, and monitoring ML models in production environments. The project's primary goal is to simplify model deployment across different frameworks by providing a single platform that abstracts away the complexity of running diverse models at scale.
KServe builds on Kubernetes-native components such as Custom Resource Definitions (CRDs), operators, and Knative Serving to form a flexible architecture for deploying machine learning workloads. This lets users serve a variety of model formats, including TensorFlow, PyTorch, XGBoost, Scikit-learn, and ONNX, as well as custom Docker containers. The CRDs define how models should be served, managed, and monitored.
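As a sketch of this CRD-driven workflow, a minimal InferenceService manifest might look like the following. The field names follow the `serving.kserve.io/v1beta1` API documented upstream; the resource name and storage URI here are placeholders, not values from this repository.

```yaml
# Hypothetical example: serve a Scikit-learn model from object storage.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn           # tells KServe which serving runtime to use
      storageUri: gs://my-bucket/models/sklearn/iris   # placeholder URI
```

Applying this manifest with `kubectl apply -f` is all that is needed to stand up a model endpoint; the controller provisions the serving runtime, networking, and autoscaling behind the scenes.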
One of KServe's core features is support for multiple inference frameworks without requiring changes to the application codebase. This framework-agnostic approach allows developers to switch or update models with minimal disruption, fostering an environment where machine learning operations can be handled more efficiently. The project also supports advanced use cases such as batch prediction and real-time serving, giving users options tailored to their specific needs.
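The framework-agnostic design extends to the data plane: KServe's V1 inference protocol (inherited from TensorFlow Serving's REST API) wraps input rows in an `instances` list and exposes each model at a `:predict` path, regardless of the underlying framework. A small sketch, with a placeholder model name and feature values:

```python
import json

def v1_predict_payload(rows):
    """Build a request body for KServe's V1 inference protocol,
    which wraps input rows in an "instances" list."""
    return json.dumps({"instances": rows})

# Hypothetical model name and Iris-style feature vector.
path = "/v1/models/sklearn-iris:predict"
payload = v1_predict_payload([[6.8, 2.8, 4.8, 1.4]])
print(path, payload)
```

Because every framework runtime answers the same protocol, a client built against this payload shape keeps working when the backing model is swapped from, say, Scikit-learn to XGBoost.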
Through its integration with Knative Serving, KServe automatically scales model-serving instances based on incoming traffic, including scaling to zero when a model is idle, which sustains performance during high-demand periods without manual intervention. The same integration provides load balancing, model versioning, request routing, and traffic management, making it easier to manage the lifecycle of ML models in a production setting.
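Autoscaling bounds and traffic splitting are expressed directly on the InferenceService spec. A hedged sketch using the v1beta1 component fields (the name, URI, and numbers here are illustrative placeholders):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # placeholder name
spec:
  predictor:
    minReplicas: 1              # keep one replica warm (0 allows scale-to-zero)
    maxReplicas: 5
    scaleMetric: concurrency
    scaleTarget: 10             # target concurrent requests per replica
    canaryTrafficPercent: 20    # route 20% of traffic to the newest revision
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/sklearn/v2   # placeholder URI
```

The `canaryTrafficPercent` field lets a new model revision take a fraction of live traffic before being promoted, while the scale settings bound how aggressively Knative adds or removes replicas.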
To assist users in deploying and managing their machine learning workflows, KFServing provides comprehensive documentation, examples, tutorials, and a community forum for support and discussions. Its collaborative development model encourages contributions from individuals and organizations around the world, ensuring that it evolves with emerging needs and technologies in the field of ML.
Overall, KServe is designed as an extensible platform, able to accommodate future advances in machine learning and infrastructure-as-code practices. It aims to become a standard tool for enterprises and developers who want to use Kubernetes as the underlying infrastructure for deploying and managing complex ML models at scale.