llama_index
by
run-llama

Description: LlamaIndex is the leading document agent and OCR platform


Summary Information

Updated 2 hours ago
Added to GitGenius on February 15th, 2024
Created on November 2nd, 2022
Open Issues/Pull Requests: 284 (+1)
Number of forks: 6,861
Total Stargazers: 47,160 (+5)
Total Subscribers: 260 (+0)
Detailed Description

LlamaIndex is a rapidly growing open-source framework designed to connect your LLMs (Large Language Models) like GPT-3, GPT-4, Claude, and others to your private or domain-specific data. It addresses a critical gap in the LLM landscape – the inability of these models to directly access and reason with information beyond their training data. Essentially, LlamaIndex acts as a ‘data layer’ for LLMs, allowing them to understand and utilize your unique knowledge sources.

The core concept revolves around ‘Retrieval-Augmented Generation’ (RAG). Instead of relying solely on the LLM’s pre-existing knowledge, LlamaIndex first retrieves relevant chunks of data from your chosen sources – which can include PDFs, websites, databases, knowledge graphs, and more. These retrieved chunks are then concatenated with the user’s prompt, providing the LLM with the context it needs to generate a more accurate, informed, and relevant response. This dramatically improves the quality and reliability of LLM outputs, especially when dealing with specialized or up-to-date information.
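The retrieve-then-concatenate flow described above can be sketched with a toy example. Everything below is an illustrative stand-in, not LlamaIndex code: the keyword-overlap scorer approximates what a production system would do with embedding-based semantic search, and the prompt template is a hypothetical one.

```python
# Toy sketch of Retrieval-Augmented Generation (RAG):
# 1. score chunks against the query, 2. keep the most relevant,
# 3. concatenate them with the user's prompt before calling the LLM.

def score(chunk: str, query: str) -> int:
    """Naive keyword-overlap relevance score (real systems use embeddings)."""
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def retrieve(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks most relevant to the query."""
    return sorted(chunks, key=lambda c: score(c, query), reverse=True)[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Concatenate retrieved context with the user's prompt for the LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "LlamaIndex connects LLMs to private data sources.",
    "The mitochondria is the powerhouse of the cell.",
    "RAG retrieves relevant chunks before generation.",
]
query = "How does RAG use retrieved chunks?"
prompt = build_prompt(query, retrieve(chunks, query))
```

The resulting `prompt` would then be sent to the LLM, which answers using the supplied context rather than only its training data.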

LlamaIndex offers a modular architecture built around several key components:

- **Indexes**: the data structures that store and organize your data for efficient retrieval. Different index types cater to different data formats and retrieval strategies; common ones include VectorStoreIndex, ListIndex, and TreeIndex, each optimized for specific use cases.
- **Retrievers**: components that fetch the relevant data from your sources based on the user's query, using techniques such as semantic search, keyword search, and hybrid approaches.
- **Response Generators**: the LLMs themselves, which generate the final response from the retrieved context and the original prompt. LlamaIndex supports a wide range of LLMs through its pluggable architecture.
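The three-component split can be illustrated with a minimal sketch. These class names mirror the concepts above, not LlamaIndex's actual classes; the point is that each piece sits behind a narrow interface, so any one of them can be swapped out independently.

```python
# Hypothetical sketch of the Index / Retriever / Response Generator split.
# Each component is independently replaceable behind its interface.

class Index:
    """Stores and organizes text chunks so they can be retrieved."""
    def __init__(self, chunks: list[str]):
        self.chunks = list(chunks)

class Retriever:
    """Fetches chunks relevant to a query from an Index.
    Here: naive keyword overlap; a vector retriever would plug in the same way."""
    def __init__(self, index: Index, top_k: int = 1):
        self.index, self.top_k = index, top_k

    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(self.index.chunks,
                        key=lambda c: len(words & set(c.lower().split())),
                        reverse=True)
        return ranked[:self.top_k]

class ResponseGenerator:
    """Stand-in for an LLM: a real implementation would call a model here."""
    def generate(self, query: str, context: list[str]) -> str:
        return f"Using context [{context[0]}], answering: {query}"

idx = Index(["Vector indexes enable semantic search over embeddings.",
             "Tree indexes summarize documents hierarchically."])
retriever = Retriever(idx)
gen = ResponseGenerator()
q = "What enables semantic search?"
answer = gen.generate(q, retriever.retrieve(q))
```

Swapping the keyword `Retriever` for an embedding-based one, or the stub `ResponseGenerator` for a real LLM client, would leave the other components untouched.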

Beyond these core components, LlamaIndex provides a rich ecosystem of tools and integrations. It includes features like data connectors for various data sources, a query engine for complex question answering, and a robust API for building custom applications. The framework is designed for both ease of use and extensibility, allowing developers to tailor it to their specific needs.

LlamaIndex is actively developed and supported by a vibrant community. It’s particularly popular for building knowledge assistants, chatbots, and applications that require access to and reasoning about private data. The project’s success is driven by its open-source nature, its focus on practical RAG implementations, and its commitment to providing a flexible and powerful tool for connecting LLMs to the real world. The project is constantly evolving with new features, index types, and integrations being added regularly, making it a compelling choice for anyone looking to leverage the power of LLMs with their own data.
