mlops-stacks by databricks

Description: This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.

Summary Information

Updated 34 minutes ago

Added to GitGenius on January 4th, 2025

Created on July 18th, 2022

Open Issues & Pull Requests: 20 (+0)

Number of forks: 259

Total Stargazers: 694 (+0)

Total Subscribers: 29 (+0)

Issue Activity (beta)

Open issues: 9

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 245 days

Stale 30+ days: 9

Stale 90+ days: 7

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 8.7 hours

Mean response time: 53.3 days

90th percentile: 239.8 days

Tracked items: 24

Most active contributors

arpitjasa-db - 34 events, 13 issues
sw33zy - 7 events, 1 issue
deepakas - 6 events, 3 issues
mikekimlovelytics - 5 events, 1 issue
ShaylanDias - 4 events, 3 issues

Detailed Description

The Databricks MLOps Stacks repository provides a customizable template for starting new machine learning projects on Databricks with production best practices built in from the start. Written in Python, the project aims to bridge the gap between data scientists, who need to iterate quickly on ML code, and operations engineers, who need to set up CI/CD pipelines and manage ML resources at scale. The repository is currently in public preview and is documented in the bundles section of the Databricks developer tools documentation.

The core value proposition of MLOps Stacks is enabling rapid project initialization with three modular components already integrated. The first component is an example ML code structure that includes training pipelines and batch inference modules organized as unit-tested Python packages and notebooks, allowing data scientists to start iterating on ML problems without needing to refactor code into production-ready modules later. The second component is ML Resources as Code, which defines ML pipeline resources including training and batch inference jobs through Databricks CLI bundles, enabling teams to govern and audit ML resource changes through pull requests rather than ad-hoc UI modifications. The third component is CI/CD automation, supporting both GitHub Actions and Azure DevOps workflows that test and deploy ML code and resources automatically.

The repository implements a structured development workflow across three execution environments: development, staging, and production. Data scientists work in the dev environment and file pull requests, which trigger unit and integration tests in an isolated staging workspace. When a pull request is merged to the main branch, the staging jobs are updated automatically with the latest code. Production deployments go through a separate release branch, so teams decide when code changes move to production, for example on a regular schedule. This separation lets data scientists move quickly while keeping production stable and auditable.
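The branch-to-environment flow described above can be sketched as a small routing function. This is only an illustration of the workflow, not code from the repository: the `GitEvent` type, the `plan_action` function, and the default branch names `main` and `release` are assumptions made for the example, and real projects choose their own branch names during initialization.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GitEvent:
    """Hypothetical description of a repository event seen by CI/CD."""
    kind: str           # "pull_request" or "push"
    target_branch: str  # branch the PR targets, or the branch pushed to


def plan_action(
    event: GitEvent,
    main_branch: str = "main",        # assumed name, configurable per project
    release_branch: str = "release",  # assumed name, configurable per project
) -> Optional[Tuple[str, str]]:
    """Return (environment, action) for an event, or None if nothing runs."""
    if event.kind == "pull_request" and event.target_branch == main_branch:
        # PRs are validated in the isolated staging workspace.
        return ("staging", "run unit and integration tests")
    if event.kind == "push" and event.target_branch == main_branch:
        # A merge to main refreshes the staging jobs with the latest code.
        return ("staging", "update jobs")
    if event.kind == "push" and event.target_branch == release_branch:
        # Production only changes when the release branch moves.
        return ("production", "update jobs")
    return None
```

For instance, `plan_action(GitEvent("push", "release"))` targets production, while a push to a feature branch returns `None` because work there stays in the dev environment.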

To use MLOps Stacks, users run the databricks bundle init command with the mlops-stacks template, which prompts for configuration parameters. The initialization process offers flexibility through three setup modes: CICD_and_Project for complete setup, Project_Only for data scientists getting started, and CICD_Only for machine learning engineers setting up CI/CD on existing projects. Required configuration includes the cloud provider (AWS, Azure, or GCP), CI/CD platform choice, staging and production workspace URLs, and branch naming conventions. The tool requires Python 3.8 or higher and Databricks CLI version 0.236.0 or later.
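As a rough sketch of the prerequisites above, the following helper checks the stated minimum versions and assembles the init command. The function names are hypothetical, and the CLI version is assumed to be supplied by the caller as a plain `X.Y.Z` string; how that string is obtained from the installed CLI is left out on purpose.

```python
import sys
from typing import List, Sequence, Tuple

MIN_PYTHON = (3, 8)    # "Python 3.8 or higher"
MIN_CLI = (0, 236, 0)  # "Databricks CLI version 0.236.0 or later"
SETUP_MODES = ("CICD_and_Project", "Project_Only", "CICD_Only")


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'X.Y.Z' (optionally prefixed with 'v') into a tuple of ints.

    Suffixes such as '-dev' are not handled in this sketch.
    """
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_requirements(python_version: Sequence[int], cli_version: str) -> bool:
    """True when both the Python and CLI versions satisfy the minimums."""
    python_ok = tuple(python_version[:2]) >= MIN_PYTHON
    cli_ok = parse_version(cli_version) >= MIN_CLI
    return python_ok and cli_ok


def is_valid_setup_mode(mode: str) -> bool:
    """Check a mode name against the three setup modes listed above."""
    return mode in SETUP_MODES


def init_command() -> List[str]:
    """Argument list for the interactive initialization command."""
    return ["databricks", "bundle", "init", "mlops-stacks"]
```

A caller could pass `sys.version_info` as `python_version` and, once the check passes, hand `init_command()` to `subprocess.run` to start the interactive prompts.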

According to GitGenius activity tracking, the median issue and pull request response time is 8.7 hours across 24 tracked items, so a typical item gets a response within hours. The mean of 53.3 days and the 90th percentile of 239.8 days show that a minority of items wait months, and all 9 open issues have been stale for more than 30 days. The most active contributor tracked is arpitjasa-db with 34 events, followed by sw33zy with 7 events and deepakas with 6 events. GitGenius also reports contributors in common with Microsoft's VSCode and TypeScript repositories and with the Rust language project, although shared contributors alone do not establish a technical relationship between the projects. The project is classified across multiple domains, including machine learning operations, pipeline automation, model lifecycle management, continuous integration and delivery, and scalable ML infrastructure.
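The Repository Insights figures pair a median response of hours with a mean of weeks, which is what a long-tailed distribution produces. The sketch below shows the effect on a made-up sample; the numbers are invented for illustration and are not GitGenius data, and `summarize` is a hypothetical helper.

```python
import statistics
from typing import Dict, List


def summarize(response_hours: List[float]) -> Dict[str, float]:
    """Return median, mean, and 90th percentile of response times in hours."""
    deciles = statistics.quantiles(response_hours, n=10)  # 9 cut points
    return {
        "median": statistics.median(response_hours),
        "mean": statistics.mean(response_hours),
        "p90": deciles[-1],
    }


# Invented sample: eight quick responses and two that took months.
sample_hours = [2, 3, 5, 8, 9, 12, 20, 30, 2400, 5800]
stats = summarize(sample_hours)

# Two slow items pull the mean far above the median.
print(f"median: {stats['median']:.1f} h")
print(f"mean:   {stats['mean'] / 24:.1f} days")
print(f"p90:    {stats['p90'] / 24:.1f} days")
```

A low median alongside a high mean therefore indicates that most items are handled quickly while a few remain unanswered for a long time.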
