data-engineering-zoomcamp
by
DataTalksClub

Description: Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

Summary Information

Updated 2 hours ago

Added to GitGenius on January 31st, 2026

Created on October 21st, 2021

Open Issues & Pull Requests: 0 (+0)

Number of forks: 8,561

Total Stargazers: 43,380 (+1)

Total Subscribers: 615 (+0)

Issue Activity (beta)

Open issues: 1

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 0 days

Stale 30+ days: 1

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

timecodes (28)

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 3.5 hours

Mean response time: 31.1 days

90th percentile: 98.1 days

Tracked items: 23

Most active contributors

alexeygrigorev - 24 events, 15 issues
4-han - 1 event, 1 issue
Copilot - 1 event, 1 issue
JTFERN - 1 event, 1 issue
Kishorsenthilkumar - 1 event, 1 issue

Detailed Description

Data Engineering Zoomcamp is a free 9-week course that teaches the fundamentals of data engineering by building an end-to-end data pipeline from scratch. The course is structured around hands-on workshops and a final project, providing practical experience with industry-standard tools and best practices. The next cohort begins in January 2026; a self-paced option is also available at any time for learners who prefer flexibility.

The curriculum spans seven modules plus workshops covering the complete data engineering stack. Module 1 focuses on containerization and infrastructure as code using Docker, Docker Compose, and Terraform with Google Cloud Platform. Module 2 introduces workflow orchestration using Kestra. A dedicated workshop covers data ingestion patterns including API reading and incremental loading. Module 3 teaches data warehousing with BigQuery, including partitioning and clustering strategies. Module 4 covers analytics engineering and dbt for data transformation with both DuckDB and BigQuery. Module 5 addresses building end-to-end data pipelines using Bruin. Module 6 introduces Apache Spark for batch processing, covering DataFrames, SQL, and join internals. Module 7 focuses on streaming with Kafka, Kafka Streams, KSQL, and Avro schema management.

The repository is primarily composed of Jupyter Notebooks and serves as the central hub for course materials. Video lectures are available on YouTube, and a dedicated course platform at courses.datatalks.club manages deadlines and homework. The course targets developers, analysts, and data scientists with basic coding experience and SQL familiarity; Python experience is helpful but not required. No prior data engineering experience is necessary.

Live cohort participants receive graded homework, access to a leaderboard, peer review opportunities, and certificates upon completion of the final project. Self-paced learners can access all materials and complete homework for self-checking, but they do not receive certificates or appear on the leaderboard. The course emphasizes community engagement through the dedicated Slack channel #course-data-engineering and Telegram announcements.

According to GitGenius activity tracking, the repository shows a median issue and pull request response latency of 3.5 hours across 23 tracked items, which suggests that most items get a quick first response. The mean response time of 31.1 days and the 90th percentile of 98.1 days indicate that a small number of items wait much longer. Alexey Grigorev is the most active contributor with 24 tracked events. The repository is classified across multiple data engineering domains including data pipelines, ETL, data warehousing, stream processing, batch processing, and orchestration. Graduates' testimonials credit the course with helping them land their first tech jobs and with providing foundational knowledge of data engineering principles. The repository welcomes pull requests and maintains community guidelines for discussions and question-asking on its Slack channel.
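The response-time figures listed under Repository Insights (a 3.5-hour median against a 31.1-day mean) are typical of a skewed distribution, where a few very slow responses pull the mean far above the median. The sketch below illustrates the effect in Python. The sample values and the function name `summarize_response_times` are hypothetical and are not taken from GitGenius; how GitGenius actually computes its percentile is not stated, so the nearest-rank method used here is an assumption.

```python
import math
import statistics


def summarize_response_times(hours):
    """Return the median, mean, and 90th percentile of response times in hours."""
    ordered = sorted(hours)
    # Nearest-rank percentile (an assumed method): the smallest value
    # such that at least 90% of the data is at or below it.
    rank = max(1, math.ceil(len(ordered) * 9 / 10))
    return {
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p90": ordered[rank - 1],
    }


# Hypothetical data: eight quick replies and two items left open for months.
sample_hours = [1, 2, 3, 3, 4, 4, 5, 6, 2000, 3000]
summary = summarize_response_times(sample_hours)

for name, value in summary.items():
    print(f"{name}: {value:.1f} hours")
```

With these made-up values the median stays at a few hours while the mean is in the hundreds, which is why the median is usually the better indicator of a typical response.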
