wesm/pydata-book

Description: Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

Summary Information

Updated 59 minutes ago

Added to GitGenius on August 28th, 2024

Created on June 30th, 2012

Open Issues & Pull Requests: 27 (+0)

Number of forks: 15,685

Total Stargazers: 24,735 (+0)

Total Subscribers: 1,508 (+0)

Issue Activity (beta)

Open issues: 20

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 709 days

Stale 30+ days: 20

Stale 90+ days: 20

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 3.5 days

Mean response time: 103.2 days

90th percentile: 375.1 days

Tracked items: 17

Most active contributors

wesm - 10 events, 5 issues
MaxforCherubim - 3 events, 3 issues
christianvye - 2 events, 1 issue
pouria-81 - 2 events, 1 issue
AldoOmarAndres - 1 event, 1 issue

Detailed Description

The pydata-book repository contains the official materials and Jupyter notebooks accompanying "Python for Data Analysis, 3rd Edition" by Wes McKinney, published by O'Reilly Media. The repository serves as a comprehensive educational resource for learning data analysis with Python, providing executable notebook implementations of concepts covered in the book alongside the published text itself.

The repository is organized as a collection of Jupyter notebooks corresponding to each chapter of the book. Chapter coverage spans foundational Python concepts including language basics, IPython, and Jupyter notebooks in Chapter 2, followed by built-in data structures and functions in Chapter 3. The notebooks then progress through NumPy arrays and vectorized computation in Chapter 4, pandas fundamentals in Chapter 5, and practical data handling topics including loading, storage, and file formats in Chapter 6. Subsequent chapters address data cleaning and preparation, data wrangling operations like joins and reshaping, plotting and visualization, data aggregation and group operations, time series analysis, and an introduction to modeling libraries. An appendix notebook covers advanced NumPy topics, and Chapter 13 provides practical data analysis examples that synthesize earlier concepts.

The primary language of the repository is Jupyter Notebook, making it directly executable and interactive. The project includes setup instructions supporting multiple installation approaches. The recommended method uses uv, a fast Python package installer that automatically creates a virtual environment and installs dependencies from the pyproject.toml file. An alternative Conda-based setup is also documented. The project specifically uses pandas 2.0.3 to ensure compatibility across all notebooks.
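Because the notebooks are tied to a specific pandas release, it can help to confirm which version is installed before running them. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `version_matches` are hypothetical, and the expected version string is taken from the description above.

```python
from importlib import metadata
from typing import Optional

# Version named in the description above; adjust if the project changes its pin.
EXPECTED_PANDAS = "2.0.3"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_matches(installed: str, expected: str = EXPECTED_PANDAS) -> bool:
    """Return True when the installed version string equals the expected one."""
    return installed.strip() == expected


if __name__ == "__main__":
    found = installed_version("pandas")
    if found is None:
        print("pandas is not installed in this environment")
    elif version_matches(found):
        print(f"pandas {found} matches the expected version")
    else:
        print(f"pandas {found} found, expected {EXPECTED_PANDAS}")
```

Running the script inside the environment created by either setup method should report whether the installed pandas matches the version the notebooks were written against.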

The repository maintains historical versions for readers of earlier editions. A separate 2nd-edition branch contains reorganized materials for the 2017 publication, while a 1st-edition branch serves readers of the original 2012 release. This branching structure allows the repository to support multiple book versions simultaneously.

According to GitGenius activity tracking, the repository shows a median issue and pull request response latency of 84.4 hours (about 3.5 days) across 17 tracked items, with Wes McKinney as the primary maintainer accounting for 10 tracked events. Additional contributors include MaxforCherubim with 3 events and christianvye with 2 events. The repository's contributor network overlaps with major projects including microsoft/vscode, microsoft/typescript, and rust-lang/rust, indicating engagement from developers across diverse technical communities.

The code in the repository is released under the MIT license, making it freely available for educational and commercial use. The repository is classified across multiple domains including NumPy, data visualization, Jupyter notebooks, education, pandas, machine learning, data science, statistics, and scientific computing, reflecting its comprehensive coverage of the Python data analysis ecosystem. The materials provide both theoretical grounding and practical implementation examples, making the repository valuable for learners at various stages of data analysis skill development.
