Description: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
`micrograd` by Andrej Karpathy is a concise implementation of a tiny autograd engine, designed to demystify the core mechanics of backpropagation and automatic differentiation. It serves as an educational tool, allowing users to build a foundational understanding of how modern deep learning frameworks like PyTorch or TensorFlow compute gradients. The repository's primary goal is to illustrate the principles of a computational graph and the chain rule from first principles, using only scalar values and plain Python. It strips away the complexity of production frameworks, presenting the essence of gradient computation in a few lines of code.
The `Value` object is `micrograd`'s core. Each `Value` instance encapsulates a single scalar number (`data`) and its corresponding gradient (`grad`), which accumulates during the backward pass. Crucially, every `Value` maintains references to its direct inputs (`_prev`) and the operation (`_op`) that produced it. This forms the explicit computational graph. For example, when two `Value` objects are added, a new `Value` is created, storing the sum, and its `_prev` set points back to the two original operands, along with a tag indicating the `+` operation.
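A minimal sketch of this structure, modeled on `micrograd`'s `Value` class but simplified (only `__add__` is shown, and the field layout is illustrative rather than the repo's exact code):

```python
class Value:
    """Sketch of micrograd's core object: a scalar plus graph bookkeeping."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data                # the raw scalar
        self.grad = 0.0                 # filled in during the backward pass
        self._prev = set(_children)     # the Values that produced this one
        self._op = _op                  # tag for the producing operation

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # the new node points back at both operands, tagged '+'
        return Value(self.data + other.data, (self, other), '+')

a, b = Value(2.0), Value(3.0)
c = a + b
print(c.data, c._op)    # 5.0 +
```

Note that `c` carries not just the result `5.0` but also the edges (`_prev`) and the tag (`_op`) that let a backward pass later retrace how it was computed.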
The forward pass in `micrograd` is straightforward: mathematical operations like addition, multiplication, exponentiation, or activation functions (e.g., `tanh`) are overloaded for `Value` objects. When these operations are performed, they not only compute the result but also implicitly construct the computational graph. Each new `Value` object created during this process "remembers" its parents and the specific operation that generated it. This graph, built dynamically as computations unfold, is the essential structure upon which the backward pass operates.
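The following sketch (again a simplification, not the repo's exact code; `nodes_of` is a hypothetical helper) shows how a single forward expression leaves a complete graph behind:

```python
import math

class Value:
    """Overloaded operators that record the graph as a side effect."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), '*')

    def tanh(self):
        return Value(math.tanh(self.data), (self,), 'tanh')

def nodes_of(v, seen=None):
    """Walk _prev pointers to collect every node reachable from an output."""
    seen = set() if seen is None else seen
    if v not in seen:
        seen.add(v)
        for child in v._prev:
            nodes_of(child, seen)
    return seen

a, b, c = Value(1.0), Value(2.0), Value(4.0)
e = ((a + b) * c).tanh()    # one forward expression...
print(len(nodes_of(e)))     # ...yields 6 graph nodes: a, b, c, +, *, tanh
```

No explicit "build graph" call is ever made: evaluating the expression is what constructs the graph.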
The `backward()` method is where `micrograd`'s real power lies. When called on the final output `Value` of a computation, it initiates gradient propagation. First, the output's gradient is initialized to 1.0. Then, `micrograd` performs a topological sort of the computational graph and iterates over the `Value` objects in reverse order, from the output back to the inputs. For each `Value`, it invokes a `_backward()` function attached to that `Value`. This function applies the chain rule for the specific operation that created the `Value`, distributing the incoming gradient (`grad`) to its immediate predecessors (`_prev`).
This elegant mechanism allows `micrograd` to compute gradients for arbitrarily complex expressions. The repository demonstrates this by building and training a tiny multi-layer perceptron (MLP). It shows how to define parameters as `Value` objects, perform a forward pass to get a loss, and then call `loss.backward()` to compute all necessary gradients. Subsequently, these gradients are used to update the parameters via simple gradient descent, illustrating the complete training loop of a neural network from first principles.
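The same loop can be demonstrated on a deliberately tiny problem: fitting y = 2x with a single weight. This is a toy stand-in for the repo's MLP demo, using the sketch engine from above rather than `micrograd`'s actual `nn` module; the names `w`, `xs`, and `ys` are illustrative.

```python
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Training data for y = 2x, and one trainable parameter
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = Value(0.0)
for step in range(30):
    loss = Value(0.0)
    for x, y in zip(xs, ys):
        diff = w * x + (-y)          # prediction minus target
        loss = loss + diff * diff    # squared error
    w.grad = 0.0                     # zero the grad before each backward pass
    loss.backward()                  # populate w.grad via the chain rule
    w.data -= 0.01 * w.grad         # gradient descent step
print(round(w.data, 3))              # converges toward 2.0
```

Zeroing `w.grad` before each `backward()` call matters: gradients accumulate by design, so stale values from the previous step would otherwise corrupt the update.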
Ultimately, `micrograd` is an educational masterclass. Building this system from scratch offers profound insights into deep learning frameworks, revealing how the chain rule is systematically applied across a computational graph for efficient gradient computation—the bedrock of modern machine learning.