vit-pytorch
by
lucidrains

Description: Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch


Summary Information

Updated 14 minutes ago
Added to GitGenius on March 27th, 2026
Created on October 3rd, 2020
Open Issues/Pull Requests: 141 (+0)
Number of forks: 3,484
Total Stargazers: 25,035 (+0)
Total Subscribers: 153 (+0)

Detailed Description

This repository, `lucidrains/vit-pytorch`, provides a PyTorch implementation of the Vision Transformer (ViT) and various related architectures. Its primary purpose is to offer a readily available and adaptable framework for researchers and practitioners interested in applying transformer models to computer vision tasks, particularly image classification. The repository aims to simplify the process of experimenting with ViT and its numerous variants, allowing users to quickly implement and train these models without needing to write the core transformer architecture from scratch.

The core functionality of the repository revolves around the implementation of the original Vision Transformer, which achieves state-of-the-art (SOTA) results in image classification by treating images as sequences of patches and applying a transformer encoder. The repository provides a clean and concise implementation of this foundational model, making it easy to understand and use. Beyond the basic ViT, the repository offers implementations of a wide array of ViT variants, each designed to address specific challenges or improve performance.
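The patch-sequence formulation above can be illustrated with plain shape arithmetic: an image is cut into non-overlapping square patches, each patch is flattened into a vector, and a class token is prepended before the sequence enters the transformer encoder. The sketch below is illustrative only; the sizes are common example choices (e.g. ViT-Base-style 224/16), not values mandated by the repository.

```python
# Illustrative shape arithmetic for the ViT patchification step.
# All concrete numbers are example choices, not vit-pytorch defaults.

def vit_sequence_shape(image_size, patch_size, channels=3):
    """Return (num_tokens, patch_dim) for a square image split into
    non-overlapping square patches, plus one prepended class token."""
    assert image_size % patch_size == 0, "image must divide evenly into patches"
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2        # e.g. (256 / 32)^2 = 64 patches
    patch_dim = channels * patch_size ** 2     # flattened pixel values per patch
    return num_patches + 1, patch_dim          # +1 for the [class] token

tokens, patch_dim = vit_sequence_shape(image_size=256, patch_size=32)
print(tokens, patch_dim)   # 65 tokens, each a 3*32*32 = 3072-dim raw patch vector
```

In the library itself a learned linear projection then maps each flattened patch to the model dimension, and positional embeddings are added before the encoder; the arithmetic above only shows where the token count and patch dimension come from.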

Key features of the repository include:

* **Diverse Architectures:** The repository goes far beyond the basic ViT, offering implementations of numerous related architectures, including Simple ViT, NaViT, Deep ViT, CaiT, Token-to-Token ViT, CCT, Cross ViT, PiT, LeViT, CvT, Twins SVT, CrossFormer, RegionViT, ScalableViT, SepViT, MaxViT, NesT, MobileViT, and XCiT. This extensive collection allows users to explore a wide range of design choices and their impact on performance.
* **Ease of Use:** The repository is designed to be user-friendly. It provides clear installation instructions, a simple usage example, and detailed explanations of the parameters, making it easy to get started with ViT models even for users new to transformers.
* **Distillation Support:** The repository includes support for knowledge distillation, a technique for training smaller, more efficient ViT models by transferring knowledge from a larger, pre-trained teacher model. This is particularly useful for deploying ViT models on resource-constrained devices.
* **Modular Design:** The code is structured in a modular way, making it easy to understand, modify, and extend. This allows users to customize the models to their specific needs and experiment with new ideas.
* **Comprehensive Documentation:** The README file provides a comprehensive overview of the repository, including detailed explanations of each model, its parameters, and its intended use cases, along with links to the relevant research papers and resources.
* **Active Development:** The repository is actively maintained, with new models and features added regularly, so users have access to recent advancements in ViT research.
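The distillation feature mentioned above rests on a standard idea: the student's loss blends cross-entropy on the hard labels with a temperature-softened cross-entropy against the teacher's outputs. The pure-Python sketch below shows that loss shape only; the temperature and weighting values are hypothetical examples, and the repository's own PyTorch wrapper handles this internally rather than exposing this function.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature yields a
    # softer (higher-entropy) distribution over classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=3.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft cross-entropy
    against the teacher's tempered distribution. `temperature` and
    `alpha` are illustrative hyperparameters, not library defaults."""
    hard_probs = softmax(student_logits)
    hard_loss = -math.log(hard_probs[true_label])
    soft_student = softmax(student_logits, temperature)
    soft_teacher = softmax(teacher_logits, temperature)
    # Cross-entropy between teacher and student soft distributions,
    # scaled by T^2 as in standard distillation practice.
    soft_loss = -sum(t * math.log(s)
                     for t, s in zip(soft_teacher, soft_student))
    soft_loss *= temperature ** 2
    return alpha * hard_loss + (1 - alpha) * soft_loss

loss = distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], true_label=0)
```

A smaller student trained this way inherits the "dark knowledge" in the teacher's soft class probabilities, which is why distilled ViT variants can reach stronger accuracy than training the small model on hard labels alone.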

The repository's purpose is to serve as a valuable resource for the computer vision community. It aims to accelerate the adoption of ViT models by providing a readily available and easy-to-use implementation. By offering a wide range of ViT variants, the repository enables researchers and practitioners to explore different design choices and their impact on performance. The repository also facilitates experimentation with new ideas and advancements in the field. Ultimately, `lucidrains/vit-pytorch` contributes to the ongoing "attention revolution" in computer vision by making these powerful models more accessible and easier to work with.
