Transformer FLOPs Calculator
Presets: GPT-2 Small, GPT-2 Medium, GPT-2 Large, GPT-2 XL

Parameters: Vocabulary Size, Context Length, Number of Layers, Model Dimension (d_model), Number of Heads, FFN Dimension (d_ff)
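For reference, the four presets can be captured in a small configuration record. This is a minimal Python sketch: the numbers are the published GPT-2 sizes, and d_ff is assumed to be 4 × d_model since the preset field values are not shown above.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    vocab_size: int      # Vocabulary Size
    context_length: int  # Context Length
    n_layers: int        # Number of Layers
    d_model: int         # Model Dimension (d_model)
    n_heads: int         # Number of Heads
    d_ff: int            # FFN Dimension (d_ff)

# Published GPT-2 configurations; d_ff assumed to be 4 * d_model.
GPT2_PRESETS = {
    "GPT-2 Small":  TransformerConfig(50257, 1024, 12, 768,  12, 3072),
    "GPT-2 Medium": TransformerConfig(50257, 1024, 24, 1024, 16, 4096),
    "GPT-2 Large":  TransformerConfig(50257, 1024, 36, 1280, 20, 5120),
    "GPT-2 XL":     TransformerConfig(50257, 1024, 48, 1600, 25, 6400),
}
```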
Results
Total FLOPs: 349.63B
Per-Layer Breakdown (percentages are shares of the 349.63B total):

Component                 Per Layer         All Layers
Q, K, V Projections       3.62B  (1.0%)     43.49B  (12.4%)
Q×K Attention             1.61B  (0.5%)     19.33B  (5.5%)
Attention×V               1.61B  (0.5%)     19.33B  (5.5%)
Output Projection         1.21B  (0.3%)     14.50B  (4.1%)
Feed-Forward Network      14.50B (4.1%)     173.95B (49.8%)
Total per Block           22.55B
All Blocks (12 layers)                      270.58B (77.4%)
LM Head                                     79.05B  (22.6%)
Component Summary:
Attention (MHA):      27.6%
Feed-Forward (FFN):   49.8%
Language Model Head:  22.6%
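Running the sketch on the GPT-2 Small preset reproduces the summary shares above (under the same gated-FFN assumption):

```python
cfg = GPT2_PRESETS["GPT-2 Small"]
result = flops_breakdown(cfg)
total = result["total"]

# Attention share = QKV + Q·K scores + attention·V + output projection, over all layers.
attn_per_layer = sum(result["per_layer"][k]
                     for k in ("qkv_proj", "qk_scores", "attn_v", "out_proj"))

print(f"Total: {total / 1e9:.2f}B FLOPs")
print(f"Attention (MHA):      {cfg.n_layers * attn_per_layer / total:.1%}")
print(f"Feed-Forward (FFN):   {cfg.n_layers * result['per_layer']['ffn'] / total:.1%}")
print(f"Language Model Head:  {result['lm_head'] / total:.1%}")
```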