Transformer FLOPs Calculator
Presets: GPT-2 Small, GPT-2 Medium, GPT-2 Large, GPT-2 XL

Parameters: Vocabulary Size, Context Length, Number of Layers, Model Dimension (d_model), Number of Heads, FFN Dimension (d_ff)
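For reference, the four presets can be captured in a small configuration record. This is a minimal Python sketch: the numbers are the published GPT-2 sizes, and d_ff is assumed to be 4 × d_model since the preset field values are not shown above.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    vocab_size: int      # Vocabulary Size
    context_length: int  # Context Length
    n_layers: int        # Number of Layers
    d_model: int         # Model Dimension (d_model)
    n_heads: int         # Number of Heads
    d_ff: int            # FFN Dimension (d_ff)

# Published GPT-2 configurations; d_ff assumed to be 4 * d_model.
GPT2_PRESETS = {
    "GPT-2 Small":  TransformerConfig(50257, 1024, 12, 768,  12, 3072),
    "GPT-2 Medium": TransformerConfig(50257, 1024, 24, 1024, 16, 4096),
    "GPT-2 Large":  TransformerConfig(50257, 1024, 36, 1280, 20, 5120),
    "GPT-2 XL":     TransformerConfig(50257, 1024, 48, 1600, 25, 6400),
}
```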
Results
Total FLOPs: 349.63B
Per-Layer Breakdown (percentages are shares of the 349.63B total):

Component                 Per Layer         All Layers
Q, K, V Projections       3.62B  (1.0%)     43.49B  (12.4%)
Q×K Attention             1.61B  (0.5%)     19.33B  (5.5%)
Attention×V               1.61B  (0.5%)     19.33B  (5.5%)
Output Projection         1.21B  (0.3%)     14.50B  (4.1%)
Feed-Forward Network      14.50B (4.1%)     173.95B (49.8%)
Total per Block           22.55B
All Blocks (12 layers)                      270.58B (77.4%)
LM Head                                     79.05B  (22.6%)
Component Summary:
Attention (MHA):      27.6%
Feed-Forward (FFN):   49.8%
Language Model Head:  22.6%
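Running the sketch on the GPT-2 Small preset reproduces the summary shares above (under the same gated-FFN assumption):

```python
cfg = GPT2_PRESETS["GPT-2 Small"]
result = flops_breakdown(cfg)
total = result["total"]

# Attention share = QKV + Q·K scores + attention·V + output projection, over all layers.
attn_per_layer = sum(result["per_layer"][k]
                     for k in ("qkv_proj", "qk_scores", "attn_v", "out_proj"))

print(f"Total: {total / 1e9:.2f}B FLOPs")
print(f"Attention (MHA):      {cfg.n_layers * attn_per_layer / total:.1%}")
print(f"Feed-Forward (FFN):   {cfg.n_layers * result['per_layer']['ffn'] / total:.1%}")
print(f"Language Model Head:  {result['lm_head'] / total:.1%}")
```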