01. What It Is
Reasoning models are a class of large language models that allocate significant compute at inference time. Before producing a final visible response, they generate intermediate thinking steps, called thinking tokens or reasoning tokens. Standard language models also generate text token by token, but they go straight to the answer, with no explicit deliberation phase before it.
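The split between hidden reasoning and the visible answer can be made concrete with a small parser. This is a minimal sketch, and it assumes the raw completion wraps its reasoning in `<think>...</think>` tags. Some open-weight reasoning models, such as DeepSeek-R1, use this convention. Hosted APIs often return reasoning separately or hide it entirely, so the tag format here is an assumption. `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Assumption: reasoning is delimited by <think>...</think>, as in some
# open-weight reasoning models. DOTALL lets the block span multiple lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, visible_answer) from a raw completion string."""
    match = THINK_BLOCK.search(raw)
    if match is None:
        # No reasoning block: the whole completion is the visible answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is what the user would see.
    answer = (raw[: match.start()] + raw[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>17 * 3 = 51, and 51 + 4 = 55.</think>The answer is 55."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

The thinking tokens are still ordinary generated tokens that cost compute. They are simply separated from the part of the output that is shown to the user.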
The key innovation is where computation happens. Conventional improvements come from scaling training compute along two axes: model size and training data. Reasoning models add a third axis, test-time compute, meaning the model "thinks longer" on hard problems at the moment of inference. OpenAI's o1 series popularized this paradigm after its first release as a preview in September 2024. Since then, most major AI labs have shipped reasoning models of their own.
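A short calculation makes "where computation happens" concrete. It uses the common rule of thumb that a dense decoder spends roughly 2·N FLOPs per generated token, where N is the parameter count. This approximation ignores attention costs that grow with context length. The model size and token counts below are hypothetical and were chosen only for illustration.

```python
# Rule of thumb (approximate): ~2 * N FLOPs per generated token for a dense
# model with N parameters; context-length-dependent attention cost is ignored.

def decode_flops(num_params: float, num_tokens: int) -> float:
    """Approximate forward-pass FLOPs to generate num_tokens tokens."""
    return 2.0 * num_params * num_tokens


N = 70e9                  # hypothetical 70B-parameter dense model
answer_tokens = 500       # hypothetical visible response length
thinking_tokens = 10_000  # hypothetical reasoning budget on a hard problem

standard = decode_flops(N, answer_tokens)
reasoning = decode_flops(N, thinking_tokens + answer_tokens)

print(f"standard:  {standard:.2e} FLOPs")
print(f"reasoning: {reasoning:.2e} FLOPs")
print(f"ratio:     {reasoning / standard:.0f}x")
```

Under these assumptions, the reasoning run costs (10,000 + 500) / 500 = 21 times as much as the direct answer, even though the model weights are unchanged. Extra capability is bought per query at inference time, which is why test-time compute is a separate axis from training compute.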
Some industry projections expect inference to account for roughly two-thirds of all AI compute by 2026, driven substantially by the adoption of reasoning models.