02. Timeline
1950: Turing's question
Alan Turing's 1950 paper "Computing Machinery and Intelligence" opened with the question "Can machines think?" Rather than attempting to define thinking, Turing proposed an operational test: if a machine can converse with a human via text and the human cannot reliably tell it is a machine, the machine can be said to think. This is the Turing test, and it reframed AI as an empirical rather than philosophical question.
Turing also outlined several prerequisites for machine intelligence, including learning from experience and the ability to handle language, which became the central research agenda.
1956: The Dartmouth Conference
The summer of 1956 at Dartmouth College is the conventional founding moment of AI as a research discipline. John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester organized a two-month workshop on the conjecture that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." McCarthy coined the term "artificial intelligence" for the occasion.
The workshop produced no single breakthrough, but it gathered the people who would define the field for the next two decades and established symbolic manipulation as the primary method. Key outputs of the participants around and after the workshop included the Logic Theorist and General Problem Solver (Newell and Simon), LISP (McCarthy), and work on neural networks (Minsky).
1957-1969: Early optimism and the perceptron
The late 1950s saw genuine excitement. Frank Rosenblatt's perceptron (1957) was a hardware learning machine that could recognize simple patterns. Herbert Simon and Allen Newell predicted in 1958 that a computer would be the world's chess champion within ten years and would prove new mathematical theorems.
These predictions proved dramatically wrong. In 1969, Minsky and Papert's book "Perceptrons" demonstrated mathematically that single-layer perceptrons could not solve the XOR problem, a simple classification task that is not linearly separable. The result was overgeneralized into a claim that neural networks could not represent complex functions, which contributed to a decade of underfunding for neural network research.
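The XOR limitation is easy to reproduce. The following is a minimal sketch, not Rosenblatt's hardware or Minsky and Papert's proof: it applies the classic perceptron learning rule to AND (linearly separable) and to XOR (not separable). The function names, the epoch limit, and the learning rate are illustrative assumptions.

```python
from itertools import product

def predict(weights, bias, x):
    """Single-layer perceptron: threshold on a weighted sum of the inputs."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, epochs=100, lr=1):
    """Perceptron learning rule; returns (weights, bias, converged)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                # Nudge the decision boundary toward the misclassified point.
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:
            # A full pass with no mistakes: the data has been separated.
            return weights, bias, True
    return weights, bias, False

inputs = list(product([0, 1], repeat=2))
AND = [(x, x[0] & x[1]) for x in inputs]
XOR = [(x, x[0] ^ x[1]) for x in inputs]

and_weights, and_bias, and_converged = train_perceptron(AND)
xor_weights, xor_bias, xor_converged = train_perceptron(XOR)

print("AND learned:", and_converged)
print("XOR learned:", xor_converged)
```

Training on AND settles on a separating line after a few passes. Training on XOR never completes an error-free pass, because no single line separates its two classes, and raising the epoch limit does not change that.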
First AI winter, approximately 1974-1980
The gap between early promises and actual capabilities became undeniable by the early 1970s. DARPA, which had funded much of the early work, became frustrated with the Speech Understanding Research program at Carnegie Mellon. The UK's Lighthill Report (1973) judged that AI research had produced little of practical value, and it led to severe funding cuts across British universities.
The core problem was combinatorial explosion. Symbolic AI systems could reason correctly over small, well-defined domains but fell apart when the number of possible states grew large. No one had a principled solution. Funding dried up.
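A rough arithmetic sketch shows how quickly the state count grows. The branching factor of 10 and the depths below are illustrative assumptions, not figures from any particular system.

```python
def positions_at_depth(branching_factor, depth):
    """Number of distinct paths after `depth` steps when every state offers
    `branching_factor` choices (assumes a uniform tree with no repeated states)."""
    return branching_factor ** depth

# Hypothetical problem with 10 possible actions per state.
for depth in (2, 5, 10, 20):
    print(f"depth {depth:>2}: {positions_at_depth(10, depth):,} paths")
```

Each added step multiplies the work by the branching factor, so exhaustive reasoning that is feasible at depth 5 is out of reach at depth 20.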
1980s: Expert systems and the second wave
The 1980s brought a commercial resurgence built on expert systems. These were narrow systems encoding the rules of a specific domain, written by interviewing human experts. R1/XCON at Digital Equipment Corporation configured computer orders and was saving the company $40 million per year by 1986. MYCIN diagnosed bacterial infections with accuracy matching specialists. The market for AI software grew to over $1 billion annually.
Japan's Fifth Generation Computer Systems project (1982-1992) was a government-backed effort to build AI hardware and logic programming systems that would lead to human-level AI. The US and UK launched their own responses. Enthusiasm reached a peak.
Second AI winter, approximately 1987-2000
Expert systems hit a maintenance wall. Every new fact and exception had to be encoded by hand. Knowledge acquisition, the work of interviewing experts and translating their intuitions into explicit rules, became a bottleneck: it was slow and expensive, and it often produced brittle systems that failed on edge cases. The LISP machine market collapsed when cheaper general-purpose workstations made dedicated AI hardware uneconomical. The Fifth Generation project failed to meet its goals. Funding collapsed again.
This period is also when neural networks quietly began to recover. David Rumelhart, Geoffrey Hinton, and Ronald Williams popularized the backpropagation algorithm in a 1986 paper, showing how to train multi-layer networks. Yann LeCun applied it to handwritten digit recognition at Bell Labs, producing LeNet (1989), which could read ZIP codes from handwriting. The work was largely ignored outside small research communities.
Late 1990s and 2000s: Machine learning resurgence
The late 1990s saw AI rebranded as "machine learning" and "data mining," partly to escape the taint of the AI winters. Support vector machines (SVMs), decision trees, and probabilistic graphical models produced practical results without requiring hand-coded rules. Statistical approaches replaced symbolic ones as the dominant paradigm.
IBM's Deep Blue defeated world chess champion Garry Kasparov in 1997. This was celebrated as an AI milestone but was largely brute-force search with hand-tuned evaluation functions rather than learning. In 2011, IBM's Watson defeated two champions on Jeopardy!, a quiz show that requires natural language understanding; this was a genuine advance in language processing.
GPU computing began to change the economics of neural network training. Nvidia's CUDA platform (2007) made it practical to run thousands of parallel computations on gaming hardware, and large batches of independent multiply-add operations turned out to be exactly the workload that neural networks require.
2012: AlexNet and the deep learning moment
In the 2012 ImageNet Large Scale Visual Recognition Challenge, a convolutional neural network called AlexNet, built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto, achieved a top-5 error rate of 15.3%. The next-best entry scored 26.2%. The gap was shocking. It was not an incremental improvement; it was a discontinuity.
AlexNet used deep convolutional layers trained on two GPUs, ReLU activations in place of sigmoid functions (a change that mitigated the vanishing gradient problem), and dropout regularization. It was trained on 1.2 million labeled images from ImageNet.
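The vanishing gradient point can be made concrete with a small calculation. This is a simplified sketch, not AlexNet's training code: it ignores weights and multiplies only the activation derivatives along one path through the network, with a depth of 8 layers chosen for illustration.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z == 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def gradient_through_layers(grad_fn, pre_activation, n_layers):
    """Product of activation derivatives along one path (weights ignored),
    assuming the same pre-activation value at every layer."""
    return grad_fn(pre_activation) ** n_layers

# Best case for sigmoid (z == 0) versus an active ReLU unit (z > 0).
sigmoid_signal = gradient_through_layers(sigmoid_grad, 0.0, 8)
relu_signal = gradient_through_layers(relu_grad, 1.0, 8)

print(f"sigmoid: {sigmoid_signal:.2e}")
print(f"relu:    {relu_signal:.2e}")
```

Even in the sigmoid's best case the signal shrinks by a factor of four per layer, while an active ReLU unit passes the gradient through unchanged.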
This result triggered a reorientation of the entire field. Within two years, every major tech company had a deep learning team. Academic labs shifted funding. The neural network approach that had been dismissed after the perceptron critique of 1969 was now the dominant paradigm.
2014-2016: Generative models and AlphaGo
Ian Goodfellow's Generative Adversarial Network (GAN, 2014) introduced a new class of generative models: two networks, a generator and a discriminator, trained against each other, with the generator learning to produce realistic images. This was the conceptual foundation for later image generation tools.
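The adversarial setup is usually written as a two-player minimax objective. In the notation of the 2014 GAN paper, D is the discriminator, G is the generator, p_data is the distribution of real examples, and p_z is the noise prior fed to the generator (these symbols are not used elsewhere in this timeline):

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator raises V by scoring real samples near 1 and generated samples near 0; the generator lowers V by producing samples that the discriminator scores as real.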
DeepMind's AlphaGo defeated the world Go champion Lee Sedol in March 2016, 4 games to 1. Go had long been considered a benchmark task that would require human-level intuition to solve. AlphaGo combined deep neural networks with Monte Carlo tree search and reinforcement learning. The Go win attracted global attention and signaled that deep learning could tackle problems previously thought to require human judgment, not just pattern recognition.
2017: The Transformer
Vaswani et al.'s "Attention Is All You Need" (Google Brain, 2017) introduced the Transformer architecture. The key innovation was self-attention: every token in a sequence attends to every other token, allowing the model to capture long-range dependencies that recurrent networks (LSTMs) struggled with. The paper dropped recurrence entirely, which also made training parallelizable across the sequence length.
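A minimal sketch of scaled dot-product self-attention in plain Python follows. It is an illustration under simplifying assumptions: the toy token vectors are used directly as queries, keys, and values, whereas a real Transformer first applies learned projection matrices and uses multiple attention heads.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention: every position mixes all value vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # One score per key, so each token attends to every token.
        weights = softmax([dot(q, k) / scale for k in keys])
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights is computed independently of the others, which is why the computation parallelizes across the sequence instead of stepping through it one token at a time as a recurrent network does.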
The Transformer became the dominant architecture for NLP within a year and, by 2020-2021, for computer vision and audio as well. GPT, BERT, T5, and effectively every major language model since 2018 are Transformer variants.
2018-2021: The large language model buildup
OpenAI's GPT (2018) was a Transformer pre-trained on large text corpora with a simple objective: predict the next token. GPT-2 (2019) demonstrated that scaling this approach produced surprisingly capable text generation; OpenAI initially withheld the full model over misuse concerns. GPT-3 (2020), with 175 billion parameters, showed dramatic few-shot capability: the model could perform new tasks given just a handful of examples in the prompt, without any gradient updates. This was a qualitative shift.
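Few-shot prompting amounts to packing worked examples into the input text. The sketch below only builds such a prompt; the translation task, the "English:"/"French:" layout, and the function name are illustrative assumptions, and no model is called.

```python
def build_few_shot_prompt(examples, query, instruction="Translate English to French."):
    """Pack labeled examples and a new query into a single prompt string."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    # The final entry is left open for the model to complete.
    lines.append(f"English: {query}")
    lines.append("French:")
    return "\n".join(lines)

examples = [("cheese", "fromage"), ("good morning", "bonjour")]
prompt = build_few_shot_prompt(examples, "thank you")
print(prompt)
```

The model's weights stay fixed; the examples influence the output only because they are part of the text whose continuation the model predicts.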
BERT (Google, 2018) showed that a Transformer trained bidirectionally with a masked language modeling objective produced powerful representations for downstream NLP tasks, making it the standard for classification and question-answering systems.
The scaling hypothesis emerged clearly during this period: more parameters, more data, and more compute consistently improved capability, with no obvious ceiling in sight. This justified a large increase in compute expenditure.
November 2022: ChatGPT and the inflection point
OpenAI released ChatGPT on November 30, 2022. It was a version of GPT-3.5 fine-tuned with Reinforcement Learning from Human Feedback (RLHF) to make it helpful, harmless, and honest. The public-facing chat interface was trivially accessible: a browser, no API key, no technical knowledge required.
ChatGPT reached 1 million users in 5 days and 100 million users in 2 months, making it one of the fastest-adopted consumer applications in history. It demonstrated that a large language model could hold coherent multi-turn conversations, write working code, draft professional documents, explain scientific concepts, and handle a vast range of tasks without task-specific fine-tuning.
The release forced every major tech company to accelerate its own LLM programs. Google declared a "code red" internally. Microsoft began integrating GPT-4 into Bing and Office. The generative AI era had begun for mainstream users.
2023-2026: The generative and agentic era
GPT-4 (March 2023) added multimodal input (images and text), substantially improved reasoning, and achieved near-expert performance on professional benchmarks (bar exam, medical licensing exam). Anthropic's Claude series, Google's Gemini, and Meta's open-weight Llama models all followed with competitive capabilities.
Key developments of this period:
- Retrieval-Augmented Generation (RAG, 2020, widely deployed 2023): Connecting LLMs to external knowledge sources to reduce hallucination and enable up-to-date answers.
- AI coding assistants (2022-2024): GitHub Copilot, Cursor, and similar tools demonstrated that LLMs could write production-quality code, fundamentally changing software development workflows.
- Real-time voice (2024): GPT-4o's native audio mode brought voice response latency down to roughly 300 ms with natural prosody, enabling conversational AI agents.
- Agentic systems (2024-2026): LLMs executing multi-step tasks autonomously, calling tools, browsing the web, writing and running code, and coordinating with other agents.
- Context windows (2023-2026): Context lengths expanded from 4K tokens (GPT-3.5) to 200K (Claude 3) to 1M+ tokens (Gemini 1.5 Pro), enabling new applications over long documents and codebases.
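The Retrieval-Augmented Generation item in the list above can be illustrated with a toy retriever. This is a sketch under loud assumptions: production systems typically use learned embeddings and a vector index, whereas this version ranks documents by bag-of-words cosine similarity, and the prompt wording and function names are hypothetical. No model is called; the code only assembles the augmented prompt.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    shared = set(a) & set(b)
    numerator = sum(a[t] * b[t] for t in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return numerator / norm if norm else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(question, documents, k=1):
    """Prepend the retrieved passages so the model can ground its answer in them."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Small knowledge base built from facts stated earlier in this timeline.
documents = [
    "AlexNet achieved a top-5 error rate of 15.3% on ImageNet in 2012.",
    "AlphaGo defeated Lee Sedol 4 games to 1 in March 2016.",
    "ChatGPT was released on November 30, 2022.",
]
prompt = build_rag_prompt("When was ChatGPT released?", documents)
print(prompt)
```

Because the answer is supplied in the prompt rather than recalled from model weights, the knowledge base can be updated without retraining the model.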