Deep Dive: The History of AI — From a 1950s Thought Experiment to the Thing on Your Phone
Here's something most people don't know: artificial intelligence is older than the internet, older than the personal computer, and older than most of the people writing about it today. And here you thought AI was just a flash in the pan.
The first serious conversations about thinking machines happened in the early 1950s. The researchers who started those conversations genuinely believed they were a few years away from cracking it. They were off by about 70 years — and even now, the debate about whether we've actually "cracked it" is very much alive.
The history of AI is not a straight line from idea to invention. It's a story full of extraordinary optimism, spectacular failure, long winters of abandonment, and a series of unexpected breakthroughs that nobody saw coming. Understanding that history makes the current moment make a lot more sense.
Let's start at the beginning.
The 1950s — The Idea Is Born
The story starts with a British mathematician named Alan Turing.
In 1950, Turing published a paper called "Computing Machinery and Intelligence" that opened with a question that's still being argued about today: Can machines think?
Turing didn't try to answer that question directly — he thought it was too vague to be useful. Instead he proposed what he called the "imitation game," which we now call the Turing Test: if a machine can carry on a conversation with a human and the human can't tell whether they're talking to a machine or a person, then for all practical purposes the machine is thinking.
It was a provocative idea. It was also, in 1950, completely theoretical. The computers of that era were room-sized calculating machines that could barely do arithmetic, let alone hold a conversation.
But Turing planted a seed.
Six years later, in the summer of 1956, a group of researchers gathered at Dartmouth College in New Hampshire for a workshop that would define the next several decades of computer science. The organizer was a young mathematician named John McCarthy, and it was here that he coined the term "Artificial Intelligence."
The proposal for the workshop included a line that, in retrospect, reads as either visionary or wildly overconfident depending on your mood: the researchers suggested that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it" — and that significant progress could be made in a single summer.
They did not crack AI in a single summer. But they did launch a field.
The 1960s — Early Optimism
The years after Dartmouth were genuinely exciting. Early AI programs were doing things that felt remarkable at the time.
A program called ELIZA, built at MIT in 1966, could hold simple conversations by pattern-matching what you said and reflecting it back as a question. If you typed "I am feeling sad," ELIZA might respond "Why do you say you are feeling sad?" It was a trick — ELIZA understood nothing — but people found it eerily convincing. Some users reportedly formed emotional attachments to it.
Sound familiar? ELIZA is, in many ways, the great-great-grandmother of ChatGPT.
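For the curious, here is a minimal Python sketch of the kind of pattern-match-and-reflect trick ELIZA relied on. The patterns and wording below are invented for illustration; the original program had a much larger rule set and more elaborate transformations.

```python
import re

# A toy version of ELIZA's trick: match a pattern in the input and
# reflect it back as a question. No understanding involved.
RULES = [
    (r"i am (.*)", "Why do you say you are {0}?"),
    (r"i feel (.*)", "How long have you felt {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
]

def eliza_reply(text: str) -> str:
    text = text.lower().strip(".!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # fallback when no rule matches

print(eliza_reply("I am feeling sad"))     # Why do you say you are feeling sad?
print(eliza_reply("The weather is nice"))  # Please go on.
```

A dozen lines of string manipulation, and people poured their hearts out to it. That gap between how simple the mechanism is and how intelligent it feels has haunted the field ever since.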
Other programs of this era could solve algebra problems, prove mathematical theorems, and play checkers competitively. To researchers of the time, these felt like the first steps toward general machine intelligence.
The funding followed the optimism. The US government, particularly the military research agency DARPA, poured money into AI research. Researchers made bold predictions. Herbert Simon, one of the founders of the field, predicted in 1965 that "machines will be capable, within twenty years, of doing any work a man can do."
He was not correct.
The 1970s — The First AI Winter
By the mid-1970s, reality had caught up with the optimism.
The problem was that early AI systems were essentially very sophisticated rule-following machines. They could do impressive things in extremely narrow, controlled domains — proving theorems, playing games with fixed rules — but they fell apart the moment the real world got involved.
Real language is ambiguous. Real problems don't come with fixed rule sets. Real intelligence requires something these systems didn't have: common sense, context, the ability to handle situations they'd never encountered before.
A landmark 1973 report in the UK, known as the Lighthill Report, concluded bluntly that AI research had failed to deliver on its promises and that the "combinatorial explosion" of possibilities in real-world problems meant many of the field's goals were fundamentally unreachable with the approaches of the day.
Funding dried up. Research programs were cancelled. The period that followed became known as the first AI Winter — a long cold stretch where interest, money, and optimism all retreated.
The 1980s — A Brief Thaw, Then Another Winter
AI didn't stay dead. It never does.
In the 1980s, a new approach called expert systems briefly revived the field. The idea was straightforward: instead of trying to build general intelligence, encode the specific knowledge of human experts — doctors, engineers, financial analysts — into a system that could apply that knowledge to solve problems.
These systems worked, within limits. Medical diagnosis programs could suggest diagnoses based on symptoms. Financial systems could flag suspicious transactions. Companies invested heavily, and for a while AI felt like a viable commercial technology.
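To give a feel for the shape of these systems, here is a minimal Python sketch of the if-then style they were built on. The rules below are toy examples made up for illustration, not drawn from any real diagnostic system.

```python
# A toy expert system: hand-written if-then rules that encode an
# expert's knowledge. Each rule says "if all of these symptoms are
# present, suggest this conclusion."
RULES = [
    ({"fever", "cough", "fatigue"}, "possible flu"),
    ({"sneezing", "runny nose"}, "possible common cold"),
    ({"chest pain", "shortness of breath"}, "urgent: see a doctor"),
]

def diagnose(symptoms: set[str]) -> list[str]:
    """Return every conclusion whose required symptoms are all present."""
    return [conclusion for required, conclusion in RULES if required <= symptoms]

print(diagnose({"fever", "cough", "fatigue", "headache"}))  # ['possible flu']
print(diagnose({"dizziness"}))                              # [] -- no rule fires, the system is silent
```

Notice what happens with the second patient: no rule matches, so the system has nothing to say. That silence is the brittleness in miniature.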
But expert systems had a fundamental flaw: they were only as good as the rules their human creators could explicitly articulate. Knowledge that experts hold intuitively — the kind of thing a doctor knows from years of experience that they couldn't fully explain if you asked them — couldn't be captured in rules. The systems were brittle. They worked in controlled conditions and failed in messy real ones.
By the late 1980s, the commercial bubble had burst. Another AI Winter settled in. Japan's ambitious "Fifth Generation Computer" project, which had promised to build thinking machines, was quietly wound down in the early 1990s. The field retreated again.
The 1990s — Quiet Progress
The second AI winter lasted into the early 1990s, but something important was happening underneath the surface.
Researchers had been quietly developing a different approach — one that didn't try to hand-code intelligence, but instead tried to have machines learn it from data. This approach, called machine learning, had been around in various forms since the 1950s, but it was beginning to mature.
The key insight was a technique called neural networks — computational systems very loosely inspired by how neurons in the brain connect and fire. By adjusting the connections between artificial neurons based on examples, these systems could learn to recognize patterns without being explicitly programmed with rules.
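Here is a minimal sketch of what "adjusting the connections based on examples" means in practice: a single artificial neuron, in the spirit of the early perceptron, learning the logical AND function by nudging its weights whenever it gets an example wrong. Real networks have vastly more neurons and a more sophisticated update rule, but the core loop is the same.

```python
# A single artificial neuron learning AND from examples.
# Each mistake nudges the connection weights toward the right answer.
def train_neuron(examples, steps=50, lr=0.1):
    w1, w2, bias = 0.0, 0.0, 0.0
    for _ in range(steps):
        for (x1, x2), target in examples:
            output = 1 if (w1 * x1 + w2 * x2 + bias) > 0 else 0
            error = target - output
            # strengthen or weaken each connection based on the error
            w1 += lr * error * x1
            w2 += lr * error * x2
            bias += lr * error
    return w1, w2, bias

AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, bias = train_neuron(AND_EXAMPLES)
for (x1, x2), target in AND_EXAMPLES:
    prediction = 1 if (w1 * x1 + w2 * x2 + bias) > 0 else 0
    print((x1, x2), "predicted", prediction, "target", target)
```

Nobody wrote a rule saying what AND means. The neuron found weights that work purely from examples, and that is the entire philosophical shift from the expert-system era.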
In 1997, something happened that captured the public's imagination: IBM's Deep Blue computer defeated world chess champion Garry Kasparov. It wasn't general intelligence — Deep Blue could play chess and nothing else — but it was a milestone that made the world pay attention.
And in 1998, a researcher named Yann LeCun and his colleagues published a neural network called LeNet that could reliably read handwritten digits, the kind printed on bank checks and mailing envelopes. It was a small thing. It was also a glimpse of everything that was coming.
The 2000s — The Data Explosion
The internet changed everything, though the AI implications weren't obvious at first.
What the internet created, at enormous scale, was data. Billions of web pages. Millions of images. Countless hours of video. Decades worth of text in every language. And crucially, this data was digital — machine readable, available for analysis.
For machine learning systems that needed examples to learn from, this was oxygen.
Search engines like Google were quietly building some of the most sophisticated pattern-recognition systems in history — learning to understand what you were looking for from the words you typed, ranking billions of pages to find the most relevant one. This was AI, working at scale, even if it wasn't called that most of the time.
Meanwhile, in 2006, a researcher named Geoffrey Hinton published work that revived interest in neural networks with a technique called deep learning — using neural networks with many layers, each one learning increasingly abstract patterns from the data. The math had been around for decades. What had changed was the availability of data to train on and the computing power to process it.
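As a rough picture of what "many layers" looks like, here is a minimal sketch of data flowing through a stacked network. The layer sizes are illustrative and the weights are random placeholders; a real model learns its weights from data.

```python
import numpy as np

# The "deep" in deep learning: data flows through a stack of layers,
# each a weighted combination followed by a simple nonlinearity.
rng = np.random.default_rng(0)
layer_sizes = [784, 256, 64, 10]   # e.g. pixels -> edges -> shapes -> digit classes
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    for w in weights:
        x = np.maximum(0, x @ w)   # each layer re-represents its input more abstractly
    return x

image = rng.random(784)            # a fake 28x28 image, flattened
print(forward(image).shape)        # (10,): one score per possible digit
```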
The kindling was laid. It needed a spark.
2012 — The Breakthrough That Changed Everything
The spark came in 2012, at an annual competition called ImageNet where researchers competed to build systems that could correctly identify what was in a photograph.
For years the best systems hovered around a 25% error rate — getting roughly one in four images wrong. Then a team from the University of Toronto, led by Geoffrey Hinton and his students Alex Krizhevsky and Ilya Sutskever, entered a deep learning system called AlexNet.
It achieved a 15% error rate. The next best system that year was at 26%.
The gap was so large it didn't look like an incremental improvement. It looked like a different category of thing entirely.
The entire AI research community pivoted almost overnight. Deep learning — neural networks with many layers, trained on large datasets using powerful GPUs — became the dominant paradigm. Every major tech company launched AI research divisions. Universities restructured their computer science programs. The field has not looked back since.
2017 — The Transformer
Five years after AlexNet, a team at Google published a paper with a title that has aged remarkably well: "Attention Is All You Need."
The paper introduced a new neural network architecture called the Transformer. Without getting into the mathematics, the key innovation was a mechanism called "attention" that allowed the system to consider the relationship between every word in a sentence when processing any individual word — understanding context in a way previous systems couldn't.
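For readers who want to see it, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the paper. Real Transformers add learned projection matrices, multiple attention heads, and far larger vectors, but this is the core computation.

```python
import numpy as np

# Scaled dot-product attention: every word's vector is compared against
# every other word's vector, and each output is a weighted blend of all
# of them. The matrices here are random stand-ins for learned word vectors.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])                   # relevance of each word to each other word
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                                        # context-aware mix per word

rng = np.random.default_rng(0)
words, dim = 4, 8                       # pretend sentence of 4 words, 8 numbers per word
Q = K = V = rng.normal(size=(words, dim))   # self-attention: all three come from the same sentence
print(attention(Q, K, V).shape)             # (4, 8): one context-aware vector per word
```

The crucial point is that every word attends to every other word in a single step, rather than reading the sentence one token at a time. That is what made the architecture so easy to scale.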
Transformers turned out to be extraordinarily good at language. And because they could be scaled up — more data, more computing power, more parameters — they kept getting better in ways that didn't seem to plateau.
This is the architecture that powers every major AI language system today. GPT, Claude, Gemini — they're all Transformers at their core.
2022 — The Moment Everything Changed Publicly
On November 30, 2022, OpenAI released ChatGPT to the public.
It wasn't the most technically advanced AI system in existence at the time. But it was the first one that anyone could use, without technical knowledge, through a simple chat interface. Within five days it had a million users. Within two months it had a hundred million, which at the time made it the fastest consumer application ever to reach that milestone.
The AI that researchers had been building toward for 70 years had arrived in people's browsers and phones. The questions that had been academic — about what these systems could do, whether they could think, what they meant for jobs and education and society — suddenly became urgent and personal.
That's the moment we're still living in.
Where We Are Now
In the roughly three and a half years since ChatGPT launched, the pace of development has been startling even to people inside the field.
Systems that seemed impossibly advanced in 2022 are now considered baseline. Models can write code, analyze medical images, pass professional licensing exams, conduct research, generate video, and hold conversations that many people find hard to distinguish from talking to a person.
The optimists of the 1950s weren't wrong about what was possible. They were just wrong about the timeline — and wrong about the approach. It didn't take explicit rules and hand-coded knowledge. It took scale: enormous amounts of data, enormous amounts of computing power, and the right mathematical architecture to make sense of it all.
The two AI winters weren't failures. They were necessary. The field learned what didn't work and why, and that knowledge shaped every breakthrough that followed.
The Short Version
AI started as a thought experiment in 1950, became a research field in 1956, failed twice, came back twice, quietly accumulated power through the 2000s on the back of the internet and cheap computing, had a fundamental breakthrough in 2012, got its key architectural innovation in 2017, and went mainstream in 2022.
Seventy years from Turing's paper to ChatGPT.
Next week's deep dive goes one level deeper: how do large language models — the specific type of AI behind ChatGPT, Claude, and Gemini — actually work? What's happening when you type a question and it answers? The answer is weirder and more interesting than you'd expect.
This is part of the HumanReadable-AI Deep Dive series — longer pieces that actually explain the technology behind the headlines. Subscribe below to get the next one in your inbox.