Deep Dive: What is a Neural Network? The Idea That Changed Everything.

Nearly every modern AI system you use runs on neural networks. Here's what they actually are, how they learn, and why a moment in 2012 changed everything, explained in plain English.

In our last deep dive, we explained that ChatGPT works by predicting words — that it generates responses one word at a time based on patterns it learned during training. But we left a question hanging: how does a system learn patterns from billions of words of text? What is actually doing the learning?

The answer is a neural network. And once you understand what a neural network is, a lot of things that seem mysterious about AI suddenly make sense.

Start with the thing you already understand: your brain

Your brain contains roughly 86 billion neurons — cells that transmit electrical signals to each other. When you see something, hear something, or think something, what's actually happening is that specific patterns of neurons are firing in sequence, passing signals along connections to other neurons, which fire or don't fire depending on how strong the incoming signal is.

Here's what's remarkable about this system: it learns by changing the strength of those connections.

When you were a child and you first learned what a cat was, your brain didn't download a definition. It was exposed to many cats — fluffy ones, orange ones, small ones, grumpy ones — and over time, the connections between the neurons that process visual information and the concept of "cat" got stronger. The more you saw cats, the more reinforced those connections became.

That's it. That's the fundamental mechanism. Learning is connection strength changing over time in response to experience.

Neural networks in computers borrow this exact idea. Not the biology — the logic.

What an artificial neural network actually is

An artificial neural network is a mathematical system made of layers of simple units — called nodes or artificial neurons — connected to each other. Information flows through the layers, from input to output, getting transformed at each step.

Here's the simplest possible version to make it concrete.

Imagine you want to build a system that can look at a photo and tell you whether it contains a cat or not. You'd build a neural network with:

  • An input layer: each node represents a pixel in the image. A 100x100 grayscale image has 10,000 pixels, so 10,000 input nodes (a color image needs three values per pixel, so three times as many).
  • Some hidden layers in the middle: nodes that process and combine the signals coming from the input layer.
  • An output layer: in this case, just two nodes. One for "cat," one for "not cat."

Information flows forward through the layers. Each node in a hidden layer receives signals from all the nodes in the previous layer, multiplies each signal by a number called a weight — which represents how important that connection is — adds them all up, and then decides whether to "fire" and pass a signal on to the next layer.

At the end, whichever output node has the strongest signal wins. Cat or not cat.
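
To make the forward flow concrete, here is a minimal sketch in Python. It assumes a tiny made-up network (4 input values, 3 hidden nodes, 2 output nodes) with hand-picked weights. It also assumes the sigmoid function as the "fire or don't fire" decision and adds a small bias number to each node. None of those specifics come from the description above; they are common choices used here purely for illustration.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1): a soft version of 'fire or don't fire'."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: each node takes a weighted sum of all inputs, adds its bias, applies sigmoid."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs

# Hypothetical toy network: 4 input "pixels" -> 3 hidden nodes -> 2 output nodes.
# These weights are invented for illustration; a real network would learn them.
hidden_weights = [
    [0.5, -0.2, 0.1, 0.4],
    [-0.3, 0.8, -0.5, 0.2],
    [0.1, 0.1, 0.9, -0.7],
]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [
    [1.2, -0.6, 0.3],   # output node for "cat"
    [-1.0, 0.7, 0.2],   # output node for "not cat"
]
output_biases = [0.0, 0.0]

def predict(pixels):
    hidden = layer_forward(pixels, hidden_weights, hidden_biases)
    scores = layer_forward(hidden, output_weights, output_biases)
    labels = ["cat", "not cat"]
    # Whichever output node has the strongest signal wins.
    return labels[scores.index(max(scores))], scores

label, scores = predict([0.9, 0.1, 0.4, 0.7])
print(label, scores)
```

Change any weight and the scores change with it, which is why the weights are where the network's "knowledge" lives.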

The weights are the key. They're the numbers that encode what the network has learned. A freshly initialized network has random weights — it would perform no better than a coin flip. A trained network has weights that have been carefully tuned so that the right patterns produce the right outputs.

How does a neural network actually learn?

This is where it gets interesting.

Training a neural network works like this:

You show it thousands of examples — in our cat example, thousands of photos labeled "cat" or "not cat." For each photo, the network makes a prediction. Then you check how wrong it was. Then you adjust the weights slightly to make it less wrong next time.

The mechanism behind those adjustments is called backpropagation: a mathematical technique that works out, for each weight in the network, which direction to nudge it, and by how much, to reduce the error. A little more this way, a little less that way. Then you show it another example, check the error again, adjust again.

Do this millions of times across thousands of examples and something remarkable happens: the network starts getting it right.
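
The predict, check, adjust loop can be sketched in a few lines of Python. This is a deliberately simplified version under stated assumptions: a single artificial neuron instead of a full network, six made-up examples with two invented features each, a sigmoid output, and an assumed learning rate of 0.5. With only one neuron, the size of each nudge is simply the error times the input; backpropagation is the technique that extends this same calculation backwards through many layers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy data: two made-up features per example; label 1 = "cat", 0 = "not cat".
examples = [
    ([0.9, 0.8], 1),
    ([0.8, 0.6], 1),
    ([0.7, 0.9], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
    ([0.3, 0.2], 0),
]

weights = [0.0, 0.0]   # real networks start from small random values; zeros keep this repeatable
bias = 0.0
learning_rate = 0.5    # how big each nudge is (an assumed value)

def predict(features):
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(total)

for epoch in range(1000):                       # repeat the whole process many times
    for features, label in examples:
        prediction = predict(features)          # 1. make a prediction
        error = prediction - label              # 2. check how wrong it was
        for i, x in enumerate(features):        # 3. nudge each weight to shrink the error
            weights[i] -= learning_rate * error * x
        bias -= learning_rate * error

accuracy = sum((predict(f) > 0.5) == bool(y) for f, y in examples) / len(examples)
print(weights, bias, accuracy)
```

The weights start at zero and end up at values nobody chose by hand. They are whatever the repeated nudging settled on.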

What's happening internally — and this is the part that surprises people — is that the network is learning to detect features. Early layers learn to detect simple things: edges, colors, basic shapes. Later layers combine those simple detections into more complex ones: eyes, ears, fur textures. The final layers combine those complex features into the overall judgment: cat or not cat.

Nobody programmed those feature detectors. Nobody told the network to look for edges or ears. The network figured out on its own that these features are useful for making the distinction it's being trained to make.

This is what makes neural networks different from traditional programming. Traditional programming is explicit: you write rules ("if it has whiskers and pointy ears, it's probably a cat"). Neural networks are implicit: you show them examples and they figure out the rules themselves.
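
The contrast can be shown side by side. In this sketch, the first function is a hand-written rule; the second starts with no rule at all and learns one from four labeled examples. The learning method used is the classic perceptron update, chosen here as a simple stand-in for neural network training in general, and the two yes/no features (whiskers, pointy ears) are an illustrative assumption.

```python
# Traditional programming: a person writes the rule explicitly.
def is_cat_by_rule(has_whiskers, has_pointy_ears):
    return bool(has_whiskers and has_pointy_ears)

# Neural-network style: no rule is written down, only labeled examples.
# Each example is ((has_whiskers, has_pointy_ears), is_cat).
examples = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]

weights = [0, 0]
bias = 0

def is_cat_learned(features):
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0

# Perceptron update: whenever a prediction is wrong, shift the weights toward the right answer.
for _ in range(20):
    for features, label in examples:
        error = label - is_cat_learned(features)
        for i, x in enumerate(features):
            weights[i] += error * x
        bias += error

for features, _ in examples:
    print(features, is_cat_by_rule(*features), bool(is_cat_learned(features)))
print("learned weights:", weights, "bias:", bias)
```

Both versions give the same answers on these four cases, but only one of them was told the rule. The learned version encodes it as three numbers.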

Why deep learning is called "deep"

You may have heard the term "deep learning" — it's essentially synonymous with modern AI. The "deep" refers simply to the number of layers.

Early neural networks had a handful of layers. Modern ones can have dozens or even hundreds. The depth is what allows them to learn complex, hierarchical representations of information.

A shallow network might learn to detect simple features. A deep network can learn features of features of features — building up from pixels to edges to shapes to objects to concepts, in a long chain of increasingly abstract representations.
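
In code, "deeper" just means the same layer operation repeated more times, with each layer's output becoming the next layer's input. The sketch below assumes made-up weights and uses ReLU (keep positive signals, zero out negative ones) as the firing rule; both are illustrative choices rather than anything specified above.

```python
def relu(x):
    """Pass positive signals through unchanged; block negative ones."""
    return max(0.0, x)

def dense_layer(inputs, weights):
    """Each row of weights is one node: weighted sum of the inputs, then ReLU."""
    return [relu(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def run_network(inputs, layers):
    """Feed the signal through every layer in order."""
    signal = inputs
    for weights in layers:
        signal = dense_layer(signal, weights)
    return signal

def count_weights(layers):
    return sum(len(row) for weights in layers for row in weights)

# Hypothetical weights, invented for illustration.
shallow = [
    [[0.5, 0.5], [1.0, -1.0]],      # one layer with two nodes
]
deep = shallow + [
    [[1.0, 1.0], [0.5, -0.5]],      # a second layer built on the first layer's outputs
    [[1.0, 0.0]],                   # a third layer that produces a single output
]

pixels = [1.0, 2.0]
print(run_network(pixels, shallow), count_weights(shallow))  # [1.5, 0.0] 4
print(run_network(pixels, deep), count_weights(deep))        # [1.5] 10
```

Nothing about the loop changes as the list of layers grows. Going from three layers to a hundred is a matter of scale, not of a different mechanism.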

It was the combination of deeper networks, much more data, and dramatically faster computers — specifically GPUs, the same chips used in video games — that unlocked the AI capabilities we have today. The ideas behind neural networks are actually quite old. What changed in the 2010s was that all three of those ingredients — depth, data, and compute — reached a threshold where suddenly things that had seemed impossible became possible.

The moment everything changed

In 2012, a neural network called AlexNet entered the ImageNet challenge, an annual computer vision competition in which systems try to identify which of 1,000 categories an image belongs to. Entries were ranked by top-5 error rate: how often the correct label was missing from a system's five best guesses. Lower is better. The second-place system that year had an error rate of 26.2%.

AlexNet achieved 15.3%.

That's not a modest improvement. That's a gap of nearly 11 percentage points, which means about 40% fewer errors, in a single year, using a fundamentally different approach. The AI research community noticed immediately. Within a few years, deep learning had taken over most major AI benchmarks, and the techniques behind AlexNet became the foundation of essentially everything that followed.
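
For readers who want to check the arithmetic, the comparison works out as follows, using only the two error rates quoted above (the variable names are just labels for this sketch).

```python
runner_up_error = 26.2   # second-place error rate, in percent
alexnet_error = 15.3     # AlexNet's error rate, in percent

# Absolute gap, in percentage points.
absolute_gap = runner_up_error - alexnet_error

# Relative reduction: what share of the runner-up's errors AlexNet eliminated.
relative_reduction = absolute_gap / runner_up_error * 100

print(f"Gap: {absolute_gap:.1f} percentage points")                # Gap: 10.9 percentage points
print(f"Relative reduction in errors: {relative_reduction:.1f}%")  # Relative reduction in errors: 41.6%
```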

The protein folding breakthrough we mentioned in our drug discovery briefing last week — AlphaFold — was a neural network. The large language models behind ChatGPT and Claude — neural networks. The image recognition in your phone's camera — a neural network. The spam filter in your email — probably a neural network.

The 2012 ImageNet moment is one of the genuine inflection points in the history of technology. Before it, AI was a promising field with limited real-world applications. After it, the trajectory changed.

What neural networks can and can't do

Understanding the mechanism helps explain both the strengths and the limitations.

What they're good at: any task where the right answer can be learned from examples. Image recognition, language generation, translation, game-playing, pattern detection in data. If you can show a neural network enough examples of inputs paired with correct outputs, it can usually learn to generalize.

What they struggle with: tasks requiring explicit logical reasoning, strict rule-following, or handling situations completely outside their training distribution. A neural network that has learned to recognize cats might confidently misclassify a fox wearing a cat mask — because it has learned statistical patterns, not the underlying concept of "cat-ness."

This is the same reason language models hallucinate. They've learned the statistical patterns of how language works, which produces impressively coherent text. But they haven't learned to reason about truth the way humans do, which means those patterns sometimes produce confident nonsense.

What nobody fully understands: what's actually happening inside the hidden layers. When a neural network makes a decision, the reasoning is distributed across millions or billions of weights. There's no single node that says "I detected a cat ear." The representation is emergent and distributed in ways that are genuinely difficult to interpret, even for the researchers who built the systems.

This is the "black box" problem you may have heard about — and it's real. The fact that we don't fully understand how these systems reach their conclusions is one of the genuine open problems in AI, both scientifically and in terms of safety and accountability.

The short version

A neural network is a mathematical system loosely inspired by the brain, made of layers of connected nodes that pass signals to each other. It learns by adjusting the strength of connections between nodes — a process guided by showing it many examples and gradually reducing its errors.

Deep neural networks — with many layers — can learn extraordinarily complex patterns from data, which is why they power everything from image recognition to language generation to drug discovery.

The ideas aren't new. What's new is that we now have enough data, enough computing power, and deep enough networks to make them work in ways that genuinely change what's possible.

Understanding neural networks doesn't require a computer science degree. It requires one insight: that learning, whether in a brain or a computer, is fundamentally about connection strength changing in response to experience. Everything else follows from that.

Next in the deep dive series: what is machine learning — and how does it relate to AI, neural networks, and all the other terms you keep hearing?