Deep Dive: How Does ChatGPT Actually Work? What's Happening When You Type a Question?
Let's start with what most people assume is happening when they type a question into ChatGPT.
The natural assumption is something like: you ask a question, ChatGPT searches through a massive database of information, finds the answer, and sends it back to you. Kind of like Google, but smarter. Maybe it has a giant encyclopedia in there somewhere and it's just really good at finding the right page.
That assumption is completely wrong.
What's actually happening is considerably stranger — and once you understand it, everything about AI becomes clearer. Why it's so impressive, why it sometimes confidently makes things up, why it can write poetry and debug code and explain quantum physics and still not know what day it is.
It all comes down to one thing: prediction.
What ChatGPT is actually doing
Here's the core of it, in plain English:
ChatGPT is predicting what word should come next. That's it. That's the whole trick.
You type a question. The model looks at every word you wrote, works out which word most plausibly comes next, and produces that word. Then it looks at everything again, including the new word, and predicts the next one. It repeats this one word at a time until it produces a special end-of-text marker that signals it's finished, or until it hits a length limit.
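That loop is simple enough to sketch in a few lines of Python. This is a toy under loud assumptions: `predict_next` is a hypothetical stand-in for the model (here just a hard-coded lookup), and `<end>` is an assumed stop marker. A real model has no lookup table; it computes its prediction from billions of learned numbers.

```python
# Toy sketch of autoregressive generation. `predict_next` is a hypothetical
# stand-in for a real model; it returns one next word for the text so far.
END = "<end>"  # assumed end-of-text marker; real models use a special token

# Hard-coded "knowledge": a real model conditions on every word in the
# context, while this toy only peeks at the last one.
TOY_NEXT_WORD = {"is": "Paris", "Paris": END}

def predict_next(words: list[str]) -> str:
    """Hypothetical model call: look at the words so far, return the next one."""
    if not words:
        return END
    return TOY_NEXT_WORD.get(words[-1], END)

def generate(prompt: str, max_words: int = 20) -> str:
    """Predict one word, append it, repeat until the stop marker or a length limit."""
    words = prompt.split()
    for _ in range(max_words):
        next_word = predict_next(words)  # the model is handed every word so far
        if next_word == END:
            break                        # the model "decides to stop"
        words.append(next_word)          # the new word becomes part of the context
    return " ".join(words)

print(generate("The capital of France is"))  # prints: The capital of France is Paris
```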
Every response you've ever gotten from ChatGPT — every explanation, every poem, every line of code, every answer to a hard question — was assembled one predicted word at a time.
That probably sounds too simple to explain anything impressive. We need to go one level deeper.
What does "predicting the next word" actually mean?
Here's where it gets interesting.
When we say the model is "predicting the next word," it's not guessing randomly. It's calculating, for every possible word in its vocabulary, the probability that word is the right one to come next given everything that came before it.
So if you've written "The capital of France is," the model assigns a very high probability to "Paris," a very low probability to "banana," and essentially zero probability to "running." It picks from those probabilities — usually, but not always, the most likely option.
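To make "a probability for every word" concrete, here is a sketch with an invented four-word vocabulary and invented raw scores (often called logits). The softmax function turns scores into probabilities that sum to 1, and a temperature setting is one common knob for how often something other than the top option gets picked. All numbers here are made up for illustration.

```python
import math
import random

# Invented raw scores a model might assign after "The capital of France is".
logits = {"Paris": 9.0, "Lyon": 4.0, "banana": -2.0, "running": -6.0}

def softmax(scores: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    scaled = {w: s / temperature for w, s in scores.items()}
    top = max(scaled.values())  # subtract the max to avoid overflow in exp
    exps = {w: math.exp(s - top) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sample(probs: dict[str, float], rng: random.Random) -> str:
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

probs = softmax(logits)
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:>8}: {p:.6f}")

rng = random.Random(0)
print(sample(probs, rng))         # a weighted random pick
print(max(probs, key=probs.get))  # the greedy pick: the single most likely word
```

Lower temperatures push the choice toward the single most likely word; higher temperatures spread probability onto less likely ones, which is one reason the same question can get different answers.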
Now here's the crucial question: where do those probabilities come from?
They come from training. The model was trained on an enormous amount of text, on the order of hundreds of billions to trillions of words, drawn from books, websites, articles, code repositories, forums, and more. During training, it processed all of that text and learned, statistically, which words tend to follow which other words in which contexts.
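The crudest possible version of "learning which words follow which" is a bigram model: count how often each word follows each other word in some text, then turn the counts into probabilities. The sketch below does that on a tiny invented corpus. Real language models don't work this way internally; it only illustrates the basic training signal of text plus what came next.

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus".
corpus = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
)

def train_bigram(text: str) -> dict[str, dict[str, float]]:
    """Count which word follows which, then normalize the counts to probabilities."""
    words = text.split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {w: c / total for w, c in followers.items()}
    return model

model = train_bigram(corpus)
print(model["the"])  # probabilities for the word after "the"
print(model["sat"])  # in this corpus, "sat" is always followed by "on"
```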
But it didn't just learn simple word pairings. It learned incredibly complex, deep relationships between ideas, concepts, facts, styles, and structures. It learned that when someone is writing a recipe, certain words follow certain other words. It learned that when someone is writing Python code, certain syntax patterns appear. It learned that when someone asks a historical question, answers tend to have certain structures.
The model compressed all of that learning into billions of numbers — called parameters — that encode, in a form no human could fully read or understand, an enormous amount of statistical knowledge about how language works.
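For a sense of scale, a widely used rule of thumb for a standard transformer (the neural-network design behind GPT-style models) puts the parameter count at roughly 12 × layers × width², plus the word-embedding table. OpenAI hasn't published these figures for ChatGPT's current models, so the sketch below plugs in the published GPT-3 configuration instead. It's an approximation that ignores small terms, not an exact count.

```python
def approx_transformer_params(n_layers: int, d_model: int, vocab_size: int = 0) -> int:
    """Rule-of-thumb parameter count for a standard GPT-style transformer.

    Each layer has ~4*d^2 attention weights and ~8*d^2 feed-forward weights
    (assuming the common 4x feed-forward width), so ~12*d^2 per layer.
    Biases and layer norms are ignored; the optional token embedding adds
    vocab_size * d_model.
    """
    per_layer = 12 * d_model ** 2
    return n_layers * per_layer + vocab_size * d_model

# Published GPT-3 configuration: 96 layers, width 12,288, 50,257-token vocabulary.
gpt3 = approx_transformer_params(n_layers=96, d_model=12288, vocab_size=50257)
print(f"{gpt3:,} parameters (~{gpt3 / 1e9:.0f} billion)")
```

The estimate lands within about one percent of GPT-3's reported 175 billion parameters.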
When you ask it a question, it uses those numbers to calculate probabilities and generate a response, one word at a time.
But wait — how does predicting words produce actual knowledge?
This is the part that surprises most people.
The intuition is: if the model is just predicting words, how does it actually know things? How can it explain photosynthesis, or write working code, or answer a question about history?
The answer is that predicting words at sufficient scale, with sufficient data and sufficient model size, turns out to require learning something real about the world.
Think about it this way. To reliably predict what word comes next in a sentence about photosynthesis, you have to have some representation of what photosynthesis actually is. To predict the next line of valid Python code, you have to have some model of how Python syntax works. To predict the next sentence in a historical explanation, you have to have some understanding of historical facts and how they relate.
You can't reliably predict language about a subject without encoding something about the subject itself.
This was genuinely surprising to researchers. Nobody explicitly programmed language models to build internal representations of the world; those representations emerged as a side effect of learning to predict text really, really well at scale.
The token thing everyone mentions
You may have heard the word "token" and wondered what it means. Here's the short version.
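When you type text into ChatGPT, it doesn't process it letter by letter or strictly word by word. It breaks the text into chunks called tokens. Common words are often a single token, while longer or rarer words get split into fragments, so a word like "interesting" might be one token or several, depending on the tokenizer. The word "AI" might be one token. Punctuation and spaces count too. A rough rule of thumb for English is that a token is about three-quarters of a word.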
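Tokenizers for GPT-style models are built with a technique called byte-pair encoding (BPE): start from individual characters and repeatedly merge the most frequent adjacent pair into a new, longer token. The sketch below learns a handful of merges from a tiny invented word list. Production tokenizers work on raw bytes, train on vastly more text, and learn tens of thousands of merges, so treat this as the idea rather than a real tokenizer.

```python
from collections import Counter

def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with one fused symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])  # fuse the pair
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn merges by repeatedly fusing the most frequent adjacent pair."""
    vocab = Counter(tuple(word) for word in words)  # each word as single characters
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Tiny invented training list; real tokenizers learn from huge corpora.
training_words = ["interesting"] * 5 + ["interest"] * 3 + ["testing"] * 2
merges = learn_bpe(training_words, num_merges=8)
print(merges[:3])
print(tokenize("interesting", merges))
print(tokenize("resting", merges))  # an unseen word, built from learned pieces
```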
Tokens matter because models have a limit on how many they can process at once, called their "context window." Think of it as working memory. Older models could only hold a few thousand tokens at once. Many modern ones handle hundreds of thousands, and some advertise a million or more, which is why they can now read and reason about entire books or large codebases in a single session.
Every token in your conversation (your questions, the model's previous answers, any documents you've uploaded) takes up space in that context window. When the window fills up, the app has to drop or condense older content, which is why very long conversations sometimes feel like the model has forgotten what you said earlier.
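One simple way an app can keep a conversation inside a fixed window is to drop the oldest messages until the rest fits. The sketch below assumes a crude hypothetical `count_tokens` that treats each word as one token (real tokenizers count differently) and a made-up 20-token limit. Real products may summarize older turns instead, so this is one possible strategy, not how any particular product works.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits in max_tokens."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break                       # this message and everything older fall out
        kept.append(message)
        used += cost
    return list(reversed(kept))         # restore chronological order

conversation = [
    "user: my name is Ada and I am planning a trip to Lisbon",
    "assistant: great, how many days will you be there?",
    "user: five days in June",
    "assistant: here is a five day itinerary for June",
    "user: what was my name again?",
]
window = fit_to_window(conversation, max_tokens=20)
for message in window:
    print(message)
```

With a 20-token budget, the message containing the user's name no longer fits, which is exactly the "it forgot what I told it" effect.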
Why it sometimes confidently makes things up
Now we can explain one of the most confusing things about these models: hallucination.
Sometimes ChatGPT — or Claude, or Gemini, or any other language model — states something confidently that is completely wrong. It invents a book that doesn't exist, cites a study nobody ever conducted, gets a historical date wrong, or describes a person who never existed.
Why does a model that's supposedly so intelligent do this?
Because it's not searching for truth. It's generating plausible-sounding text.
When it predicts the next word, it's asking "what word would be most likely here, given everything I've learned about language?" It's not asking "is this factually accurate?" Those are different questions, and the model is only directly optimized for the first one.
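The "most likely word" objective has a precise form. During pretraining, the model is penalized by the negative log of the probability it gave to the word that actually came next in the training text (cross-entropy loss); chat models get further fine-tuning afterward, which this sketch doesn't cover. The predictions below are invented. Notice what's missing: the loss only measures agreement with the text, and nothing in it checks whether the text is true.

```python
import math

def next_word_loss(predicted: dict[str, float], actual_next: str) -> float:
    """Cross-entropy for one step: -log(probability assigned to the real next word)."""
    return -math.log(predicted[actual_next])

# Invented model predictions after "The capital of France is".
confident_and_right = {"Paris": 0.90, "Lyon": 0.08, "Nice": 0.02}
confident_and_wrong = {"Paris": 0.02, "Lyon": 0.90, "Nice": 0.08}

# The training text said "Paris", so only the first prediction scores well.
print(next_word_loss(confident_and_right, "Paris"))
print(next_word_loss(confident_and_wrong, "Paris"))
```

If the training text itself had said "Lyon", the second prediction would be rewarded instead: the loss tracks the data, not the world.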
If the model has learned that a certain style of question tends to be followed by a certain style of confident-sounding answer, it will generate that confident-sounding answer — even if the specific content of that answer happens to be wrong.
This is a genuine limitation, not a bug that will be patched in the next update. It's structural. The way these models work means they're always going to have some tendency to generate plausible-sounding falsehoods, especially when asked about things they have limited training data on.
That's why you should always verify important claims from AI against a reliable source, especially for medical, legal, financial, or factual information where being wrong has real consequences.
Why it doesn't know what day it is
Another thing that confuses people: ChatGPT can explain Einstein's theory of relativity but doesn't know today's date unless you, or the app it's running in, tell it.
This is because of how training works.
The model was trained on text up to a certain date — its "knowledge cutoff." After that point, it simply has no information. It's not connected to the internet in its base form. It doesn't have a clock. It has no way to know what's happened since training ended.
Everything it knows was baked in during training. The world kept moving after that. The model didn't.
This is also why the newest AI models sometimes feel slightly behind on current events, even when they're technically the most capable. Capability and currency are different things. A model can be extraordinarily capable at reasoning while still having a knowledge cutoff from six months ago.
Some versions of these tools — like ChatGPT with web browsing enabled — can search the internet to supplement their training. But that's a separate capability bolted on top, not part of the core language model itself.
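Apps that know today's date or can browse typically do it by adding text to the prompt before the model sees it; the model itself is unchanged. The sketch below illustrates that pattern, assuming a hypothetical `search_web` function (here it returns canned text) and a made-up prompt layout. It isn't how ChatGPT's browsing is actually implemented.

```python
from datetime import date

def search_web(query: str) -> list[str]:
    """Hypothetical search tool. A real one would call a search API; this returns canned text."""
    return [f"(snippet about '{query}' retrieved at request time)"]

def build_prompt(question: str, today: date, use_search: bool) -> str:
    """Assemble the text the model will actually see."""
    parts = [f"Current date: {today.isoformat()}"]   # the model has no clock, so we tell it
    if use_search:
        for snippet in search_web(question):
            parts.append(f"Search result: {snippet}")  # fresh facts injected as context
    parts.append(f"User: {question}")
    return "\n".join(parts)

prompt = build_prompt("Who won yesterday's match?", date(2024, 5, 1), use_search=True)
print(prompt)
```

Everything here happens outside the model. It still just predicts the next word, now with the date and the snippets sitting in its context window.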
Why this matters beyond the classroom
Understanding how these models work isn't just an interesting intellectual exercise. It has real consequences.
We covered a story recently about Google's Threat Intelligence Group catching hackers who used AI to build a zero-day cyberattack — a software exploit so new that no patch existed yet. The same prediction machinery that writes your emails was being used to find vulnerabilities in software systems. It's the same underlying capability, pointed in a different direction.
We also covered why Anthropic has been sitting on its most powerful model, Mythos, rather than releasing it broadly. The concern is that a sufficiently capable prediction engine — one that has learned enough about how software is written — can predict where the vulnerabilities are just as easily as it can predict the next word in a sentence. The capability is the mechanism. The use case depends on who's holding it.
Understanding that these systems are, at their core, extraordinarily sophisticated prediction engines helps explain both why they're so impressive and why the questions around them are so serious.
What this means for how you use it
Understanding this changes how you interact with these tools day to day.
Give it context. Because the model is predicting based on everything in its context window, the more relevant information you provide, the better its predictions will be. Don't just ask "write me a cover letter." Tell it your background, the job description, the tone you want, examples of your writing style. More context = better output.
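The difference between a bare request and a context-rich one is easy to see when you write both out as the text the model receives. The helper, field labels, and example details below are made up for illustration; there's no required format. Everything you include simply becomes part of what the model predicts from.

```python
def cover_letter_prompt(background: str = "", job_description: str = "",
                        tone: str = "", writing_sample: str = "") -> str:
    """Assemble the request, adding each piece of context only if it was given."""
    lines = ["Write me a cover letter."]
    if background:
        lines.append(f"My background: {background}")
    if job_description:
        lines.append(f"The job: {job_description}")
    if tone:
        lines.append(f"Tone: {tone}")
    if writing_sample:
        lines.append(f"A sample of how I write: {writing_sample}")
    return "\n".join(lines)

bare = cover_letter_prompt()
rich = cover_letter_prompt(
    background="5 years as a data analyst in healthcare",
    job_description="Senior analyst role at a hospital network, SQL and Python required",
    tone="warm but concise",
    writing_sample="I like to lead with the problem I solved, then the result.",
)
print(bare)
print("---")
print(rich)
```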
Don't trust it blindly on facts. It's excellent at reasoning, explaining, writing, and synthesizing. It's less reliable as a factual reference, especially for specific numbers, dates, citations, and recent events. Treat it like a brilliant friend who occasionally misremembers things confidently.
Understand what "better" means. When a new model is released and described as "more capable," that usually means it's better at prediction — it's been trained on more data, with a better architecture, using better techniques. The underlying mechanism is the same. Better prediction, at larger scale, produces more impressive results. When Anthropic released Claude Opus 4.7 last month, the improvements in coding and vision came from exactly this: better prediction on more kinds of input, at higher resolution.
The short version
ChatGPT and models like it are extraordinarily sophisticated next-word predictors. They were trained on vast amounts of text and learned, as a side effect of that training, to encode complex knowledge about the world. When you ask a question, they generate a response one predicted word at a time, using billions of learned parameters to calculate what should come next.
The results can feel like talking to a knowledgeable person. In some ways, they are. In other ways — knowledge cutoffs, hallucination, no real-world awareness — they're something fundamentally different.
Understanding the difference is how you get the most out of these tools. And understanding the mechanism is how you start to understand the stakes.
This is part of the HumanReadable-AI Deep Dive series: longer pieces that explain the technology behind the headlines.