
How Chatbots and Large Language Models Work

Hi, I'm Mira Murati, the Chief Technology Officer at OpenAI, the company that created ChatGPT. I really wanted to work on AI because it has the potential to improve almost every aspect of life and help us tackle really hard challenges.

Hi, I'm Cristóbal Valenzuela, CEO and co-founder of Runway. Runway is a research company that builds AI algorithms for storytelling and video creation. Chatbots like ChatGPT are based on a new type of AI technology called large language models. Unlike a typical neural network, which trains on specific tasks like recognizing faces or images, a large language model is trained on as much information as possible, such as everything available on the internet. It's trained to then generate completely new information, like writing essays or poems, having conversations, or even writing code.

The possibilities seem endless but how does this work and what are its shortcomings? Let's dive in.

While a chatbot built on a large language model may seem magical, it works based on some really simple ideas. In fact, most of the magic of AI is based on very simple math concepts from statistics applied billions of times using fast computers. The AI uses probabilities to predict the text you want it to produce based on all the previous texts that it has been trained on.

Suppose that we want to train a large language model to read every play written by William Shakespeare so that it could write new plays in the same style. We'd start with all the texts from Shakespeare's plays stored letter by letter in a sequence. Next, we'd analyze each letter to see what letter is most likely to come next. After an "i," the next most likely letters that show up in Shakespeare plays are "s" or "n." After an "s," the next letters are "t," "c," or "h," and so on. This creates a table of probabilities.
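The table-building step described above can be sketched in a few lines of Python. The sample sentence and function names here are purely illustrative, not from any real system:

```python
from collections import defaultdict

def build_table(text):
    """Count which letter follows each letter, then turn counts into probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    table = {}
    for letter, nexts in counts.items():
        total = sum(nexts.values())
        table[letter] = {c: n / total for c, n in nexts.items()}
    return table

table = build_table("in sooth i know not why i am so sad")
print(table["s"])  # "s" is followed by "o" twice and "a" once in this sample
```

With real Shakespeare text the counts get much larger, but the idea is the same: for each letter, record how often every other letter comes next.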

With just this table, we can try to generate new writing. We pick a random letter to start, then look up which letters are most likely to come next. We don't always pick the most popular choice, because that would lead to repetitive cycles; instead, we pick randomly, weighted by the probabilities. Once we have the next letter, we repeat the process to find the one after that, and so on.
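That sampling loop might look like this sketch. The tiny hand-made table here is just for illustration, standing in for one built from real text:

```python
import random

# A tiny hand-made probability table: each letter maps to its likely next letters.
table = {
    "t": {"h": 0.6, "o": 0.4},
    "h": {"e": 0.7, "a": 0.3},
    "e": {" ": 1.0},
    "o": {" ": 1.0},
    "a": {"t": 1.0},
    " ": {"t": 1.0},
}

def generate(table, start, length):
    """Repeatedly pick a next letter at random, weighted by its probability."""
    out = start
    for _ in range(length):
        options = table[out[-1]]
        out += random.choices(list(options), weights=list(options.values()))[0]
    return out

print(generate(table, "t", 12))
```

Because each pick is random, running this twice gives different strings, which is also why chatbots can give different answers to the same question.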

Okay, well that doesn't look at all like Shakespeare. It’s not even English but it’s a first step. This simple system might not seem even remotely intelligent, but as we build up from here, you'd be surprised where it can go. The problem in the last example is that at any point the AI only considers a single letter to pick what comes next. That's not enough context and so the output is not helpful.

What if we could train it to consider a sequence of letters like sentences or paragraphs to give it more context to pick the next one? To do this we don't use a simple table of probabilities; we use a neural network. A neural network is a computer system that is loosely inspired by the neurons in the brain. It is trained on a body of information and with enough training, it can learn to take in new information and give simple answers.

The answers always include probabilities because there can be many options. Now let's take a neural network and train it on all the letter sequences in Shakespeare's plays to learn what letter is likely to come next at any point. Once we do this, the neural network can take any new sequence and predict what could be a good next letter. Sometimes the answer is obvious, but usually it's not.

It turns out this new approach works better, much better. By looking at a long enough sequence of letters, the AI can learn complicated patterns and it uses those to produce all-new texts. It starts the same way with a starting letter and then uses probabilities to pick the next letter, and so on. But this time the probabilities are based on the entire context of what came before.
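A real system learns this mapping with a neural network, but the effect of longer context can be imitated with a simple lookup keyed on the last few letters. Everything below is an illustrative stand-in, not how ChatGPT is actually built:

```python
from collections import defaultdict
import random

def build_ngram_table(text, context=4):
    """Map each run of `context` letters to the letters seen following it."""
    table = defaultdict(list)
    for i in range(len(text) - context):
        table[text[i:i + context]].append(text[i + context])
    return table

def generate(table, seed, length, context=4):
    """Extend the seed by sampling letters that followed the same context."""
    out = seed
    for _ in range(length):
        options = table.get(out[-context:])
        if not options:
            break  # this context never appeared in the training text
        out += random.choice(options)
    return out

text = "to be or not to be that is the question " * 3
table = build_ngram_table(text)
print(generate(table, "to b", 30))
```

Even with four letters of context the output starts to look like English words; a neural network goes much further by generalizing to contexts it has never seen exactly.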

As you see, this works surprisingly well. Now, a system like ChatGPT uses a similar approach but with three very important additions.

First, instead of just training on Shakespeare, it looks at all the information it can find on the internet, including all the articles on Wikipedia or all the code on GitHub. Second, instead of learning and predicting letters from just the 26 choices in the alphabet, it looks at tokens, which are full words, word parts, or even code.
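To get a feel for tokens, here is a toy greedy tokenizer. Real systems use learned vocabularies of tens of thousands of tokens, so the tiny vocabulary below is purely illustrative:

```python
def tokenize(text, vocab):
    """Greedily split text into the longest matching tokens from a vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):  # try the longest piece first
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            tokens.append(text[i])  # unknown single character
            i += 1
    return tokens

# A toy vocabulary mixing whole words and word parts.
vocab = {"chat", "bot", "s", " ", "talk", "ing", "write", "read"}
print(tokenize("chatbots talking", vocab))
# → ['chat', 'bot', 's', ' ', 'talk', 'ing']
```

Predicting whole tokens instead of single letters means the model takes bigger, more meaningful steps with each prediction.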

The third difference is that a system of this complexity needs a lot of human tuning to make sure it produces reasonable results in a wide variety of situations while also protecting against problems like producing highly biased or even dangerous content. Even after we do this tuning, it’s important to note that this system is still just using random probabilities to choose words.

A large language model can produce unbelievable results that seem like magic, but because it’s not actually magic, it can often get things wrong. When it gets things wrong, people ask: Does a large language model have actual intelligence?

Questions about AI often spark philosophical debates about the meaning of intelligence. Some argue that a neural network producing words using probabilities doesn’t have real intelligence. But what isn’t under debate is that large language models produce amazing results with applications in many fields. This technology is already being used to create apps and websites, help produce movies and video games, and even discover new drugs. The rapid acceleration of AI will have enormous impacts on society and it's important for everybody to understand this technology.

What I’m looking forward to is the amazing things people will create with AI and I hope you dive in to learn more about how AI works and explore what you can build with it.

Keywords

  • AI
  • ChatGPT
  • Large Language Models
  • Neural Networks
  • Shakespeare
  • Probability
  • Training Data
  • Human Tuning
  • Applications of AI
  • Technology Impact

FAQ

Q: What are large language models?
A: Large language models are a new type of AI technology that are trained on vast amounts of information to generate new text, such as writing essays, poems, or code.

Q: How does a chatbot based on a large language model work?
A: It predicts the next letter or token in a sequence using probabilities derived from all the text it has been trained on.

Q: What are the three important differences in ChatGPT's approach?
A: ChatGPT is trained with extensive internet data, uses tokens instead of single letters, and requires extensive human tuning to produce reasonable and safe results.

Q: Does a large language model have actual intelligence?
A: This often sparks philosophical debates. Some argue it doesn’t have true intelligence, but it does produce highly useful results.

Q: What are some applications of large language models?
A: They are used in creating apps, websites, movies, video games, and even in discovering new drugs.