From the course: Introduction to Large Language Models
What are tokens?
- [Instructor] Large language models generate text word by word, right? Not quite. They generate tokens. So what are tokens? Basically, words are split into subwords, and one token corresponds to around four characters of text. Let's head over to the OpenAI website to get a good visual example of what tokens are. So this is the Tokenizer on the OpenAI website. Let me just go ahead and scroll down a bit. Now I'm going to enter some text into the Tokenizer: tokenization is the process of splitting words into smaller chunks, or tokens. Each of the different colors corresponds to one token. In general, you can see that most words correspond to a single token, which includes the space in front of the word. There are a couple of exceptions. For example, the word tokenization is made up of two tokens, token and ization. The sentence I've typed has 12 words. Now, this corresponds to 14 tokens or 77 characters…
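The splitting described above can be sketched in code. This is a toy greedy longest-match subword tokenizer over a hypothetical vocabulary, not OpenAI's actual tokenizer (which uses learned byte-pair-encoding merge rules), but it shows how a word like "tokenization" can break into the subwords "token" and "ization":

```python
def tokenize(word, vocab):
    """Greedily match the longest vocabulary piece at each position.

    Illustrative only: real LLM tokenizers (e.g. OpenAI's BPE) learn
    their vocabulary and merge rules from data.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking by one
        # character until a piece is found in the vocabulary.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Fall back to a single character if nothing matches.
            tokens.append(word[i])
            i += 1
    return tokens


# Hypothetical vocabulary for demonstration.
vocab = {"token", "ization", "split", "ting"}
print(tokenize("tokenization", vocab))  # ['token', 'ization']
print(tokenize("splitting", vocab))     # ['split', 'ting']
```

Because common words tend to be whole vocabulary entries while rarer words fall apart into subwords, the overall average works out to roughly four characters per token for typical English text.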