From the course: Introduction to Large Language Models
Comparing LLMs
- [Instructor] How do we even compare large language models? That's a great question, and I don't think we have a perfect answer yet, but we've made great progress over the last few months. Usually we focus only on how good a model is at a single task, so we don't know whether that same model, for example, generates false information. So instead of looking at just one metric, a Stanford University research team proposed HELM, the Holistic Evaluation of Language Models. For HELM, the Stanford team worked with the major large language model providers to benchmark their models across a variety of datasets and metrics, giving a more holistic view of model performance. HELM is a living benchmark and is updated as new models are released. I'll just cover the first couple of benchmarks, and you can explore the rest if you're interested. So let me go ahead and scroll down a little bit. So here, each row…
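To make the "holistic" idea concrete, here is a minimal sketch in Python of comparing models across more than one metric at a time, the way a HELM-style leaderboard does. The model names, scores, and the scenario/metric labels (mmlu/accuracy, truthfulqa/accuracy) are made-up placeholders for illustration; real HELM results span many more scenarios and metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency) and are produced by HELM's own tooling.

```python
# Illustrative sketch: a holistic, multi-metric model comparison.
# All model names and scores below are hypothetical.

results = {
    "model-a": {"mmlu/accuracy": 0.62, "truthfulqa/accuracy": 0.48},
    "model-b": {"mmlu/accuracy": 0.58, "truthfulqa/accuracy": 0.55},
}

# Collect the union of all metrics so every model is scored on each column.
metrics = sorted({m for scores in results.values() for m in scores})

# Print a simple comparison table: one row per model, one column per metric.
print("\t".join(["model"] + metrics))
for model, scores in results.items():
    row = [model] + [f"{scores.get(m, float('nan')):.2f}" for m in metrics]
    print("\t".join(row))
```

Note how a single-metric view would rank model-a first (higher mmlu/accuracy), while the second column shows it scoring worse on the truthfulness-style metric; looking at the whole row is the point of a holistic evaluation.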
Contents
- BERT (3m 16s)
- Scaling laws (3m 30s)
- GPT-3 (7m 41s)
- Chinchilla (7m 54s)
- PaLM and PaLM 2 (3m 59s)
- ChatGPT and GPT-4 (5m 47s)
- Open LLMs (5m 40s)
- Comparing LLMs (3m 35s)
- GitHub Models: Comparing LLMs (2m 52s)
- Accessing large language models using an API (6m 25s)
- LLM trends (4m 6s)