

Optimizing LLMs to be good at particular tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, consistent standard for evaluating open large language model (LLM) performance across a variety of tasks. Alibaba's Qwen models dominate the leaderboard's inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning:
- Qwen 72B is the king and Chinese open models are dominating overall
- Previous evaluations have become too easy for recent ... June 26, 2024

Hugging Face's second leaderboard tests language models across four tasks: knowledge testing, reasoning over very long contexts, complex math ability, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes 1st, 3rd, and 10th place with a handful of its variants. Also showing up are Llama3-70B, Meta's LLM, and [users.atw.hu](http://users.atw.hu/samp-info-forum/index.php?PHPSESSID=08c9144340b5268ba9925563d0384962&action=profile