LLM emergent behavior written off as 'a mirage' by study

  • 📰 TheRegister

Large language models' surprise emergent behavior written off as 'a mirage'

Stanford's Schaeffer, Miranda, and Koyejo propose that when researchers put models through their paces and see unpredictable responses, it's really due to poorly chosen methods of measurement rather than a glimmer of actual intelligence.

One test within BIG-Bench highlighted by the university trio is Exact String Match. As the name suggests, this checks a model's output to see if it exactly matches a specific string without giving any weight to nearly right answers.
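To make the all-or-nothing nature of exact string matching concrete, here is a minimal Python sketch. It is an illustration only, not BIG-Bench's actual code: the function names, the whitespace tokenization, and the partial-credit metric `token_accuracy` are all assumptions introduced for the example.

```python
def exact_string_match(output: str, target: str) -> float:
    """Score 1.0 only when the output equals the target exactly, else 0.0."""
    return 1.0 if output == target else 0.0


def token_accuracy(output: str, target: str) -> float:
    """Hypothetical partial-credit metric: share of positions whose tokens agree.

    Tokens are split on whitespace purely for illustration.
    """
    out_tokens, tgt_tokens = output.split(), target.split()
    if not tgt_tokens:
        return 1.0 if not out_tokens else 0.0
    hits = sum(o == t for o, t in zip(out_tokens, tgt_tokens))
    # Divide by the longer sequence so extra or missing tokens are penalized.
    return hits / max(len(out_tokens), len(tgt_tokens))


target = "the answer is 4 2 7"
near_miss = "the answer is 4 2 8"  # one token wrong out of six

print(exact_string_match(near_miss, target))  # no credit for a near miss
print(token_accuracy(near_miss, target))      # most of the credit
```

Under the exact-match score the near miss is indistinguishable from a completely wrong answer, while the partial-credit score registers that five of six tokens were correct.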

It's a nuanced situation. Yes, larger models can summarize text and translate languages. Yes, larger models will generally perform better and can do more than smaller ones, but their sudden breakthrough in abilities – an unexpected emergence of capabilities – is an illusion: the smaller models are potentially capable of the same sort of thing but the benchmarks are not in their favor.

"Our alternative explanation," as the scientists put it,"posits that emergent abilities are a mirage caused primarily by the researcher choosing a metric that nonlinearly or discontinuously deforms per-token error rates, and partially by possessing too few test data to accurately estimate the performance of smaller models and partially by evaluating too few large-scale models.

"But I think there’s also a direct connection to the user. If emergent abilities are real, then smaller models are utterly incapable of doing specific tasks, meaning the user has no choice but to use the biggest possible model, whereas if emergent abilities aren’t real, then smaller models are totally fine so long as the user is willing to tolerate some errors now and again. If the latter is true, then the end user has significantly more options.

