Large language models still can’t handle real uncertainty

2026/03/29 03:01
Reading time: 5 min

Last month I ran an experiment. Four frontier LLMs (GPT-5.2, Claude Opus 4.5, Grok 4, and Gemini 3 Pro) received strategic decision-making tasks under genuine uncertainty. Not the kind of uncertainty where you don’t know the answer yet but can look it up when given the opportunity; the kind where there literally is no answer until your opponent decides what to do.

Results were fascinating and somewhat embarrassing for models worth billions in compute.

The study

The tasks were variations of classic imperfect-information games: scenarios in which each participant holds private information invisible to the others and must make sequential decisions while modeling what their opponents may do. Think competitive bidding, negotiation, or any situation where you’re committing irreversibly without full knowledge of the facts.

All models performed well in the structured opening phase of each task. That’s because decisions made in that part of the process mapped to defined ranges and used mathematical guidelines. The models selected reasonable actions and built sensible strategies. Adjustments based on observable signals were consistent across each step. This part was almost human-like.

However, the moment a scenario became dynamic, requiring multi-step planning over uncertain future states, everything fell apart. The models couldn’t maintain a coherent plan across sequential decisions. They treated each step independently rather than as part of an integrated strategy.

Where the reasoning breaks down

LLMs process information sequentially. They don’t explicitly maintain and update a probability distribution over hidden variables as new observations arrive. They don’t “think”: “If I commit here, how will my opponent respond, and how does that affect my options three steps from now?”
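To make "updating a probability distribution over hidden variables" concrete, here is a minimal sketch of a single Bayesian update. It is an illustration only, not the setup used in the experiment: the hidden variable (an opponent's hand strength), the prior, and the bet probabilities are all made-up assumptions.

```python
from typing import Dict


def update_belief(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: posterior(h) is proportional to prior(h) * P(observation | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical hidden variable: the opponent's private hand strength.
prior = {"weak": 0.5, "medium": 0.3, "strong": 0.2}

# Assumed probability that each hand type makes a large bet (invented numbers).
p_large_bet = {"weak": 0.1, "medium": 0.3, "strong": 0.8}

# We observe a large bet, so belief should shift toward "strong".
posterior = update_belief(prior, p_large_bet)
print(posterior)
```

A system that tracks hidden state would repeat this update after every observation and carry the posterior forward as the next prior, which is the bookkeeping the models appeared not to do.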

Nate Silver recently described this perfectly in an essay. These models reason like someone who has read extensively about strategy but never actually had to execute under pressure. They understand the concepts in isolation. However, they cannot integrate those concepts into a multi-step plan where each decision constrains future decisions.

Google DeepMind’s Kaggle Game Arena confirmed this at scale in early 2026. Ten leading LLMs competed across multiple imperfect-information benchmarks. Although the winner outperformed every other model in the field, its play would not have held up against a moderately experienced human strategist.

Specialized systems tell a different story

Where things get interesting is with purpose-built AI systems. While general-purpose LLMs struggle with imperfect-information tasks, purpose-built systems have been superhuman at such tasks since 2017.

Carnegie Mellon’s Libratus achieved this using Counterfactual Regret Minimization (CFR), a technique specifically designed for environments containing hidden information.

These systems don’t “understand” strategy the way a language model attempts to. They don’t analyze case studies in natural language or discuss tactics. Instead, they play billions of scenarios against themselves and minimize regret: they literally calculate how much better they could have done had they chosen each alternative action, then adjust accordingly.
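The regret idea can be sketched with regret matching, the update rule that Counterfactual Regret Minimization applies at every decision point of a sequential game. The toy below is not Libratus and not full CFR: it assumes rock-paper-scissors as the game, uses exact expected payoffs instead of sampled games, and starts one player with an arbitrary bias toward rock so that self-play has something to correct.

```python
from typing import List

ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[a][b]: payoff to a player choosing a against an opponent choosing b.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def strategy_from_regrets(regrets: List[float]) -> List[float]:
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def action_values(opponent: List[float]) -> List[float]:
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]


def self_play(iterations: int) -> List[List[float]]:
    """Run simultaneous regret-matching updates and return average strategies."""
    # Assumed starting point: player 0 leans toward rock, player 1 is neutral.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                # Regret: how much better action a would have done than the mix played.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy: List[float]) -> float:
    """Payoff a best-responding opponent earns against this strategy."""
    return max(action_values(strategy))


average = self_play(10_000)
for player, strategy in enumerate(average):
    print(player, [round(p, 3) for p in strategy], round(exploitability(strategy), 4))
```

The current strategies keep cycling, but the average strategy drifts toward the unexploitable mix of one third each. CFR extends the same bookkeeping to games with hidden information by weighting each regret by the probability of reaching that decision point.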

The gap between an LLM handling uncertainty and a specialized system is roughly the gap between a philosophy professor explaining how to ride a bicycle and an Olympic cyclist riding one. Both understand the concept. Only one can execute.

The SpinGPT exception

One interesting outlier exists. Researchers published SpinGPT in late 2025, the first LLM fine-tuned specifically for imperfect-information decision-making. Instead of taking a general-purpose model and hoping it figures out strategy, the researchers trained a language model on solver outputs and actual game data.

SpinGPT matched expert-level recommendations 78 percent of the time and achieved a positive performance rate against established benchmarks. Not superhuman, but solidly competent, and better than most casual practitioners.

That suggests the architecture isn’t the problem. LLMs can learn to handle uncertainty when trained with the right data and objective. A general-purpose chatbot that learns strategy from internet discussions will perform like someone who learned strategy from internet discussions.

What this means for AI builders in 2026

I believe imperfect information benchmarks represent the best test we currently have for evaluating AI reasoning. They force a system to:

Reason under genuine uncertainty where you cannot know the correct answer
Plan across multiple sequential decisions with irreversible consequences
Model an adversary whose goal is to deceive you
Balance information gathering against exploitation
Make decisions where the optimal strategy depends on hidden variables

The fact that frontier LLMs still struggle with these tasks, while specialized systems reached superhuman play in the two-player version back in 2017, tells us that general reasoning and domain-specific expertise are fundamentally different things.

My bet is that hybrid systems will arrive first: something similar to SpinGPT’s approach, where an LLM-type architecture handles high-level strategic reasoning while a dedicated module tracks belief states and calculates expected outcomes in real time. Not pure language model. Not pure solver. Something in between.
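As a rough picture of that division of labor, the sketch below scores candidate actions by expected payoff under a tracked belief state. Everything in it is hypothetical: the candidate list stands in for whatever a language model might propose, and the belief and payoff numbers are invented. It is also only a one-step calculation that ignores how an adversary would respond, so it shows the "expected outcomes" half of the idea, not a solver.

```python
from typing import Dict, List, Tuple


def expected_value(belief: Dict[str, float], payoffs: Dict[str, float]) -> float:
    """Average an action's payoff over the hidden states, weighted by belief."""
    return sum(belief[state] * payoffs[state] for state in belief)


def choose_action(
    belief: Dict[str, float],
    payoff_table: Dict[str, Dict[str, float]],
    candidates: List[str],
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate action and return the best one with all scores."""
    scores = {a: expected_value(belief, payoff_table[a]) for a in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Belief over the opponent's hidden state, as a belief tracker might report it.
belief = {"weak": 0.2, "strong": 0.8}

# Assumed payoff of each action in each hidden state (invented numbers).
payoff_table = {
    "commit": {"weak": 10.0, "strong": -8.0},
    "wait": {"weak": 2.0, "strong": -1.0},
}

# Stand-in for the actions a language model proposes at this step.
candidates = ["commit", "wait"]

best, scores = choose_action(belief, payoff_table, candidates)
print(best, scores)
```

With these numbers, committing looks attractive only if the opponent is weak, and the belief says that is unlikely, so the module prefers to wait.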

For now, if you’re building AI agents that need to handle genuine uncertainty (not just missing data, but adversarial hidden information), don’t begin with an LLM. Begin with the game theory literature. CFR and its derivatives are your foundation. Layer language understanding on top if needed.

Models will improve. However, the gap between “can discuss strategy eloquently” and “can execute strategy under pressure” remains tremendously large. Closing it will require more than scaling transformers.
