Beyond LLMs: in search of the next AI beast

May 20, 2026

Originally published on Zoopa.

Large language models excel at text-based tasks but struggle with reasoning. They compose essays instantly yet miscalculate tips. They draft perfect emails but lose track of logical sequences. The limitation appears to be architectural, not something that adding more parameters will fix.

In recent months, the industry has started looking for a successor technology. Several prominent researchers have launched ventures exploring different approaches, and they share one conclusion: autoregressive LLMs may be an architectural dead end.

Three frontier bets

  • David Silver, AlphaGo's creator, departed Google DeepMind to establish Ineffable Intelligence, valued at $5.1 billion.
  • Subquadratic unveiled SubQ, processing 12 million context tokens at substantially reduced cost.
  • Yann LeCun, Meta's longtime AI chief, left to raise $1.03 billion for AMI Labs.

Three distinct strategies emerge: Silver questions how models learn, SubQ addresses how efficiently they process context, and LeCun challenges what they predict. None assumes that language is the fundamental substrate of thought.

The diagnosis

Three structural problems recur across the critiques:

  • Human-generated data constitutes a finite resource.
  • Transformers squander computational capacity.
  • Autoregressive architectures compound errors exponentially.

The approaches

Silver: reinforcement learning in simulation. Silver argues that human data functions like a fossil fuel: it is finite, and it is bounded by human intelligence itself. He proposes reinforcement learning in simulated environments, where agents discover solutions through trial and error, mirroring AlphaGo's breakthrough.
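To make "trial and error in a simulated environment" concrete, here is a minimal sketch using tabular Q-learning on a toy corridor. It is an illustration only, not Silver's or Ineffable Intelligence's method: the environment, the function name train_corridor, and the hyperparameters are all assumptions chosen for brevity. The agent receives no human examples, only a reward when it reaches the rightmost cell.

```python
import random


def train_corridor(n_states: int = 6, episodes: int = 500, seed: int = 0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in cell 0 and is rewarded 1.0 only when it steps into
    the rightmost cell. Actions: 0 = left, 1 = right.
    Returns the learned Q-table as a list of [q_left, q_right] per cell.
    """
    rng = random.Random(seed)
    alpha, gamma, epsilon = 0.5, 0.9, 0.2  # assumed hyperparameters
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap so an episode always terminates
            # Explore at random with probability epsilon, or when the
            # two action values are tied; otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            nxt = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if nxt == goal else 0.0
            # The goal is terminal, so no future value is added there.
            future = 0.0 if nxt == goal else gamma * max(q[nxt])
            q[state][action] += alpha * (reward + future - q[state][action])

            state = nxt
            if state == goal:
                break
    return q


q_table = train_corridor()
for cell, (left, right) in enumerate(q_table[:-1]):
    print(f"cell {cell}: left={left:.3f} right={right:.3f}")
```

After training, the value of moving right exceeds the value of moving left in every cell, a policy the agent found purely from reward feedback.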

SubQ: sparse attention. Standard transformer attention has O(n²) complexity because it compares every token against every other token, and most of those comparisons contribute little. Subquadratic Sparse Attention instead identifies the relevant relationships dynamically, achieving a 52x speedup at one million tokens.
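The scaling gap is easy to see with a back-of-the-envelope count. The sketch below compares the number of token pairs scored by dense attention with a generic sparse scheme in which each token attends to at most k others. The budget k = 4,096 is an arbitrary assumption for illustration, not a SubQ figure, and the code does not model how SubQ actually selects relationships.

```python
def dense_pairs(n: int) -> int:
    """Token pairs scored by dense self-attention: every token vs. every token."""
    return n * n


def sparse_pairs(n: int, k: int) -> int:
    """Pairs scored when each token attends to at most k tokens (assumed budget)."""
    return n * min(k, n)


K = 4_096  # hypothetical per-token budget, chosen only for illustration

for n in (1_000, 100_000, 1_000_000, 12_000_000):
    dense = dense_pairs(n)
    sparse = sparse_pairs(n, K)
    print(f"n={n:>10,}  dense={dense:.2e}  sparse={sparse:.2e}  "
          f"ratio={dense / sparse:,.1f}x")
```

Doubling the context quadruples the dense count but only doubles the sparse one, which is why the gap widens as contexts grow. Pair counts are not the same as wall-clock speed, so the ratios printed here should not be read as benchmark results.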

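LeCun: world models with JEPA. In LeCun's critique, every generated token carries some chance of error, so the probability that a long autoregressive sequence stays on track shrinks exponentially with its length. JEPA (Joint Embedding Predictive Architecture) predicts abstract representations of the world rather than generating tokens or pixels, with the aim of enabling causal reasoning and planning.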
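The compounding-error argument can be sketched numerically. Assuming, as a simplification, that each token is wrong independently with a fixed probability e, the chance that an n-token sequence contains no error is (1 - e)^n. The function name and the 1% error rate below are illustrative assumptions, not measured values.

```python
def p_sequence_correct(per_token_error: float, length: int) -> float:
    """Probability that all `length` tokens are correct, assuming independent errors."""
    return (1.0 - per_token_error) ** length


ERROR_RATE = 0.01  # assumed 1% per-token error rate, for illustration only

for n in (10, 100, 1_000, 10_000):
    p = p_sequence_correct(ERROR_RATE, n)
    print(f"length={n:>6,}  P(no error)={p:.6f}")
```

Under these assumptions, a 100-token answer is fully correct only about 37% of the time. The independence assumption is the weak point of this toy model, since real models can sometimes notice and recover from earlier mistakes.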

Why it matters

These are distinct problems, and they call for distinct solutions rather than a single monolithic approach. There is an 80-year arc of computing simulation behind these bets. The lesson for brands is parallel: vertical, brand-behavior data accumulated inside LLMs (the kind GEORadar builds) is a proprietary asset that stays valuable across paradigm shifts, whatever architecture wins next.