Micron just lost over $100 per share. Samsung shed 5%. DDR5 prices dropped as much as 30%.

The culprit? A compression algorithm from Google Research that dramatically cuts the memory AI models need. And it hasn’t even shipped as a product yet.

The DeepSeek Sequel Nobody Expected

TurboQuant does something deceptively simple: it compresses the key-value cache — the short-term memory AI models use during inference — by 6x while making inference 8x faster. Zero accuracy loss. No retraining. No fine-tuning. Just plug it into your existing pipeline.
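
For scale, here’s a quick back-of-envelope in Python. The model dimensions below are an assumption for illustration (Llama-2-70B-style: 80 layers, grouped-query attention with 8 KV heads of dimension 128, a 16-bit baseline cache), not numbers from the paper:

```python
# Back-of-envelope KV cache sizing. Model dimensions are an assumption
# (Llama-2-70B-style), chosen only to show what a 6x cut means in gigabytes.
layers = 80          # transformer layers
kv_heads = 8         # key/value heads (grouped-query attention)
head_dim = 128       # dimension per head
bytes_fp16 = 2       # 16-bit baseline cache

# Keys and values are both cached: two tensors per layer, per token.
bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16
print(f"KV cache per token: {bytes_per_token / 1024:.0f} KiB")    # 320 KiB

context = 100_000    # tokens of context
baseline = bytes_per_token * context / 1024**3
print(f"At {context:,} tokens: {baseline:.1f} GiB")               # ~30.5 GiB
print(f"After 6x compression: {baseline / 6:.1f} GiB")            # ~5.1 GiB
```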

That last part is what separates this from DeepSeek’s efficiency moment earlier this year. DeepSeek required architectural decisions baked in from day one. You couldn’t retrofit it. TurboQuant is a drop-in upgrade. That’s the difference between a paper you bookmark and a tool you deploy next Tuesday.

Google will present the full paper at ICLR 2026 in Rio de Janeiro later this month. Independent developers who grabbed the early code are already confirming the claims hold up.

How It Actually Works

Two steps, both clever:

PolarQuant randomly rotates data vectors before quantizing them. A random rotation spreads each vector’s energy evenly across its coordinates, smoothing away the outlier values that would otherwise force a coarse quantization grid, so a standard low-bit compressor can spend its bit budget efficiently on the signal that matters.
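
A minimal sketch of the rotate-then-quantize idea, assuming a plain uniform 4-bit quantizer and a random orthogonal matrix; this illustrates the general technique, not Google’s implementation:

```python
# Sketch of rotate-then-quantize (an illustration, not Google's code).
# A random rotation spreads energy evenly across coordinates, so a crude
# low-bit quantizer no longer wastes its range on a handful of outliers.
import numpy as np

rng = np.random.default_rng(0)

def quantize(v, bits=4):
    """Uniform symmetric quantizer: snap every coordinate to one shared grid."""
    scale = np.abs(v).max() / (2 ** (bits - 1) - 1)
    return np.round(v / scale) * scale

d = 2048
x = rng.normal(size=d)
x[:8] *= 50.0                     # a few outlier coordinates, as KV data often has

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal rotation

err_plain = np.linalg.norm(x - quantize(x))
err_rotated = np.linalg.norm(x - Q.T @ quantize(Q @ x))  # rotate, quantize, rotate back

print(f"4-bit error without rotation: {err_plain:.1f}")
print(f"4-bit error with rotation:    {err_rotated:.1f}")  # typically much smaller
```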

QJL (Quantized Johnson-Lindenstrauss) then uses a single bit, one yes/no signal, to correct the residual error from step one. Rescaled properly, that sign bit yields an unbiased estimate, eliminating systematic error at essentially zero memory cost.
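
To see how a single sign bit can be unbiased, here is a toy version of the Johnson-Lindenstrauss sign trick. This is my sketch of the general technique (the paper’s construction will differ in detail), and the projection count m is cranked up purely to tame noise in a single run, not to demonstrate compression:

```python
# Toy 1-bit Johnson-Lindenstrauss estimator (a sketch of the general trick;
# the paper's construction will differ in detail). We keep only the SIGN of
# each random projection of x, plus its norm, yet inner products with any
# full-precision query y can still be estimated with zero systematic bias.
import numpy as np

rng = np.random.default_rng(1)
d, m = 128, 50_000             # original dim; projections (large to tame noise)

x = rng.normal(size=d)         # the "key": compressed to m bits plus one norm
y = rng.normal(size=d)         # the "query": kept at full precision

S = rng.normal(size=(m, d))    # random Gaussian JL matrix
bits = np.sign(S @ x)          # everything we keep of x: one bit per row of S

# For Gaussian g: E[sign(g.x) * (g.y)] = sqrt(2/pi) * <x,y> / ||x||,
# so rescaling the bit/query correlation removes the bias entirely.
est = np.linalg.norm(x) * np.sqrt(np.pi / 2) / m * (bits @ (S @ y))

print(f"true <x,y>:      {x @ y:+.3f}")
print(f"1-bit estimate:  {est:+.3f}")   # close, and unbiased over repeats
```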

The result: massive compression that doesn’t sacrifice information. Traditional compression methods often introduce overhead that eats into the gains — like buying a filing cabinet that’s bigger than the papers it holds. TurboQuant sidesteps that entirely.

The Carnage on Wall Street

The market reacted fast and hard:

  • Micron: $467 → $366 per share in two weeks, a drop of more than 20%.
  • Samsung: down nearly 5% within days.
  • SK Hynix: down 7.3% over two trading sessions.
  • Western Digital: down 4.7%.
  • DDR5 memory prices: off 15-30%, the first meaningful decline in months.

The logic writes itself: if AI needs 6x less memory, the explosive demand projections for HBM4 and other high-bandwidth memory need a serious rewrite. Companies planning massive fab expansions are reportedly pausing to assess.

But context matters — broader markets were already sputtering. TurboQuant poured gasoline on a fire that was already smoldering.

Why Memory Stocks Might Be Overreacting

Here’s the counterargument, and it’s historically airtight: making something cheaper doesn’t reduce total spending. It increases it.

The Jevons paradox, named after William Stanley Jevons, the 19th-century economist who observed that more efficient steam engines increased Britain’s coal consumption rather than reducing it, applies directly here.

“Hyperscalers won’t cut their spending — they’ll just spend the same amount and get more bang for their buck,” said Jim Handy, president of Objective Analysis. “Data centers aren’t looking to reach a certain performance level and stop. They’re looking to out-spend each other.”

If TurboQuant means you can run bigger models on the same hardware, companies will run bigger models. They won’t return the GPUs.

Alex Cordovil from Dell’Oro Group added a reality check: “This is a research breakthrough, not a shipping product. There’s often a meaningful gap between a published paper and real-world inference workloads.”

What This Actually Means For Everyone Else

Beyond the stock drama, TurboQuant has four implications worth watching:

Cheaper API pricing. Lower inference costs should eventually hit API prices. More startups can afford frontier models. More products get built.

Local AI gets real. Shrink a model’s runtime memory footprint enough and it fits on consumer hardware. Your laptop running a genuinely powerful model isn’t science fiction anymore — it’s an engineering problem that just got a lot more solvable.

Longer context windows. The KV cache grows linearly with context length. Compress it 6x and models can handle dramatically longer documents, conversations, and codebases; a quick back-of-envelope follows this list. For agentic AI workflows that need sustained context across complex tasks, this is transformational.

Lower barriers to entry. Smaller labs that couldn’t afford massive GPU clusters might suddenly find themselves competitive. The AI oligopoly loosens, at least at the inference layer.
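
Here’s that back-of-envelope on what 6x buys at a fixed memory budget, reusing the same illustrative Llama-2-70B-style dimensions as above and assuming a 24 GiB consumer GPU (both assumptions mine, and generously ignoring the memory the weights themselves would claim):

```python
# What a fixed cache budget buys. Dimensions are the same illustrative
# Llama-2-70B-style assumption as before; 24 GiB is a common consumer GPU.
bytes_per_token = 2 * 80 * 8 * 128 * 2    # keys+values, fp16 baseline
budget = 24 * 1024**3                     # 24 GiB cache budget (weights ignored)

for label, ratio in [("fp16 baseline", 1), ("6x compressed", 6)]:
    print(f"{label}: ~{budget * ratio // bytes_per_token:,} tokens of context")
# fp16 baseline: ~78,643 tokens
# 6x compressed: ~471,859 tokens
```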

The Bigger Picture

April 2026’s AI landscape is absurdly competitive. Gemini 3.1 Pro leads benchmarks. Claude Sonnet 4.6 dominates real-world work. GPT-5.4 just shipped, with GPT-5.5 reportedly done with pretraining. Anthropic’s leaked Claude Mythos has a 25% chance of launching this month.

TurboQuant doesn’t compete with any of them. It makes all of them cheaper and faster to run. Infrastructure-level innovation that lifts every boat — arguably more consequential than any single model release.

The ICLR presentation will bring intense scrutiny. The real test is production deployment at scale, where benchmark performance and real-world reliability diverge in ways researchers don’t always anticipate.

But the trajectory is clear. AI efficiency just took a massive leap. Whether that translates to lower costs or just bigger ambitions depends on whether you believe the Jevons paradox — and more than a century and a half of economic history suggests you should.

The memory chip selloff might be premature. The excitement about what TurboQuant enables is not.