Thirty thousand people just descended on San Jose for what might be the most important tech keynote of 2026. NVIDIA’s GPU Technology Conference kicks off today, and Jensen Huang has promised to “surprise the world.”

He’s not bluffing.

The Training Era Is Over. Welcome to Inference.

For three years, the AI hardware playbook was brain-dead simple: buy GPUs, train bigger models, repeat. NVIDIA rode that formula to a $4.4 trillion market cap — the most valuable public company on Earth.

But something fundamental shifted. The explosive growth isn’t in training anymore. It’s in inference — the moment a trained model actually does something useful. Generates a response. Makes a decision. Runs an autonomous workflow.

“The number of tokens that are being generated has really, really gone exponential,” Huang said on NVIDIA’s last earnings call, where data center revenue hit $62 billion for the quarter. Up 75% year over year.

That’s not a tweak to the business model. That’s a complete rethinking of what AI hardware needs to be.

NVIDIA + Groq: The End of “One GPU Does Everything”

The biggest bombshell heading into GTC is NVIDIA’s expected partnership with Groq, the inference-focused startup known for blazing-fast Language Processing Units.

The rumored configuration is elegant and radical: instead of forcing one GPU to handle everything, you get specialized silicon for different stages of the inference pipeline. Groq’s LPUs handle decode. NVIDIA’s Vera Rubin handles prefill. NVLink Fusion ties them together.
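To make the split concrete, here is a toy Python sketch of the disaggregated pattern. Everything in it is illustrative — the function names and the list-based "KV cache" are stand-ins, not anyone's real API; in an actual system, prefill hands per-layer key/value tensors to the decode stage over an interconnect like NVLink.

```python
# Toy sketch of disaggregated inference: one "device" does the
# compute-heavy prefill pass over the whole prompt, a second does
# the latency-sensitive token-by-token decode.

def prefill(prompt_tokens):
    """Process the entire prompt in one batched pass (the GPU-style job).
    Returns a stand-in 'KV cache': here, just the running context."""
    kv_cache = list(prompt_tokens)  # real caches hold key/value tensors per layer
    return kv_cache

def decode(kv_cache, steps):
    """Generate tokens one at a time (the LPU-style job), each step
    extending the cache produced by prefill."""
    generated = []
    for i in range(steps):
        next_token = f"tok{i}"       # placeholder for a sampled model token
        kv_cache.append(next_token)  # decode appends to the shared cache
        generated.append(next_token)
    return generated

# Prefill once, then hand the cache off to the decode stage.
cache = prefill(["The", "quick", "brown"])
out = decode(cache, steps=3)
print(out)  # ['tok0', 'tok1', 'tok2']
```

The economics follow from the shape of the work: prefill is one big parallel pass, decode is a long serial loop, and silicon optimized for one is wasted on the other.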

Huang himself compared the Groq deal to NVIDIA’s 2019 Mellanox acquisition — the networking play that transformed data centers. When Jensen invokes Mellanox, Wall Street pays attention.

If confirmed, this is NVIDIA’s most significant architectural admission in years: the future of AI compute isn’t one chip to rule them all. It’s an orchestra of specialized processors working in concert.

Vera Rubin: Built for Autonomous Agents

Named after the astronomer whose galaxy-rotation measurements provided some of the strongest evidence for dark matter, Vera Rubin succeeds the wildly successful Blackwell architecture. It’s already shipping to partners, with reported 2.5x performance gains.

But raw power isn’t the headline. The entire platform is engineered for agentic AI — autonomous systems that reason step by step, use tools, and complete complex tasks without hand-holding.

NVIDIA and Thinking Machines Lab announced a multiyear deal to deploy at least one gigawatt of Vera Rubin systems for frontier model training. One gigawatt. That’s a nuclear reactor’s worth of power, dedicated to training AI.

Meanwhile, the standalone Vera CPU — 88 custom ARM cores with simultaneous multithreading — is being deployed in CPU-only racks specifically for orchestrating agentic workflows. Not GPU racks. CPU racks. For AI.

CPUs Are the New Bottleneck (Yes, Really)

Two years ago this would’ve sounded absurd. Now it’s the quiet crisis in AI infrastructure.

“CPUs are becoming the bottleneck in terms of growing out this AI and agentic workflow,” said NVIDIA’s head of AI infrastructure, Dion Harris. The reason: agentic systems spawn multiple agents working as a team, shuttling massive amounts of data between workflows. GPUs crunch the models, but something has to orchestrate the chaos.
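A minimal sketch of why that orchestration is CPU work: the coordinator below never runs a model itself. It fans tasks out to worker "agents," shuttles their intermediate results around, and merges the answers. The agent functions are hypothetical stand-ins for model calls.

```python
# Illustrative multi-agent orchestration: pure coordination logic,
# the kind of work that lands on CPUs while GPUs run the models.
from concurrent.futures import ThreadPoolExecutor

def research_agent(query):
    return f"notes on {query}"  # stand-in for a model call on a GPU

def summarize_agent(notes):
    return f"summary of [{'; '.join(notes)}]"  # second-stage stand-in

def orchestrate(queries):
    # Fan out: run one research agent per query in parallel.
    with ThreadPoolExecutor() as pool:
        notes = list(pool.map(research_agent, queries))
    # Fan in: shuttle the collected notes to a summarizing agent.
    return summarize_agent(notes)

result = orchestrate(["chips", "power"])
print(result)  # summary of [notes on chips; notes on power]
```

Multiply this by thousands of concurrent agent teams and the scheduling, queuing, and data movement alone become a serious CPU workload.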

Meta just struck a massive deal for the first large-scale deployment of Grace CPUs running solo — no GPU pairing. Los Alamos National Lab and the Texas Advanced Computing Center are running thousands of standalone NVIDIA CPUs. Bank of America predicts the data center CPU market will more than double, from $27 billion in 2025 to $60 billion by 2030.

The GPU shortage dominated 2024. The CPU shortage might define 2027.

Feynman: A 2028 Preview That Sounds Like Science Fiction

Expect Jensen to tease the next-next generation: Feynman, slated for 2028.

What’s leaked so far reads like a spec sheet from the future: 3D chip stacking on TSMC’s 1.6nm A16 process (NVIDIA is rumored to be an exclusive early customer), Groq LPU technology stacked directly onto the compute die, and copackaged optics that move data between chips using light instead of electricity.

Jensen has a habit of previewing architectures at GTC before anyone believes they’re possible, then delivering on schedule. Don’t bet against him.

Agentic AI Goes Mainstream

The hardware is half the story. GTC’s session lineup reads like a manifesto for the agentic era.

The pre-keynote features LangChain CEO Harrison Chase, PrimeIntellect CEO Vincent Weisser, and OpenClaw creator Peter Steinberger discussing how autonomous agents reason, use tools, and complete complex tasks. NVIDIA is even hosting a “Build-a-Claw” event where attendees set up their own always-on AI agent.

On Wednesday, Huang himself moderates an open models panel with leaders from LangChain, A16Z, AI2, Cursor, and Thinking Machines Lab. The message is clear: NVIDIA sees open-source AI as critical infrastructure, not a competitive threat.

Why You Should Care

Developers: Specialized inference silicon from NVIDIA-Groq could slash cost per token, making AI agents economically viable for use cases that currently don’t pencil out.

Business leaders: Meta’s 20% layoffs, announced the same week as GTC, aren’t a coincidence. The company is investing in AI infrastructure while cutting human headcount. Every major enterprise faces this calculation next.

Investors: NVIDIA’s pivot from “sell GPUs for training” to “sell complete AI inference platforms” is the growth story of the next two years. The CPU market doubling, Groq integration, and Vera Rubin deployments are where the money flows.

Everyone else: The autonomous agents being discussed at GTC aren’t theoretical. Within a year, you’ll interact with agentic AI in customer service, healthcare, financial planning, and your own devices — all running on hardware being announced this week.

The Bottom Line

GTC 2026 isn’t a product launch. It’s NVIDIA publicly pivoting from the company that powered AI training to the company that powers AI doing things in the world.

The Vera Rubin platform. The Groq integration. The CPU renaissance. The Feynman tease. They all converge on a single vision: infrastructure for a world where billions of autonomous agents run around the clock.

Jensen’s keynote streams free at nvidia.com starting at 11 AM PT. This one’s worth your time.