
Nvidia's $20B Antitrust Loophole
"You're taking on a giant. What gives you the audacity?" On November 5th, 2025, Groq CEO Jonathan Ross was asked why he was even bothering to challenge Nvidia. He didn't blink: "I think that was a polite way to ask why in the world are we competing with Nvidia, so we're not. Competition is a waste of money; competition fundamentally means you are taking something someone else is doing and trying to copy it. You're wasting R&D dollars trying to do the exact same thing they've done instead of using them to differentiate." 49 days later, Nvidia paid $20 billion for Groq's assets and hired Ross along with his entire executive team. Except this wasn't actually an acquisition, at least not in the traditional sense. Nvidia paid $20 billion for Groq's IP and people, but explicitly did NOT buy the company. Jensen Huang's statement was surgical: "While we are adding talented employees to our ranks and licensing Groq's IP, we are not acquiring Groq as a company." That phrasing is the entire story. Because what Nvidia carved out of the deal tells you everything about why this happened. Forget the AI doomer takes about a bubble forming, lets look into the actual reasons. What Nvidia Actually Bought (And What It Didn't) Nvidia acquired: All of Groq's intellectual property and patents Non-exclusive licensing rights to Groq's inference technology Jonathan Ross (CEO), Sunny Madra (President), and the entire senior leadership team Nvidia explicitly did NOT buy: GroqCloud (the cloud infrastructure business) GroqCloud continues as an independent company under CFO Simon Edwards. This is Nvidia's largest acquisition ever (previous record was Mellanox at $7B in 2019), and they structured it to leave the actual operating business behind. That doesn't happen by accident. LPU vs TPU vs CPU: Why SRAM Matters To understand why Nvidia paid anything for Groq, you need to understand the architectural bet Ross made when he left Google. CPUs and GPUs are built around external DRAM/HBM (High Bandwidth Memory). Every compute operation requires shuttling data between the processor and off-chip memory. This works fine for general-purpose computing, but for inference workloads, that constant round-trip creates latency and energy overhead. GPUs evolved from graphics rendering, so they're optimized for parallel training workloads, not sequential inference. TPUs (Google's Tensor Processing Units) reduce some of this overhead with larger on-chip buffers and a systolic array architecture, but they still rely on HBM for model weights and activations. They're deterministic in execution but non-deterministic in memory access patterns. LPUs (Groq's Language Processing Units) take a different approach: massive on-chip SRAM instead of external DRAM/HBM. The entire model (for models that fit) lives in SRAM with 80 TB/s of bandwidth and 230 MB capacity per chip. No off-chip memory bottleneck. No dynamic scheduling. The architecture is entirely deterministic from compilation to execution. You know exactly what happens at each cycle on each chip at each moment. This creates massive advantages for inference: Llama 2 7B: 750 tokens/sec (2048 token context) Llama 2 70B: 300 tokens/sec...