How Vast Data Built a $30B AI Data Platform | Jeff Denworth, Co-founder of Vast Data

6 Things You'll Learn from This Episode

Why Vast Data bet on all-flash architecture in 2016 when the storage market was widely declared dead by investors — and how timing that bet to the deep learning wave created a structural advantage.
How Vast's distributed architecture — separating stateless compute nodes from a globally shared SSD pool — eliminates East-West traffic and enables constant-time vector search regardless of data scale.
Why Vast now powers roughly 90% of Neo cloud GPU deployments, and how an early focus on enterprise features and multi-tenancy won the AI infrastructure market over raw-performance competitors.
How Vast expanded from file, object, and block storage into vector databases, SQL analytics, streaming, and serverless compute — competing with Snowflake, Databricks, and Kafka from a single codebase.
How Vast's similarity-based global compression reduces flash costs by 50–70% on average, and why that same technique now applies directly to key-value cache storage for long-context AI inference.
Why Vast's gross margins above 90%, positive cash flow, and a Rule of 40 score of 228 make it one of the rare high-growth infrastructure companies that is also profitable.

About the Episode

Jeff Denworth, Co-founder of Vast Data, joins Nataraj to recount how a "super unsexy" storage startup launched in 2016 grew into a $30 billion AI data platform powering roughly 90% of Neo cloud GPU deployments worldwide. From the original conviction that deep learning would redefine computing's relationship with data, to the surprise expansion into vector databases, SQL analytics, streaming, and serverless compute, Jeff walks through the architectural decisions that set Vast apart. He also explains why he calls the company's Rule of 40 score of 228 the best number any investor they've spoken to has ever seen. The conversation also covers the flash supply chain crunch expected to last until 2028, how Vast is helping customers squeeze 2–3x more utilization from existing hardware, and why long-context storage for AI inference may be Vast's next big frontier.

Timestamps

0:00 — Introduction: Vast Data and the AI infrastructure opportunity
1:21 — Origin story: Why start a storage company in 2016?
7:03 — First customers: life sciences and quantitative trading
8:52 — Early competition and the on-premises bet
11:31 — Neo cloud dominance: CoreWeave, Lambda, Crusoe
14:17 — Hyperscaler partnerships: AWS, Google, Microsoft
15:25 — Competitive differentiation vs. DDN and legacy players
20:34 — Why VAST built its own vector database
23:22 — Business model: Neo clouds and revenue sharing
26:48 — The AI data opportunity and agent-scale infrastructure
31:16 — Architecture and long-context inference (CMX, key-value cache)
35:43 — Flash supply chain crunch until 2028
39:50 — Profitability: Rule of 40 score of 228

Key Insights

Q: What makes VAST Data's storage architecture fundamentally different?

Traditional distributed storage systems, built on Google File System-era designs, rely on nodes that each own a portion of the data and must communicate with one another to coordinate transactions. As clusters grow, this node-to-node (East-West) traffic grows too, and performance degrades. VAST separates stateless compute nodes from a globally shared pool of SSDs connected over NVMe over Fabrics (NVMe-oF), so every core can see the entire data set in parallel without coordinating with other compute nodes. This allows performance to scale linearly and enables constant-time access regardless of cluster size.

Q: What is the Rule of 40 and why does VAST's score of 228 matter?

The Rule of 40 is a benchmark used by investors: add a company's year-over-year revenue growth rate to its free cash flow margin, both expressed as percentages. If the sum is 40 or above, the business is considered healthy. Most fast-growing software companies land between 40 and 80. VAST's score of 228, which combines 2–3x annual growth with strongly positive free cash flow, is described by Jeff as the best number any investor they've spoken to has ever seen, reflecting a rare combination of hypergrowth and profitability.
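
The arithmetic behind the score can be sketched in a few lines of Python. This is a minimal illustration, not VAST's reported breakdown: the function names are hypothetical, the sample inputs are made up, and reading "2–3x annual growth" as a revenue multiple (so 2x means 100% growth) is an assumption.

```python
def multiple_to_growth_pct(multiple: float) -> float:
    """Convert a year-over-year revenue multiple into a growth percentage.

    Assumption: "2x growth" means revenue doubled, i.e. 100% growth.
    """
    return (multiple - 1.0) * 100.0


def rule_of_40(growth_pct: float, fcf_margin_pct: float) -> float:
    """Rule of 40 score: growth rate plus free cash flow margin, both in percent."""
    return growth_pct + fcf_margin_pct


def passes_rule_of_40(score: float, threshold: float = 40.0) -> bool:
    """A score at or above the threshold counts as healthy under the rule."""
    return score >= threshold


# Hypothetical company, not VAST: 30% growth and a 15% free cash flow margin.
example_score = rule_of_40(growth_pct=30.0, fcf_margin_pct=15.0)
print(example_score, passes_rule_of_40(example_score))

# Under the multiple assumption, growth alone contributes this many points.
print(multiple_to_growth_pct(2.0), multiple_to_growth_pct(3.0))
```

Under that reading, growth of 2–3x would contribute 100 to 200 points before any free cash flow margin is added, which is why a score above 200 is arithmetically possible.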

Q: How does VAST's similarity-based compression reduce flash costs by 50–70%?

VAST uses the same mathematical approach that powers vector search — fuzzy distance calculation between data blocks — to identify when two blocks look similar to each other. When it finds a match, it compresses them against each other at the block level, globally across the entire cluster. This works across all data types without needing to understand file format. The same technique also applies to key-value cache data used in AI inference, directly reducing the cost of storing long-context windows for LLM applications.
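
The general idea of compressing a block against a similar one can be sketched as follows. This is not VAST's algorithm, which the episode does not describe in detail. As stand-ins, the sketch uses a chunk-hash Jaccard score for the similarity check and DEFLATE with a preset dictionary (Python's zlib `zdict` option) for the compression step. The block size, chunk size, and all function names are illustrative assumptions.

```python
import random
import zlib

BLOCK_SIZE = 4096  # illustrative block size, not a VAST parameter
CHUNK = 64         # illustrative chunk size for the similarity sketch


def sketch(block: bytes) -> set:
    """Hash fixed-size chunks of a block into a set (a crude similarity sketch)."""
    return {zlib.crc32(block[i:i + CHUNK]) for i in range(0, len(block), CHUNK)}


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of two chunk-hash sets; 1.0 means identical sets."""
    sa, sb = sketch(a), sketch(b)
    return len(sa & sb) / len(sa | sb)


def compress_against(block: bytes, reference: bytes) -> bytes:
    """Deflate `block` using `reference` as a preset dictionary."""
    comp = zlib.compressobj(zdict=reference)
    return comp.compress(block) + comp.flush()


def decompress_against(payload: bytes, reference: bytes) -> bytes:
    """Inverse of compress_against; needs the same reference block."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(payload) + decomp.flush()


# Build a random (incompressible) reference block and a near-copy of it.
rng = random.Random(0)
reference = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))
edited = bytearray(reference)
edited[100:104] = b"EDIT"
similar = bytes(edited)

standalone = zlib.compress(similar)           # no reference available
delta = compress_against(similar, reference)  # compressed against its neighbor

print("similarity:", similarity(similar, reference))
print("standalone bytes:", len(standalone), "delta bytes:", len(delta))
```

Random data does not compress on its own, so nearly all the savings here come from the reference block. The trade-off is that the reference must be kept available to decompress anything stored against it.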

Q: How does VAST's vector database achieve constant-time search without memory-based indices?

Most vector databases store their search indices in DRAM, which is expensive and limits scale. VAST uses flash as high-performance persistent memory over NVMe-oF fabrics, allowing the system to search across billions to trillions of vectors in under a second without memory-based indices. Because all compute nodes share the same global SSD pool and generate no East-West traffic, writes to the vector database are parallelized across all nodes simultaneously — making the system fast for both ingestion and retrieval at any scale.

About Jeff Denworth

Jeff Denworth is Co-founder of Vast Data, the AI data platform valued at $30 billion that powers roughly 90% of Neo cloud GPU deployments worldwide — including CoreWeave, Lambda, Crusoe, and Nscale. He co-founded Vast in 2016 with the conviction that deep learning would fundamentally reshape the relationship between computing and data, and that a new storage architecture was needed to meet that moment. Prior to Vast Data, Jeff worked at DDN, a high-performance computing storage company.

About the Host

Nataraj Sindam is the creator of The Startup Project, a podcast featuring founders, investors, and operators building the future.

#StartupProject #VastData #JeffDenworth #AIInfrastructure #CloudStorage #NeoCloud #VectorDatabase #FlashStorage #DistributedSystems #Entrepreneurship #Podcast #Tech