Transcript: Inside Story of Building the World’s Largest AI Inference Chip | Cerebras CEO & Co-Founder Andrew Feldman
In this episode of The Startup Project, host Nataraj Sindam interviews Cerebras Systems CEO Andrew Feldman. They discuss the groundbreaking technology behind the world's largest AI chip, the company's strategy for competing with NVIDIA, and why AI inference is the next major frontier in computing. This is a deep dive into AI hardware, system design, and building a deep tech company from first principles.
2026-01-16
Host: In 2015, we had a meeting with Sam Altman and Ilya Sutskever, and the things they were saying sounded crazy. Oh, we've got to worry about safety. A company that's achieved Moore's law in the last few years.
They raised $1.1 billion. They now have an $8.1 billion post-money valuation. The sale of advanced AI chips to the Middle East, striking a deal to export up to 70,000 chips to the UAE and Saudi Arabia. Here's one.
I'm with Cerebras, and this is their new chip, the Wafer Scale Engine 3. AI agents are going to take over, and you're looking at them saying, this is crazy talk.
And what Ilya said was true. When a new workload occurs, a new computer architecture emerges and new great companies are born. Trying to create a chip company in 2015 with existing players, it's not an obvious thing.
What was the thesis there back then? Let's see how it goes. All right, we're done. You're going to have to clap for like 30 seconds, because I'm going to make you suffer through this whole thing here.
We chose to build the largest chip in the history of the computer industry. This is 56 times larger than the largest chip that had ever been built before.
Tesla tried to build a chip to compete with us. It was called Dojo, and they recently shut it down. They failed.
Microsoft has tried several times, Facebook has tried many times; their internal projects have not succeeded to date. You know, to be 30 or 40% cheaper is not a good strategy.
I think you want to be 10 or 15 or 20 times faster, and that's what we are. What do you think about this idea of compute becoming a commodity? It's an irony, because Nvidia's gross margins are 73%. Of course it's not a commodity. I mean, look.
Today's number is $400 billion.
That's how much money just the four hyperscalers, Microsoft, Amazon, Meta, and Alphabet, are spending or are expected to spend this year alone. For context, that's more than 15 times the Manhattan Project, and two times what it took to do the Apollo program, which literally put a man on the moon.
And that number could reach $3 trillion globally. We are arguably in one of the biggest capital spending cycles we've ever seen, and it's all centered around this one big question: how can we get more and faster compute at scale? My guest today sits right at the core of this question: Andrew Feldman.
Andrew is the brilliant co-founder and CEO of Cerebras Systems. He's a serial entrepreneur with a remarkable track record. He previously founded SeaMicro, a company that changed server architecture and was later successfully acquired by AMD.
Now with Cerebras Systems, Andrew and his team are once again pushing the limits of what's possible in one of the most formidable challenges in modern computing: the insatiable demand for AI compute and inference.
Cerebras is building a new kind of AI chip from first principles.
They are engineering entire wafer-scale systems, like their groundbreaking WSE-3, the largest AI chip ever built, designed to deliver unprecedented AI compute power, efficiency, and, most importantly, inference speed.
They're at the forefront of AI hardware innovation, challenging established giants like Nvidia and AMD and redefining performance benchmarks with their unique architecture. Their latest accelerator, the WSE-3, boasts an incredible 125 petaflops of AI compute with 4 trillion transistors, and a significant leap in inference speed over their competition. For founders, operators, and investors navigating the world of AI, this conversation offers invaluable lessons on innovation, market disruption, building a chip company, and scaling a deep tech company.
So get ready to deep dive. Andrew, welcome to The Startup Project.
Guest: Thank you so much for having me.
Host: So, when I started to research Cerebras... I know of Cerebras because, you know, I obviously follow all things AI, but I didn't know where to start.
So I thought the best way to start is to go to the first-hand resource of you talking about it at your event, Supernova 2025. And the one thing that really stuck with me was the demo you did with ChatGPT, Claude, and Mistral's Le Chat, which runs on your own infrastructure. And that crystallized this idea of what kind of experience a user can have if you really improve the inference speed. I want to talk about that, but before getting into that: in 2025, it's obvious, you know, to work on an AI chip, or to work on inference, or to start a cloud computing company; there are a lot of neoclouds coming up right now. But you started the company in 2015, and trying to create a chip company in 2015 with the existing players, Nvidia, Intel, AMD, what was the thesis?
I mean, it's not an obvious thing, you know, to start a chip company. What was the thesis there back then?
Guest: Well, I think there are two parts. One, as a chip and system guy, that's all we know. Right? So it's obvious to us. I mean, we weren't going to build a web app.
That's not who we are. And I think entrepreneurship, like many things in life, pays dividends if you stay true to who you are. We, the founding team and the people we know, are infrastructure builders. That's what we love, and that's what we've done our whole careers. For me and the founding team, it's building chips and systems and the software that runs them, such that other people's ideas can run on our machines and their ideas can take flight. And that's what we love doing.
And so, you know, this is my fifth startup, and all the previous startups were building systems. All my co-founders were with me at the last startup I founded. And so it was obvious to us that we were going to build a chip and a system.
I think what wasn't obvious was AI. And you know, in 2015, Nvidia was a $20 billion company, not a $5 trillion company. Right? The world looked very different.
I mean, we had a meeting with Sam Altman and Ilya Sutskever, and the things they were saying sounded crazy.
Oh, we've got to worry about safety, and AI agents are going to take over, and you're looking at them saying, this is crazy talk. And what Ilya said was true.
It happened. What we saw was a new type of compute. We thought AI would usher in a new compute workload, in the same way that cell phones ushered in a new compute workload, in the same way that switches and routers in the late 90s ushered in a new workload. And when a new workload occurs, a new computer architecture emerges and new great companies are born. When there's an existing dominant design, as there's been, for example, in x86, you know, there's been some change in market share between Intel and AMD, but there have been no meaningful entrants in two and a half, three decades. Right? Whereas at this inflection point, with the rise of a new workload, there is tremendous opportunity.
So when the cell phone workload emerged, who was better positioned to take advantage of that than AMD and Intel? Right? Both failed completely.
Zero share, right? And Arm emerged, and Apple emerged as a major player, and Samsung, who had never been in the compute business, in the chip-making business, before. And so you see a dislocation. That's what we predicted.
Now, we clearly didn't think it would be this big, or we would have raised at a higher valuation. Right? We had no idea that 10 years later people would be spending $400 billion on capex. We had no idea.
I mean, in 2016, AI was finding a cat in a picture and making sure it was not a chair. Right? That's where AI was. But we saw a trajectory that would be big.
We didn't see that it would be this big, but we saw a trajectory that would be big. And we came to believe that we could build a new type of computer, beginning with the chip and the processor and the system, that would be really good at this workload. Not a little bit better, not a little bit cheaper, but orders of magnitude faster. And that's what we did.
Host: What does an iteration cycle look like when you're trying to achieve that? Because if you're building a software product, it's pretty straightforward how that cycle looks: you start with an MVP, you revise it, you improve it, you take feedback. But how do you do that for a hardware product?
Guest: We chose to build the largest chip in the history of the computer industry. Right? This is one chip. The typical chip is the size of a postage stamp, the size of your thumbnail, right?
This is 56 times larger than the largest chip that had ever been built before. And so we set out to do fundamental design creativity, engineering, innovation, and invention.
And we spent about three years and, when all was said and done, about half a billion dollars to make the first one. Nobody in history had made one this size. And by being bigger, we could keep more data on chip. We could move it less often and less far.
We could use less power, because moving data is really expensive in power and it's expensive in time. And so we could be much, much faster. So the dividend of doing this was huge.
But it took us years and a great deal of internal fortitude, because it wasn't right the first time or the second time or the eighth time.
In fact, we had about a 15-month period where we were spending $8 million a month and we couldn't make one. Right? And so you're going to a board meeting every six or eight weeks saying, nope, still can't make it. Still can't make it.
And then in July or August of 2019, we made one. And the founders just stood in a tiny little lab and watched a computer run, which is about as exciting as watching paint dry, right?
It's just a big metal box with some lights flashing. And we looked at each other and we were stunned. I mean, we'd solved a problem that nobody in the computer industry had ever solved before.
And, you know, a few months later, we had our first customer, and we have been on a tear since then.
Host: Who was your first customer?
Guest: Our first government customer was Argonne National Laboratory, one of the Department of Energy labs in the US. And our first commercial customer was GlaxoSmithKline, the large pharmaceutical.
Host: So what is the advantage of that? I know it's the largest AI chip, and you can have four trillion transistors. But what does it mean when you compare it with, you know, a normal chip? Like, can you pack more SRAM on top of it? How much bandwidth do you get?
Guest: That's exactly right. So memory has two types, right? There's slow memory that can hold a lot, that has high capacity.
That's called DRAM, and HBM is a flavor of DRAM. That's what GPUs use. And they're called graphics processing units for a reason: they were designed originally for graphics.
And graphics was a problem where you'd move data once, do a lot of work on it, then bring the results back into memory and move a new block. There's a different type of memory called SRAM. Historically, SRAM was extremely fast but relatively low capacity; it couldn't store that much.
And so by going to a big chip, we could stuff it to the gills with SRAM, and overcome SRAM's limitation that it couldn't store very much by using a lot of it, using all the real estate on the big chip, so that we could build an SRAM solution.
And the result is we have both capacity and speed. And, you know, that's why we're 15, 20, 25, 30 times faster, and on some problems thousands of times faster, than B200 GPUs.
Host: So if you have more memory, then if you're doing, let's say, pre-training, you don't have to break the model down as often as you would on a GPU and then combine the pieces. Is that the advantage we're getting?
Guest: That's right. Let's look at inference, because it's a neater example. In generative inference, to generate a single word, a token, you have to move all the weights from memory to compute to do a giant matrix multiply, right?
For a 70-billion-parameter model, which is not a very big model, you're going to move about 100 full-length movies' worth of data to generate one word. Now, if your memory is off chip, then you've got a thin little pipe to the GPU.
You've got this memory that can store a lot but is slow, with a thin pipe that goes over to your GPU, and you've got to move 100 movies' worth of data across that tiny little pipe to generate one word.
Host: And that would be across a cluster of GPUs.
Guest: That's across a single GPU, to get to its HBM. It's even worse if you have to go across a cluster of GPUs.
That pipe is what we measure in memory bandwidth. And by putting the SRAM right next to the compute core, on the same silicon, we move more than 2,600 times more data, more quickly. And as a result, the inference results come out faster.
And it's just that simple. So memory bandwidth is a known Achilles' heel of graphics processing unit architectures, and it was one of the things we saw in our design that we could do vastly better.
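To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The weight precision, movie size, and bandwidth figures are illustrative assumptions, not vendor specifications:

```python
# Back-of-the-envelope: the memory-bandwidth ceiling on generative inference.
# Assumption: 70B parameters stored at 16 bits (2 bytes) each.
params = 70e9
bytes_per_token = params * 2          # ~140 GB of weights moved per token

movie_bytes = 1.4e9                   # rough size of a full-length movie
print(bytes_per_token / movie_bytes)  # ~100 "movies" of data per token

# Assumption: ~3 TB/s of off-chip bandwidth, roughly the range publicly
# cited for modern datacenter GPUs. The single-stream token rate is bounded
# by how fast the weights can cross that pipe.
bandwidth = 3e12                      # bytes per second
print(bandwidth / bytes_per_token)    # ~21 tokens/s upper bound (batch 1)
```

Keeping the weights in on-chip SRAM raises that bandwidth term by orders of magnitude, which is the effect described above.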
Host: Right now it's all about compute and pre-training, or maybe I would say that sort of ended last year, and everyone is now saying, you know, we're scaling inference. Two, three years down the line, do you think we, as an ecosystem, spend more on inference than on training?
Guest: You know, inference and training are different. Until about the beginning of '24, maybe the middle of '24, AI was mostly a novelty, right? It was sort of cool, but it wasn't doing real work, right?
And during that time everybody focused on training, because training is how we make AI, but inference is how we use AI. And when AI in general was a novelty, nobody was using it.
What's happened since the summer of '24 is that the use of AI has exploded. And that's what people are talking about when they say inference has exploded.
And, you know, not only have more people been using it, but they use it more often, and they use it to do more complicated things. Each of those increases the compute. And so the compute needed is the product of three rapidly growing dimensions.
That's exponential growth. And so that's what's happening and that's why inference has just exploded.
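A toy sketch of that product of three dimensions; the growth rates below are made-up numbers purely for illustration:

```python
# Compute demand as the product of three independently growing factors:
# users, queries per user, and compute per query (all rates hypothetical).
users, usage, complexity = 1.5, 1.4, 1.6   # annual growth multipliers

total = 1.0
for year in range(1, 4):
    total *= users * usage * complexity
    print(f"year {year}: {total:.0f}x baseline compute")
# Each factor grows modestly, but the product compounds to ~38x in 3 years.
```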
Host: What is it like for a new chip company to compete in this space with AMD and Nvidia?
Guest: It's hard. I mean, look, Jensen, and Lisa Su, who bought my last company, I think they are two of the three great CEOs of the last 10 or 15 years.
If you throw in Hock Tan at Broadcom, those three leaders have outperformed just about everybody else in the world, probably in the history of finance.
Maybe nobody in the history of public markets has done as well. And they're dazzling. But their size also creates opportunity for us. They can't move as quickly as we can. They can't take the type of engineering risks we take.
They can't hire the caliber of people we can hire, people who don't want hierarchy and structure. And so there's tremendous opportunity for the bold entrepreneur.
And, you know, you both have to take the giant in the field seriously, and you have to know that they can be beaten.
Host: In some ways... I mean, you've done now five or more companies. Is it better as a strategy?
What you said really resonated: you stick to your lane, what interests you, what you're curious about, what your strengths are, and you focus on that. That's always there. But if I look at it as a strategy, is it a little bit better to take a very big, very high-stakes, very hard problem than something that sounds a little easier?
Is that a fair statement to make?
Guest: You know, I think, Nataraj, it depends on your passion and it depends on where you are in your career. I think chip projects are enormously expensive, and historically they've not been a good place for young first-time CEOs.
There are a lot of returns to experience in the chip business. Now, there are other parts of the entrepreneurial ecosystem that have been extraordinarily good to young CEOs, particularly where they and their friends look like their customers, right?
There they have unique expertise that experience can't replicate. Right? When they're coding for their friends, when they're building tools that they want to use, right? That entrepreneur has experience and has advantage.
And obviously the entire wave of social networking companies were like that. I think there are many AI companies that are like that, where right now some of the most knowledgeable and creative people are just finishing their postdocs, right?
It's not those of us with 20, 25, 30 years of experience who have the most to say there.
But in the chip business, not only do you have to design the logic in a chip, you have to have relationships with a fab, you have to use EDA tools that cost millions of dollars a year, you have to do back-end design and physical design, you have to close timing. There are very few great hardware teams.
You know, maybe eight or 10 in the world, and we have one of them. And they've been with me for 20-plus years. I think that made it easier for us to raise money for a big idea.
Host: What do you think about this idea of compute becoming a commodity, or inference becoming a commodity? This whole commodity narrative that floats around.
Guest: It's an irony, because Nvidia's gross margins are 73%. Everybody says it's a commodity, and they've got the highest gross margins of any hardware company in history. Look at even storage products.
Storage is still a great business, right? People say it's a commodity.
Storage is a commodity, and yet you've got VAST and Weka and all these guys, and Dell continues to make huge money on their storage. And I mean, they bought XtremIO, one of the great acquisitions.
And, you know, it's a commodity, except if you need it in the enterprise, except if you need dedupe backup, except if you need... right?
If you're going to put banking information there, then you need security. And I think as you know less about something, the details and the complexity go away, right? As you zoom back.
I mean, look, from the moon, the earth is just a blue and white orb, right? And you get up close and it's like, oh, there are religious battles and political battles and technology battles, right?
And so I think people who don't really know a domain often throw stones: oh, that will be commoditized. If you look at AI compute, it's not looking like it's going to be commoditized.
I mean, Andrew Ng said the other day... and Andrew Ng is a professor at Stanford and truly one of the more thoughtful minds in our space. He's not given to wild statements.
He said he's never met anybody in AI who feels they have enough compute. Right? I mean, that's not a market driving to a commodity.
Host: Yeah, and it's also what you point out about enterprise scale: at that scale, it's not a commodity. And if it's a commodity, then why does Meta pay $100 million for a research engineer?
Guest: That's right. Of course it's not a commodity.
I mean, look, the reason you pay Kylian Mbappe big money is because he has skills other people don't. Right? That's it. The reason you pay Shohei Ohtani is because he has skills other people don't.
It's not because of anything other than the fact that their skills are in demand and their skills are unique. And I certainly think there is a collection of AI engineers...
There is actually a collection of people in each industry whose skills are so far ahead of the others that they're rare and impossible to replicate. Right? No number of really good engineers equals an extraordinary engineer.
It's not an additive thing. And so the key, Nataraj, isn't whether we've overpaid a great guy. Nobody goes bankrupt by overpaying great engineers. You go bankrupt by overpaying mediocre engineers, or pretty good engineers.
But great engineers are extraordinarily rare, and I don't begrudge them one nickel.
Host: Let's talk a little bit about your product strategy, because you have this chip, and now you have to get it into the hands of customers. You obviously have data centers that you're building, which have Cerebras infrastructure.
So if I just break down the product types you have: you have the physical chip; then you have data centers, and I'm not sure if you own the data centers or partner with others; then you have your own cloud offering, so that people can access your infrastructure; and then I think there's a fourth one, which is mostly a business offering: you also enable putting Cerebras infrastructure in customers' on-premise environments.
Is that all the different offerings, or is there anything I'm missing?
Guest: That was really pretty close. I mean, we don't sell the chip. We sell a computer: the chip in a system. That's parallel to, say, the NVL72, right? We sell a whole solution that comes in a rack.
It's delivered, and it's got everything you need. We will deploy that on your premises, or you can buy cycles on it from our cloud or from our customers' clouds.
So you can buy it through the Amazon marketplace or the Microsoft marketplace; you can buy it from OpenRouter or Hugging Face or Vercel. There are lots of other places where you can jump on exactly the way you prefer.
You can buy from us, you can buy from our partners. So we have both offerings, whether you want a cloud offering or an on-premise offering. And you can bolt on to us via an API.
So it's the same sort of industry-standard, OpenAI-like API: you just bolt on and you go.
The final thing we do offer, to large on-premise customers, is some forward-deployed engineering, where our engineers collaborate and work together with your engineers, and maybe we're designing models or cleaning data or working on a data pipeline, in order to accelerate the delivery of the solution.
So: systems on premises; cycles in the cloud, our cloud or our partners' clouds; and then we're happy to collaborate with you and bring engineering services to bear to help you get to market faster.
Host: So you see a lot of companies out there. I can think of CoreWeave, Crusoe, and other neoclouds that are building clouds, Nscale, Nebius... the list goes on and on.
But none of those companies have their own chips, which I think is a unique differentiator that you have, right? All of those are dependent on Nvidia; I think those are basically Nvidia resellers.
And how do you see that model? Because you always have data center partners who would run data centers for other companies, right?
How do you see these neocloud businesses that are anchored, some on OpenAI, some on Nvidia? And then I see in the public market there's DigitalOcean, for example, right?
It's valued at two to three billion dollars, while all the neoclouds are valued at, I don't know, 10 to 20 billion and more. So there's a big dislocation there that I see. I don't know if DigitalOcean is undervalued or these are overvalued. And as a business, it doesn't seem like it's going to be a high-margin business for the neocloud itself.
Guest: It's a low-margin business. The question is whether it's a high-return-on-equity business, because they use a lot of debt. There's no doubt it's a low-margin business in terms of gross margin.
But the return on equity, because there's so much debt being used, may well be high.
So it may well end up looking more like a real estate investment. Right? If the return on real estate is 5% per year, but you've only put 20% of the value down, the return on equity is much higher.
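The leverage arithmetic, sketched out (a simplified illustration that ignores interest on the debt; the 5% and 20% figures are the ones from the example above):

```latex
\mathrm{ROE} = \frac{r_{\mathrm{asset}} \times V}{E}
             = \frac{0.05\,V}{0.20\,V}
             = 25\% \quad \text{(before interest costs)}
```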
And so debt confuses the analysis and makes it more complicated, for sure. I think, you know, I'm certainly not running out and shorting Equinix or Digital Realty Trust or any of the traditional players.
I think they run enormous businesses, they have long contracts, they have power contracts, and they're global. Those are exceptional companies.
I think the upcomers are startups in their own right, and they're trying to find a niche, just like we're trying to find a niche. They're trying to offer something that was difficult for the incumbents to offer.
They're trying to move more quickly than the incumbents. They're trying to offer power behind the meter, which means right next to generation rather than on the communal grid. And some of them, like Crusoe, have chosen to build, right?
And so they are now construction firms. And that's very interesting. And so, you know, I don't begrudge them their dependence or reliance on Nvidia right now.
I think you probably don't, as a business proposition, want to be dependent on one supplier. I think that's probably not good for your ability to flourish.
But to get started... I think together, as a group, they've done amazing things.
Host: I'm curious about your decision not to sell the chips, because, I mean, Nvidia has this premium pricing and they still have 80, 90% margins. Why not, as a business strategy, sell the chips that you have?
Guest: Well, I'm a believer in the system business. I think it's very hard to get paid for software when you're selling chips.
And historically there are very few examples of a successful entry strategy where you sell the chip on a PCI card. I mean, the way you sell chips is on a PCI card, right?
And then you're dependent on Dell or Supermicro for your I/O and your power, right? You're dependent.
Host: And then, as the customer, if I want to do inference, I have to talk to all these vendors, build up my own rack, build up infrastructure, buy pluggable things. I mean, nobody's doing it that way.
Guest: I mean, at these large volumes, everybody's buying either DGXs or NVL72s or Cerebras boxes. In fact, think about it this way, Nataraj:
AMD was so far behind in building systems that they had to buy ZT Systems. And they paid, what, six billion for a commodity system maker, because they didn't have the expertise.
And in this market, building systems and delivering systems is the way consumers wish to consume, and even the way cloud providers wish to consume.
Host: You also talked about being available on all the marketplaces, right? Talk to me a little bit about the marketplace strategy. When customers go through a marketplace, are they hitting the Azure data center, or are they hitting your data center? I think you have a data center in Oklahoma.
Guest: It depends on which customer. When you use Compass from G42, which is very popular in the Middle East, you're generally hitting their equipment.
When you go to the Microsoft marketplace or the AWS marketplace, you are hitting our equipment in our data centers; the tokens have been directed to us.
So we've made it easy for you to purchase, but it's more like a software marketplace rather than first-party hardware.
Host: Makes sense.
Guest: Exactly right.
Host: One of the other challenges that I feel is pretty often discussed, and rightly, is the whole switching challenge, right?
I mean, if I'm starting a new company, a new chip company, the daunting task is: how do I get people to make the transition from whatever they're using to what I am building, to access my chip, right?
I think that's one of the biggest reasons Apple still dominates the US consumer market in phones: even though they'll launch a feature that has been out there on a Google Pixel phone for five years, right?
It's my constant complaint about Apple: they're launching Pixel features from five years back. It's everybody's...
Guest: I mean, the truth is, what Apple is good at is not features, right? The features in a Samsung phone are extraordinary. Apple understood ease of use and cool like nobody on earth.
But that's a good segue, because when you do inference, you don't use CUDA. And so the rise of inference weakens CUDA.
And I think that's a really important observation: developers who want to bring a cool chat application into their app don't need to know any CUDA; they just need an API to bolt on to a chatbot.
If you want to do deep research, you don't need to know any CUDA; you just bolt on to a reasoning model.
And so while CUDA is a moat in the training business, and we've developed techniques around that, we've developed compilers that take in PyTorch, in the inference business CUDA is irrelevant.
Host: If someone is already running inference on an Nvidia GPU cluster, how easy or difficult is it to just switch over?
Guest: 10 keystrokes.
Host: That's amazing.
Guest: I mean, say you're using OpenAI on Azure. Your API call says something like get_openai_... or something. You just change it from OpenAI to Cerebras. You pick your model, say, I don't know, Qwen 235B or GPT-OSS 120B. That's it. It's literally 10 keystrokes.
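As a concrete illustration of the "10 keystrokes" claim, here is a minimal sketch using the OpenAI Python client against an OpenAI-compatible endpoint. The base URL and model id below are illustrative assumptions; check the provider's documentation for current values:

```python
# Minimal sketch: pointing an OpenAI-compatible client at another provider.
# The base_url and model id are illustrative, not verified values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # swapped from the OpenAI default
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="qwen-3-235b",  # hypothetical model id; pick from provider's list
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```

Because the request and response shapes match the OpenAI API, the rest of the application code is unchanged; only the endpoint and the model name move.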
Host: Yeah, I think that's one of the factors that really matters: this model portability, being able to use anything everywhere. I think that's a really good feature when customers are trying to switch, because even if I think of myself building an application or developing anything, it's these smaller things that keep me from moving to a new platform.
Guest: You know, I think you're exactly right. The reason people get ease of use wrong is that it's not one thing. It's hundreds of little annoying things, right? And that's why inference is so threatening to CUDA: it offers no value there. It's not even in the loop. You just direct the user's tokens to your API.
Host: You have one of your biggest strategic partnerships with G42, and I don't think a lot is said about what G42 actually is. Is it an institution, a nonprofit, a sovereign institution? What is G42, and how should I think about it?
Guest: You should think about it as a sovereign institution and a national champion for the UAE.
They're the AI national champion in the United Arab Emirates, and the leadership in that nation has decided to make AI a priority. They are building data centers, they are investing in AI technologies around the world, and they're building large partnerships. We've been working together for several years now, and it's been extraordinary.
We have built out for them some of the largest AI data centers in the world, located in the US. We recently received licenses to deliver equipment in the UAE, so we'll be doing that over the course of the next six or eight months.
But this is a huge partnership. Together we've trained models in Arabic. We've done genomic research together. We serve customers throughout the world together. It's a very powerful thing.
Host: One of your engineers, I think it was at a conference or something, was talking about you discovering that inference was faster because you were trying to solve the training problem on your chips. Can you talk a little bit about how that happened?
Guest: I think, you know, until about 2024, there wasn't an inference business out there. Right? Nobody was doing inference in production. Until ChatGPT, nobody was doing large-scale inference at all.
And so all of us were doing training. And what we saw was that the architecture had such enormous advantages, not just for training but for inference.
And, you know, I was probably too slow in recognizing that and shifting, or adding, resources to build out our inference program. I wish I had done that six or eight months earlier.
But right now that business is on an absolute tear. And so, yeah, I think there's no substitute for working with customers and building things, for learning and for product strategy, right?
It's very hard to do it in a conference room.
Host: I think there are few original chip makers out there. If we have to think about competition, I think, you know, TPUs