Podcast Transcript

Transcript: From Zero to $50 Million, Read AI's Explosive Growth | David Shim, Founder of Read AI

2025-01-13

Host: Hello everyone, welcome to Startup Project. I'm your host, Nataraj. My guest today is David Shim. David is a repeat founder, trader, and investor, and before this he was CEO of Foursquare.

He was at one point the youngest stockbroker in the nation, and he's an angel investor, both as a limited partner in funds and as a direct investor in startups. Most recently, he founded Read AI.

Their main product is an AI meeting summarizer, but they want to be much more: a copilot for everyone, everywhere. They recently raised a $50 million Series B led by Smash Capital.

In this episode, we'll talk about Read AI, the vision for the company, David's experience founding different companies, his opinions on the current AI hype cycle, and a lot more.

With that, David, welcome to the show.

Guest: No, excited to be here. I appreciate you taking the time to learn a little bit more about Read.

Host: Uh, so where are you joining us today from?

Guest: I'm in Seattle. We have an office in Seattle where about two-thirds of the company work, and a third are fully remote. We're in the office Tuesdays, Wednesdays, and Thursdays, in Seattle only, and then Mondays and Fridays are optional. I like to come in; it's just what I like.

Host: You've worked at and created companies before COVID and after COVID, and you mentioned a good part of your workforce is remote. Do you have a take on whether remote work is good, or hybrid is good? What is your current sense of running a company? What is working best for you?

Guest: I'd say hybrid is the future. Hybrid is what works, though I would say hybrid doesn't work as well if you're really early in your career.

It is harder to build those relationships to get that level of mentorship on a fully remote basis.

It's not to say that it's impossible, but generally it is a lot more difficult. In an office you can say, hey, I saw you next to the water cooler and we talked a couple of times.

I didn't even know who you were, but I've gotten to know you a little better, and now I know you're the VP of this or the director of that, and I can come to you and you know who I am. Versus when you're fully remote, especially early in your career, you don't know who to ask.

You only have the manager that hired you and then your team, and that's where it kind of ends. So I think that serendipity is missed.

That said, when you're more senior, I think it becomes easier and easier to be fully remote. You know what to do, you know who to talk with, you're not afraid to break down walls, you know what needs to get done.

And I think that comes with a little more experience, as a general rule of thumb. So I think the future is hybrid, and Read's kind of the same way.

We've got a third fully remote, and people come into the office Tuesdays, Wednesdays, and Thursdays. And then we let people come in on Mondays and Fridays; they don't have to.

And what's really happening is people actually like that level of interaction. So they're coming in without being required, and they're like, hey, this is great.

I can actually talk to the CEO, and they get a little one-on-one time, because they want to understand why we're doing what we're doing.

Host: Yeah, I completely agree. I think the most important thing is giving the option. That optionality is what people like.

It's not that they want to completely work from home or completely work from the office; they want the option to work flexibly, whenever and wherever they want. Most statistics say that people work more when they're working from home, but companies like Amazon don't want to agree with that.

But let's get right into Read AI. Firstly, great name: one single word that tells you a little about what the product is.

Guest: Yep.

Host: And uh, I think a great domain name as well. So talk to me a little bit about, you know, how did the company start? What was the initial idea?

Guest: Yeah, the original idea started when I was in a meeting. After I left the CEO role at Foursquare, I had a lot of time on my hands. I didn't have a job lined up; I was actually debating what I was going to do.

And as part of that, it was still kind of peak COVID. It was on a down slope, but it was still peak COVID, so no one was meeting in person.

I'd gone to Mexico like six times in four months, because that was the only place you could go to from the US.

And that was getting a little boring, and I was doing a lot of these calls where I was either giving advice or someone was asking, hey, would you want to invest in this company?

And what I started to realize was that within two or three minutes of a call, you know if you should be there or not. Most of us know the feeling: I should not have been invited to this meeting. Why am I here? But now I've turned on my camera.

I cannot leave this meeting; it's like sneaking out of a conference room, people will notice. So what most people do when they have that free time is surf the web, right? They'll answer emails, they'll surf the web.

They'll click around, they'll look at their phone and see what's going on. Well, I did that, and I was on ESPN, but then I noticed a color on the screen that matched a color on my screen.

I looked a little closer: someone was wearing glasses, and in the reflection on those glasses were the same colors in the image I could see on ESPN.com.

And that triggered the thought: this is really interesting. Can you use not just audio but video to understand sentiment and engagement? Can I determine if someone is engaged or not?

And it wasn't to go in and do Big Brother: are you working hard, are you paying attention? It was actually to ask: is this meeting a waste of time?

Because at any organization, if you're at Microsoft, if you're at Amazon, well, Amazon's a little bit different.

But if you're at Microsoft or Google, any big company, you can open up an invite, invite 12 people to a meeting, send it out, and probably 12 out of 12 will accept if they've got an opening on their calendar, without too much context or information.

Well, now all of a sudden you've wasted 12 hours on a meeting if all those people didn't need to be on it. So, can you use AI to understand whether a meeting was a good use of time? And if it wasn't, how can you optimize future meetings?

And so that's where the idea started to form: can you use this data to optimize productivity? So I started to dive into the papers. This was 2021, and AI hadn't become a big thing yet.

Everyone was talking about it, but it wasn't at 2023 OpenAI levels. And so I looked at a bunch of the papers, and what I found was that computer vision wasn't there yet.

What it was being used for was, hey, we've got someone watching a TV commercial, we're going to track their eyes, we're going to see if they're smiling. But it's a one-to-one ratio, it's not in real time, and it's very expensive to do.

You really needed a bunch of hardware, and no one was applying it to video. So I started to dive in and ask: can you actually apply this to video conferencing? Is the technology there?

And is the technology cost-effective enough that you could run it at scale? Those are two different problems: you can have the technology, but if it's too expensive to process, you're not going to be able to do it.

So, the two things that lined up were: yes, it was somewhat expensive, but it wasn't cost-prohibitive, and cloud processing was getting cheaper and cheaper.

And so I started to say, okay, it's possible to do. There are open-source models that have done this before, but they're not really designed for video conferencing.

So I started to work with my co-founders and said, hey, can we take the video conferences that we're on, record them, and start to build our own models for sentiment and engagement, using not just the words that are said, but how people react to the words that somebody else said?

If you've been on a call, let's say you're OpenAI or you're Anthropic, you get a transcript, and a salesperson is giving you a pitch: we're great because we will make you 200% more efficient.

We'll increase revenue by 500%, and the cost will pay for itself in less than three months. From a seller's perspective, and from a large language model's perspective, it sounds like a great call about a great product.

But on the flip side, the client or the prospect rolls their eyes. They start getting more distracted. They're like, really? Is that right? They get frustrated.

Large language models don't pick those things up. So you need that reaction layer, that video layer, that responsiveness layer across all the people on the call to understand: is this a good use of time?

So, the long way to say it: there's a bunch of data exhaust from meetings, and the ability to process it beyond just the words was something we decided was going to be valuable.

And where we started was an app on Zoom that showed you real-time sentiment and engagement.

Host: So were you analyzing video and text at that point in time, or just text?

Guest: Video and text. Transcription companies existed then; back in 2021 and 2022 there were a lot of transcription companies, and platforms like Zoom and Microsoft had transcription built in.

So we were less interested in that, to be honest, because what I didn't want to build was something that everybody else already had, and then try to make it a little bit better.

I wanted something that was going to materially, step-function change the way people look at transcripts, the way they look at summarization, the way they look at meetings.

And so what we did is we said: take the transcripts that exist. We're not even going to release a transcription product. But then go in and apply sentiment and engagement to the transcript.

So think about it: David said this, but Nataraj responded this way. That narration piece is missing from a transcript, but the AI that we've built can go in and say, this is how the person reacted to the words.

So now if you were going to upload something into a large language model, it not only has the quotes that were said, but how individual people reacted.

So it could be: the CEO was really skeptical based on his facial expressions, he started to get disinterested 15 minutes into the call, and you could very clearly tell he didn't care by the end.

Those are things you can't pick up from quotes, but you can pick up from visual cues.
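To make the idea concrete, here is a minimal sketch of what such a narration layer could look like, in Python. Everything here is hypothetical (Read hasn't published this interface): the data structures and the annotate function are illustrative, reaction labels would come from the vision models described above, and the annotated text, not the raw transcript, is what would be handed to an LLM.

```python
# Hypothetical sketch of a "narration layer": interleaving visual-reaction
# annotations with transcript quotes before summarization.

from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    text: str
    start_sec: float

@dataclass
class Reaction:
    person: str
    label: str        # e.g. "skeptical", "nodding", "distracted"
    at_sec: float

def annotate(utterances: list[Utterance], reactions: list[Reaction],
             window_sec: float = 5.0) -> str:
    """Render a transcript where each quote is followed by how
    listeners visibly reacted shortly after it was said."""
    lines = []
    for u in utterances:
        lines.append(f"{u.speaker}: {u.text}")
        for r in reactions:
            if u.start_sec <= r.at_sec < u.start_sec + window_sec:
                lines.append(f"  [reaction] {r.person} appeared {r.label}")
    return "\n".join(lines)

print(annotate(
    [Utterance("Seller", "We'll increase your revenue by 500%.", 0.0)],
    [Reaction("CEO", "skeptical", 2.5)],
))
```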

The other thing that we started to do was the metadata. When you and I talk, we each have a fairly constant rate of speech. Let's say I'm pretty high-speed, around 200 words per minute. But if I start talking really fast and hit 240 words per minute, something has changed.

That says something has changed with David: this is a topic he's either really excited about or more nervous about. Transcripts don't have words per minute; they don't have that metadata.

When I upload a transcript into OpenAI, it doesn't know what to do with that. But if I have a model that says, oh, David was nervous when he was responding to that question because he went from 200 to 245 words per minute, that's actually valuable.
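A minimal sketch of that speaking-rate signal, assuming you have transcript segments with known durations; the function names and the 15% threshold are illustrative assumptions, not Read's actual values.

```python
# Illustrative words-per-minute anomaly check on transcript segments.

def words_per_minute(text: str, duration_sec: float) -> float:
    return len(text.split()) / (duration_sec / 60.0)

def rate_anomaly(baseline_wpm: float, segment_wpm: float,
                 threshold: float = 0.15) -> bool:
    """True if a segment deviates more than `threshold` (here 15%)
    from the speaker's usual rate, e.g. 200 -> 245 wpm."""
    return abs(segment_wpm - baseline_wpm) / baseline_wpm > threshold

baseline = 200.0                                   # the speaker's usual rate
answer = " ".join(["word"] * 41)                   # 41 words in 10 seconds
wpm = words_per_minute(answer, duration_sec=10.0)  # = 246 wpm
if rate_anomaly(baseline, wpm):
    print(f"rate spike to {wpm:.0f} wpm: excited or nervous about this topic")
```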

That's actually valuable. or we've got a model that goes in and says, hey, uh, based on how many interruptions occur after a certain period of time, you can understand is there synchrony in the call.

I think we've all been on calls where you start talking and they interrupt you and you interrupt them and you both apologize and you just keep apologizing because you can't get into that sync.

That actually makes for a pretty bad call at a certain point where it's like, I'm sorry about that. I interrupted you or you're cutting people off.

Most people will get to a point within the first five minutes of the call where you stop interrupting each other because you can understand the cadence in which someone speaks.

Maybe I have longer pauses in between, you're not going to jump in because you can read that. But again, not available in the large language models.

You need to actually go in and understand that from a meta perspective, from a metadata perspective to apply to the.
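Here is a rough sketch of how an interruption and synchrony signal could be computed from diarized turn timestamps. The heuristic (comparing interruption rates in the first five minutes versus the rest) is an assumption based on what David describes, not Read's actual model.

```python
# Illustrative synchrony signal from diarized speaker turns:
# each turn is (start_sec, end_sec, speaker).

def interruptions(turns: list[tuple[float, float, str]]) -> int:
    """Count turns that begin before the previous speaker finished."""
    count = 0
    for (s1, e1, sp1), (s2, _, sp2) in zip(turns, turns[1:]):
        if sp2 != sp1 and s2 < e1:
            count += 1
    return count

def settled_into_cadence(turns, window_sec: float = 300.0) -> bool:
    """True if the interruption rate drops after the first ~5 minutes,
    the point where most calls find their rhythm."""
    early = [t for t in turns if t[0] < window_sec]
    late = [t for t in turns if t[0] >= window_sec]
    if not early or not late:
        return True
    return interruptions(late) / len(late) < interruptions(early) / len(early)

turns = [(0, 10, "A"), (8, 15, "B"), (14, 30, "A"),   # overlapping starts
         (310, 330, "B"), (331, 360, "A")]            # clean hand-offs
print(settled_into_cadence(turns))  # True
```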

Host: So what is the value? I guess the product also changed, or might have changed, from that initial version to maybe the Series A. What was the discovery there? What's the main value that customers got from Read at that point?

Guest: Yeah, so in 2022 we launched the product that did the real-time analytics: the call's going well, here are the scores for sentiment and engagement. It was great. People really believed it.

People said, hey, this is really interesting, I find this valuable, tell me more. Do you have APIs? Can I build on top of it? But what was missing was the stickiness.

Because people were like, you're telling me this call's going really bad, but what do I do?

Host: Yeah.

Guest: You're not giving me advice. You're telling me what's happening, and I believe you, you're giving me clear indicators, but I need to solve this problem.

So we learned that as we started to interview customers, and we kept interviewing and made iterations. We started to give slight recommendations, like slow down, or hey, stop interrupting. But again, that was too much cognitive overload.

Imagine someone's driven a car their whole life, but they've never had a dashboard: speedometer, gyroscope, compass, fuel gauge, all the stuff that's in there.

Well, all of a sudden you put that in front of them, and they're going to freak out, because they're trying to focus on the road, and now there are 16 different dials to look at.

Host: Same thing with Google Maps, when they all of a sudden give you a new feature and you don't know what to do with it.

Guest: Exactly, exactly. So that's where we said, hey, this is valuable from a data perspective, it's very unique, but our application is not getting the traction we hoped for. Where else can we apply it?

That's when the large language models came out at scale, in late 2022. They existed before, but they hit scale in late 2022.

We did some testing against that. We said, hey, I wonder: that narration layer that's pretty unique to us, if we applied it to the text of the conversation, does that create a materially different summary?

And the answer was yes. So we compared a prompt without our narration layer against one with it, and when you compared those two things, it was like, oh, this is totally different.

I could tell you this is the most important thing I'm telling you about, but if everyone isn't paying attention, it's not the most important thing, so you don't want to put it at the top of the summary.

You want to put what everyone is paying attention to at the top of the summary.

Being able to get that reaction layer really changed the quality of a meeting summary. It felt more human, in the sense that it highlighted what was most important to you, because you could tell by people's reactions, and that created a better summary. We started 2023 as the number 20 meeting note taker in the world.

We had just gotten a little bit of traction. There weren't that many note takers then; there were enough, but not that many.

Now there are probably 100-plus meeting note takers that have some element of AI, but we're number two in the world.

So we went from number 20 to number two in less than 18 months, and we're within shooting distance of number one.

Host: Who's the number one?

Guest: Number one is a company called Otter. They've been around for almost a decade, really smart folks. But I think it really highlights our growth: in 18 months we've caught up with someone that's been around for 10 years. It shows there's a difference in the approach we're taking, the methodology we have, and the quality of the product.

Host: And how did you acquire your customers in these 18 months? Was it inbound? Was it outbound? And did you target a certain segment of customers? What was your approach?

Guest: Yeah, a lot of VCs will say, hey, pick a specific niche, go after that niche, and focus on it. Don't go wide; go vertical.

My take, and our team's take, was that this is such a big market. Meeting notes right now are really limited to specific niches, like sales use cases or productivity for engineering. I thought this was more mainstream.

This is a seminal moment where everyone is going to require an AI assistant over time.

That means everyone from an engineer at Google, to a teacher, to an auto mechanic in small-town USA, to a grocery store owner in Bangalore.

And when you look at all those things combined, we actually see more than a third of our users are now international.

We're seeing some of the fastest growth we've ever seen in emerging markets, where people are adopting it even though in a lot of cases we don't have localized pricing. That might change over time.

But you've got countries like Colombia or Ecuador where the price of our software, which is anywhere from 15 to 45 bucks a month, could be up to 5% of the average monthly income in the country, and people are still buying.

They're finding use cases where it's like, I'm going to cut 5% of my paycheck for this software because it's giving me so much. So from a market standpoint, we went horizontal versus vertical.

That has actually helped a lot with the PLG motion, where we're adding, what was it, 25 to 30,000 net new users every single day, depending on the day, without spending a dollar on marketing.

It's pure word of mouth, pure product-led growth: if you see the product, you'll use the product, you'll talk about the product, you'll share the product.

And that's been a flywheel that's been accelerating more and more as we went through 2023.

Host: If you're not spending anything on marketing, is it because you're landing in a meeting with one person, and two other people see it being useful, and then they buy it?

Is it because the product inherently has that aspect?

If you are taking notes on this meeting and I'm not a Read AI user, I can still see the transcript or summary being generated, and that gives me exposure to the product, good or bad.

Is that how the word of mouth is spreading?

Guest: 100%. Meetings are natively multiplayer; you can't have a meeting with only one person, right? So you've got two people on a call, and everybody on the call is talking. So if someone writes notes, no one's offended.

It's not, oh, you're writing notes, this is horrible. Everyone's on the call; you can write your own notes if you want to. So that use case already exists. And then there's the problem that typically happens: who's writing the notes?

Who's going to send them out? Who's going to draft them up? That was the early-2020 problem with video conferencing. And the problem today is, hey, how do I get access to those reports?

And what we've done is make it really simple for the owner, whoever invited Read, to share it: type in an email address and send it, push it to Jira or Confluence as a page, push it to Notion.

We're not trying to say everyone needs to be on our platform with all the data here. This is where copilot everywhere comes into play: we want to push it wherever you work.

You don't need to come to read.ai. You can never come to read.ai, but you will see the data on a Confluence page. You'll see it on a Notion page. You'll see a Jira ticket updated with action items.

You'll see a Salesforce opportunity that has better notes than the seller has ever written. You wonder, what's going on? At the bottom it says, generated by Read, and you're like, okay, what is this thing?

So it's that value proposition from a bottom-up motion that has really driven a lot of our growth and adoption, where people are saying, you are solving a very clear problem.

And the second layer of growth where we've seen traction is the additional products. If you think about the meeting notes category as a whole, it's large; it's probably got a couple hundred players at this point.

What you find with the meeting notes is that a lot of them are very similar; there's not much difference. Some of them don't even look across multiple meetings. Read, by default, actually processes across multiple meetings.

And what that means is we can give you context: your first meeting on a topic versus your tenth meeting on a topic. Read doesn't treat those as silos; it aggregates them all together in products called For You and Readout.

And it gives you a status report: how is this topic progressing? That's very hard to do in a lot of cases, because you might get wrong information. But people are like, oh, this is great.

I can actually send my manager what I've been working on, not just one meeting report but a whole update on a topic. Three months ago we introduced readouts that include emails, Slack messages, and Teams messages.

So now, all of a sudden, it's copilot everywhere, not copilot for your meeting. And when you have copilot everywhere, the adoption becomes, oh, this is freaking great. I don't have to log into Gmail.

I don't have to go into Salesforce for CRM. I don't have to go to Zoom. I don't have to use all these other solutions. It's just right there, and then I can decide where I want to push it.

Host: Yeah, I can almost see it when you put this everywhere. When I first used note takers, one of the note takers was actually on the podcast a couple of years back: Fathom. I don't know if you've heard of them.

The founder is also a repeat entrepreneur, and they had raised, I think, a Series A. And the question I really had was, if you just generate notes, it was not useful, for me at least.

Then once AI came into the mainstream, one of the visions I saw was a product that is in your workplace, knows all the work you're doing, and gives you a single place to chat with it, where it tells you the tasks you're working on and the progress you're making.

And you can focus on each task, ask questions, get knowledge about those tasks, or get meeting notes about them.

So you can almost see an ever-present copilot in a work setting that will really change productivity for knowledge workers. And we're seeing different angles of approach to get there.

Microsoft has Copilot to do that, and you're coming at it from a different abstraction layer, saying, we're coming from a meeting notes perspective, but we'll be everywhere and collect all that knowledge, because this knowledge is an important building block.

Is that the vision, where this becomes a copilot in that sense, or are you thinking something different?

Guest: No, that's exactly what we're thinking from a copilot everywhere perspective. The current state of meetings is about pushing data to different platforms; we do that today. But you also need the pull.

When you pull the data down, you have a totally different challenge, because there aren't LLMs that can take 16 disparate data sources and combine them into something you can then run an LLM over. So you have to build models to identify topics.

Is this topic, hey, how's your day going, or is it, this is the lowest-revenue day we've had, or this is the best day we've ever had?

There's all this different context you need to solve for that the large language models don't handle; those are more specific NLP models. You're doing topic modeling, you're doing clustering.
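As a toy example of what that pre-LLM layer might involve, here is a sketch that clusters items from different sources into shared topics with classic NLP (TF-IDF plus k-means via scikit-learn). The document structure is made up, and a production system would use far richer models.

```python
# Toy cross-source topic clustering with classic NLP, before any LLM call.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    ("meeting", "Q3 revenue target slipped, the pipeline looks thin"),
    ("email",   "Re: pipeline review, the Q3 revenue number is at risk"),
    ("slack",   "checkout latency regression after the deploy"),
    ("jira",    "Bug: checkout latency regressed 40% in production"),
]

texts = [body for _, body in docs]
vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Items sharing a label form one cross-source topic; each cluster can
# then be handed to an LLM for the readable write-up (the "last mile").
for (source, body), label in zip(docs, labels):
    print(f"topic {label}: [{source}] {body}")
```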

And when you're able to do that correctly, it becomes magic, and all of a sudden it becomes this productivity tool where it's, hey, I don't need to think about XYZ.

It really becomes almost a TikTok feed: here are the topics that are relevant to you, David, because this is what you've been working on, this is what you've been interested in. Let's say I'm a CRO; I really want to know how revenue is going.

Well, now I get a feed of all the deals that are about to close. I get updates on whether each deal is likely to close or not. And that becomes high value. I think it's huge.

Host: You mentioned the vertical versus horizontal approach.

Are you still thinking Read first, thinking about features you should launch that will help across the breadth, or are you now targeting what a Fortune 500 company wants, what an SMB wants?

Or what does a sales company want? How are you thinking about that horizontal and vertical approach now?

Guest: Yeah, so we're still horizontal, but we're picking specific verticals based on customer demand.

So sales, engineering, product management, recruiting: those are very clear indications from our teams, where they've heard customers want X, Y, and Z. That's why we did the Notion integration, which is one of the fastest-growing integrations we've ever launched.

That's why we did Jira and Confluence. That's why we're doing more and more from there, like Slack.

Someone said, hey, I'd love Slack messages, because Slack is almost like Snapchat for corporate messaging. The messages don't disappear, but I get so many Slacks that if I click on something and forget about it, when I come back it's buried under another three scrolls before I find the content that was relevant to me.

So I need something that says, David was interested in this topic in the meetings he was in and the emails he had.

Now go find that in the Slack messages and the Slack channels that I can't read, and if you deliver that to me, that becomes infinitely valuable. So that's what copilot everywhere is worth.

And another way to look at copilot everywhere is agents. Everyone's talking about AI agents right now; I'd say that's the hot thing in the AI space, agents talking to each other.

Well, in reality, that's nice, but what you want is your Jira to talk with your Notion, to talk with your Microsoft, to talk with your Google, to talk with your Zoom.

You can call it whatever you want, but those are the integrations, where they can talk with one another, pushing and pulling data.

And I think that's going to be the next big space. You'll see agents come into play, but people won't even know it's an agent: it's a button you click, single sign-on, and it's added.

Host: In terms of models, which foundation models are you primarily using? And talk to me a little about pricing, because some of the very early-stage products I see are priced entirely based on how the foundation model is priced.

So how are you thinking about pricing? Are you optimizing for price, or are you still beholden to the foundation models?

Guest: We are not beholden to the foundation models. To give you an idea, last month 90% of our processing ran on our own proprietary models. What we use the large language models for is the last mile.

It's going in and saying, hey, we found 16 really interesting topics, here are the response scores, here are the reactions. Can you put this into a readable sentence? Can you put this into a readable paragraph?

But we're actually using the reactions, how people interact.

We're looking at the metadata: hey, your average time to reply to an email is 30 minutes, but you replied to this person in two minutes, and you've done it six times across a week. All of a sudden that's a very important person or a very important topic.

So we're building these models that stitch those signals together, and then we identify the subject matter being discussed.

Then we go to the LLMs and say, just summarize this for us; we've identified all the key pieces. So for us, 90% of our processing cost is our own internal models. I think we're at five issued patents now, and we've got more pending.
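The email reply-time signal David mentions is easy to sketch: flag contacts you consistently answer much faster than your own average. The data shape, threshold, and function name here are hypothetical.

```python
# Illustrative reply-latency signal: contacts answered far faster than
# your average reply time get marked high priority.

from statistics import mean

def priority_contacts(reply_log: dict[str, list[float]],
                      speedup: float = 5.0,
                      min_samples: int = 5) -> list[str]:
    """reply_log maps contact -> reply latencies in minutes. Flag contacts
    answered at least `speedup`x faster than the overall average, with
    enough samples to trust the pattern."""
    overall = mean(t for ts in reply_log.values() for t in ts)
    return [contact for contact, ts in reply_log.items()
            if len(ts) >= min_samples and mean(ts) < overall / speedup]

log = {
    "ceo@example.com":        [2, 1, 3, 2, 2, 1],         # minutes to reply
    "newsletter@example.com": [300, 600, 120, 400, 500],  # replied eventually
}
print(priority_contacts(log))  # ['ceo@example.com']
```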

So I think this is where we're different from a lot of companies: we actually have technology in place, versus building a wrapper layer where it's, we've got really good prompts.

That only gets you until 4.5 comes out, or 5 comes out, and you've got to rewrite all your prompts. That's not a defensible moat; that's a temporary one.

Host: So, in a way, and I'm assuming these models are mostly NLP models and not LLMs, they're focused on getting a better view from the transcripts you're generating.

Guest: That's on the words side. But there's also the computer vision side: reactions, like you're nodding your head right now. We've got models to detect that.

We've got models where, if I'm talking to you but my eyes keep going to a fixed point, that means my camera is potentially there, and then I go back to the main big monitor I have up here.

So when you move your head to fixed points, there's a separate model for that. And when you start to stack all of these models up, accuracy goes from 0.75 to 0.8, to 0.9, toward 1.0.

Those are the models you've got to stack together. What we've learned is that one model will not solve all the issues required to deliver relevance.
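A stripped-down sketch of that stacking idea: several narrow detectors each emit a confidence, and a weighted combination produces the engagement score. The detectors here are stubs with made-up outputs and weights; real ones would be trained vision models.

```python
# Illustrative stacking of narrow vision detectors into one engagement score.

def gaze_on_screen(frames) -> float:
    return 0.9   # stub: a real model would run per-frame gaze estimation

def head_nod(frames) -> float:
    return 0.7   # stub: detects nodding vs. shaking

def fixed_point_glance(frames) -> float:
    return 0.8   # stub: camera vs. main-monitor glances

def engagement_score(frames, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted vote over per-signal confidences in [0, 1]; stacking
    independent detectors is what moves accuracy from ~0.75 toward 0.9+."""
    signals = (gaze_on_screen(frames), head_nod(frames),
               fixed_point_glance(frames))
    return sum(w * s for w, s in zip(weights, signals))

print(engagement_score(frames=[]))  # 0.5*0.9 + 0.3*0.7 + 0.2*0.8 = 0.82
```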

Host: So you're using the best of the pre-LLM era and the post-LLM era: you take the models you want for the specific feature you're building and use LLMs only when required.

Guest: Yes. LLMs are great, and we don't have to worry about that piece: you've got four or five companies of massive scale, with billions of dollars, that are going to continue to improve it.

Go for it all day long. But with computer vision, a lot of it today is, hey, what are the words in front of me? What is the object in front of me? We're taking it a step further.

Is that person happy or sad? Are they nodding their head? Are they shaking their head? Are they actually engaged and looking straight ahead, or are their eyes wandering around?

Those are all things you pick up, and then it's also asking: what's the interaction between the two objects?

So if you have two people, two emails, a Slack thread going back and forth: what are the response times? How deep are you going? How unique is the content you're generating?

All of those are really metadata points. I think the best example is if you've used TikTok or Reels.

Within about five minutes on a fresh account, it can figure out with a very high level of accuracy what content you're interested in, and it starts to build on top of that. And you don't even need to like an individual story.

It pulls in the weak signal that says, hey, you normally watch the story for