The Challenge of Modern Observability
In the rapidly evolving world of cloud-native technology, observability has become a cornerstone for maintaining reliable and performant systems. Yet, as companies shifted to containerized environments like Kubernetes, traditional monitoring tools struggled to keep up with the scale and complexity. Martin Mao, co-founder and CEO of Chronosphere, experienced this problem firsthand while leading the observability team at Uber. He witnessed the explosion of data and costs associated with monitoring microservices at a massive scale. This challenge became the crucible for a new idea. Martin joins us to share the story of how he and his co-founder turned their internal solution at Uber into Chronosphere, a leading observability platform. He delves into the nuances of building for a containerized world, the strategy behind competing with cloud giants, and the future of observability in the age of AI.
→ Enjoy this conversation with Martin Mao on Spotify, Apple, or YouTube.
→ Subscribe to our newsletter and never miss an update.
The Genesis of Chronosphere at Uber
Nataraj: How did Chronosphere start? When did you decide you had to stop working at other companies and start your own?
Martin Mao: The story goes back to when my co-founder and I worked at Uber, where we led the observability team. We faced many of the challenges internally at Uber that we’re now solving for our customers at Chronosphere. We ended up creating a bunch of new technologies in that solution and open-sourcing many of them. That showed us that the observability problems we were solving for Uber were also being seen by the rest of the market as they started to containerize their environments. Ultimately, that led us to decide we should create a company to bring the benefits of this technology to the broader market.
Nataraj: What was the specific problem you faced at Uber that wasn’t being solved by available tools at the time?
Martin Mao: If you think about observability, it’s about gaining visibility and insights into your infrastructure, applications, network, and business. The concept isn’t new; we’ve had observability software, previously called APM or infrastructure monitoring software, for a long time. What happens when you start to containerize and modernize your environments is twofold. First, you’re breaking up larger monolithic applications into smaller microservices. You have more tiny pieces running on containers, which are running on VMs. There are just more things to monitor, which generally produces a lot more observability data. The first problem you’ll find is either there’s too much data for your backend, or it costs you too much.
Second, the types of problems you’re trying to solve on monolithic apps running on a VM are different from the causes of problems in a distributed, containerized environment. A lot of APM software focused on how software interacted with hardware and the operating system. In a containerized world, you often don’t have access to that level, and a cause of your issue is more likely a downstream dependency, a deployment, or a feature flag change. The causes of problems have changed, so you need a tool optimized for these new types of issues. Those were the two big problems we saw at Uber: too much data, too much cost, and it wasn’t the ideal tool for these new environments. When we looked at the market at the time, there was nothing we could buy, so we were forced to build our own solutions.
Nataraj: What services were available at that point? There’s a lot more competition in the observability space now.
Martin Mao: There was still a lot of competition back then, but different types of companies. Tools like AppDynamics and New Relic were very popular. Even Datadog was a series C company when we were looking at this problem space. There were many solutions, but none were targeting containerized environments. In 2014, when we were solving this at Uber, the majority of the market had not containerized. It was pre-Kubernetes becoming the de facto platform. Most folks were running on VMs, and an APM-style piece of software was probably the right solution.
Nataraj: You mentioned open source. Was this the M3 database that you open-sourced?
Martin Mao: Yes, it was multiple solutions. One was M3, the backend, which was a time-series database great for storing metric-based data. Jaeger, for distributed tracing, was created by the same team and is a CNCF project today. We also open-sourced various clients and other pieces.
Acquiring the First Five Customers
Nataraj: So you saw a gap in the market and decided to start the company. What were those initial days like? Talk to me about getting your first five customers.
Martin Mao: We saw the gap in the market later, around 2018-2019, especially after KubeCon in Seattle when all the major cloud providers announced they were going all-in on Kubernetes. It was only then that we realized there was a real gap in the broader market. In the beginning, it was quite difficult. Just like every other startup, nobody knew who we were. There was no brand recognition. For the first one or two customers, there was a bit of trust because we had worked with people at those companies when we were at Uber. They knew us as the observability team at Uber and had used the technology before, which gave us some credibility. Honestly, the rest was just typical outbound efforts. I was on LinkedIn every day sending 500 messages to various VPs and CEOs, saying, ‘Hey, this is us, this is the problem we’re trying to solve. Can I get you on a call?’ A lot of outbound emails and messages to get those opportunities.
Nataraj: Observability is mission-critical, used to find and fix live issues. It must be hard to convince a company to adopt a new mission-critical technical product. Were your initial customers transitioning to Kubernetes and saw it as a good time to test a new solution?
Martin Mao: Initially, it was a lot of companies that had already transitioned. These were tech-forward companies running mostly containerized environments at scale in 2019-2020. Being mission-critical probably didn’t help us as a startup. You’re trying to convince a company to replace a mission-critical piece of software they’re likely purchasing from a big public vendor with a well-known brand name. As a one or two-year-old startup, the benefit of switching had to be so large that it would outweigh the risk. For us, early on, the benefit was on the scale and performance of the backend, but also on cost efficiency. It was so much more cost-efficient than other solutions. We’re not talking 20% more cost-efficient; we’re talking four to five times more cost-efficient. The gap had to be very large.
The Chronosphere Platform: Differentiating on Cost and Capability
Nataraj: Can you give a high-level overview of the products Chronosphere offers today and talk a bit about the business model?
Martin Mao: We offer two products. One is our observability platform, which can ingest and store logs, metrics, traces, and events from your infrastructure and applications. We then provide analytics capabilities on top to help you debug issues. Compared to others, it differentiates in two main ways. The first is cost efficiency. We realized there’s a lot of waste in observability; you store and pay for a lot of data you may not need. Most observability companies charge you more as you produce more data, so they aren’t motivated to help you reduce it. As a disruptor, we had to do something different. We created features that show the customer what is and isn’t useful, giving them tools to optimize the data so they only pay for what’s useful. This not only reduces costs but guarantees that every dollar is well spent.
The second differentiator is that you need a different tool optimized for modern environments, where the probable cause of an issue is a downstream dependency, a new rollout, or a feature flag change. Our platform looks for those changes and correlates them with issues. Our customers have found they reduce their time to detect and resolve problems by around 65%.
Separately, we have a solution called an observability telemetry pipeline. You can install this in your environment in front of an existing tool like Splunk or Elastic. It can route and transform the data it collects to those backends, but it can also reduce and optimize data volumes. For instance, you can route subsets of data to cold storage like S3 to reduce costs. You don’t have to use it with our observability platform, but it provides a similar benefit without a full migration.
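The routing-and-reduction idea behind such a pipeline can be sketched in a few lines. This is an illustrative toy, not Chronosphere's actual product: the `LogRecord` shape, the `route_record` helper, and the severity-based rules are all assumptions made for the example.

```python
# Toy sketch of a telemetry-pipeline routing stage: each record is sent to a
# hot (searchable) backend, to cheap cold storage, or dropped entirely.
# All names and thresholds here are hypothetical.

from dataclasses import dataclass

@dataclass
class LogRecord:
    service: str
    severity: str  # e.g. "debug", "info", "error"
    message: str

def route_record(record: LogRecord) -> str:
    """Return the destination for a record: 'hot', 'cold', or 'drop'."""
    if record.severity == "debug":
        return "drop"   # debug noise is rarely worth paying to index
    if record.severity == "info":
        return "cold"   # keep for audits, but in cheap object storage (e.g. S3)
    return "hot"        # errors and warnings go to the searchable backend

records = [
    LogRecord("checkout", "debug", "cache miss"),
    LogRecord("checkout", "info", "order placed"),
    LogRecord("payments", "error", "card declined"),
]
destinations = [route_record(r) for r in records]
print(destinations)  # -> ['drop', 'cold', 'hot']
```

The point of the sketch is the placement: because rules like these run in front of the existing backend, data volume shrinks before any vendor bills for it, without a full migration.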
Nataraj: So customers using competitors’ observability products think about cost predictability?
Martin Mao: In the last two to three years, as the economy has changed, they care about it a lot. It’s not just the absolute dollar amount. Our customers ask what fraction of their revenue or operating expense is spent on observability. The predictability and knowing the relative percentage of cost matters. If your business grows 2X, but your observability costs grow 3X, that’s a bad efficiency model. Being able to see and control that is key. We provide tools that show them where their spend is going and how data is being used, giving them the ability to make decisions and stay within their budget.
Competing in a Crowded Ecosystem
Nataraj: All the big three clouds—AWS, Azure, Google—have their own observability products like CloudWatch and Azure Monitor. How do you compete with them, especially with bundled pricing advantages?
Martin Mao: I look at this in a few ways. First, what’s unique about observability is that it’s meant to tell you if your infrastructure is up or down. If your observability service runs on the same infrastructure you’re monitoring, there’s a problem. For example, AWS’s observability services depend on S3 and Kinesis. When S3 goes down in a region, your infrastructure is likely impacted, but the thing meant to tell you that is also down. It’s in that moment you need observability the most. There’s a huge advantage in decoupling your observability from the infrastructure it monitors. Our architecture is purposely single-tenanted, allowing us to ensure we are not on the same public cloud infrastructure as our customers.
Another angle is that cloud providers are really good at providing building blocks—the underlying infrastructure—but historically less great at building end-to-end SaaS products. Their observability services are decent for storage, but they lack advanced capabilities for data efficiency, root cause analysis, or anomaly detection. If you look at the leaders in the observability market—Chronosphere, Splunk, Datadog—none are cloud providers. To compete, you need to differentiate on the product side, not just on underlying storage and unit economics, because you’ll likely lose that game against the cloud providers.
Product Philosophy: Building for the Bleeding Edge
Nataraj: What’s your philosophy on deciding what to build next?
Martin Mao: We listen a lot to our customers. Tech-forward companies are generally containerizing first and doing it at scale, so we get to work with companies at the bleeding edge of their technology stack. They are constantly pushing us on what’s next and inform a lot of our innovation. Targeting early adopters gives you significant input on product innovation, versus targeting the laggards or the majority. We’re lucky that we target innovators and tech-forward companies who provide us with a lot of input.
Nataraj: Who are some of these tech-forward customers today?
Martin Mao: When we first started, it was large, digital-native companies like DoorDash, Robinhood, and Affirm—companies that grew up in the 2010s in the public cloud. They were the first to containerize and were pushing technology. Today, we see more of the majority of the market containerizing. Big enterprises like JP Morgan Chase, American Airlines, and Visa are containerizing at a large scale, often because they have a hybrid and multi-cloud strategy. If you have two or three different pieces of infrastructure, you need a common layer like Kubernetes to avoid implementing your infrastructure three times. Now, we see a lot more demand from those companies. And of course, the latest are the AI companies. Everyone starting an AI company today is running on modern, containerized infrastructure from day one, which is our sweet spot.
Observability in the Age of AI
Nataraj: You mentioned AI. How does observability change for AI companies, especially for LLM-based applications?
Martin Mao: We noticed that even with LLM technologies, you still have application logic and CPU-based workloads. But it added new use cases, like monitoring GPUs for inferencing. At the infrastructure level, monitoring a GPU cluster isn’t too different from a CPU cluster. As you go up the stack, we found that the basic observability data types—metrics, distributed traces, and logs—still map very well for debugging what’s happening in an LLM application. Because the data types map nicely, the features and tools we’ve built work quite well for these new apps. So far, we haven’t had to create a new solution; it’s just been more data and more use cases.
Nataraj: How are you thinking about leveraging AI for your own product?
Martin Mao: We’ve been playing around with it a lot. Initially, like everyone else, we put an LLM trained on our docs to create a chatbot. But we found that a lot of our data is numerical or unstructured in a way that’s not typical for LLMs. When we try to apply a foundational model to the raw observability data, it’s not very effective because it wasn’t trained on it, and this data is unique to each company. However, for years, we’ve been building knowledge graphs and structuring this data to power our analytics engine. When you feed these structured knowledge graphs into the models, they become much more effective. We were lucky to have already been doing the hard work of data scrubbing and normalization for our product, and now it’s beneficial for AI models. Still, I’m not sure a chat interface is the right starting point for observability. When you get paged, a visual interface with graphs feels more natural than a chat box asking, ‘Tell me what’s wrong’.
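The knowledge-graph idea can be made concrete with a small sketch. Everything here is hypothetical: the graph shape, the `to_prompt_context` helper, and the service names are invented to show the general technique of serializing structured operational data into text a model can reason over, not Chronosphere's implementation.

```python
# Hypothetical sketch: flatten a service-dependency knowledge graph
# (services, dependencies, recent changes) into prompt-ready text,
# instead of feeding raw numerical telemetry to an LLM.

service_graph = {
    "checkout": {"depends_on": ["payments", "inventory"], "recent_change": None},
    "payments": {"depends_on": ["bank-gateway"], "recent_change": "deploy v2.3 at 14:02"},
    "inventory": {"depends_on": [], "recent_change": None},
}

def to_prompt_context(graph: dict) -> str:
    """Render each service as one line of structured context for a prompt."""
    lines = []
    for name, info in graph.items():
        deps = ", ".join(info["depends_on"]) or "none"
        change = info["recent_change"] or "no recent changes"
        lines.append(f"- {name}: depends on {deps}; {change}")
    return "\n".join(lines)

print(to_prompt_context(service_graph))
```

A model given this context can connect a checkout incident to the recent payments deploy, which is exactly the dependency-and-change correlation raw metrics alone would not surface.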
Founder Reflections
Nataraj: We’re almost at the end of our conversation. What do you know about starting a company that you wish you knew earlier?
Martin Mao: Early in my career, I assumed that to be a CEO, you needed an MBA and executive experience. I found that not to be true. I don’t have an MBA or experience as a big executive. I was an engineering manager at Uber before this. There’s probably less of a barrier for someone to become a founder and CEO than one might think from the outside.
Nataraj: What are you consuming right now that’s influencing your thinking? It can be books, audio, or video.
Martin Mao: A lot of conference talks, especially on AI-related topics where things are evolving so fast. By the time a book comes out, it might be outdated. So podcasts and conference talks are better for keeping up with what’s happening live. Historically, even a research paper takes a while to be released, and a book takes even longer.
Nataraj: Martin, thanks for coming on the show and looking forward to what Chronosphere does in the future.
Martin Mao: Thank you. Thanks for having me. I enjoyed the conversation, and hopefully, we can do this again sometime.
Conclusion
Martin Mao’s journey with Chronosphere offers a compelling look into solving complex technical challenges born from real-world, large-scale operations. His insights on product differentiation, customer acquisition in a mission-critical space, and the evolving landscape of AI-driven observability provide valuable lessons for founders and engineers.
→ If you enjoyed this conversation with Martin Mao, listen to the full episode here on Spotify, Apple, or YouTube.
→ Subscribe to our newsletter and never miss an update.
