Large language models (LLMs) have vaulted from research labs into everyday products at a pace reminiscent of the early smartphone boom. For founders and product managers, the question is no longer whether to integrate generative AI but how to do so with speed, safety, and measurable ROI. To surface concrete answers, this post reverse‑engineers five companies—Notion, Stripe, Shopify, CapCut, and Yabble—that turned raw model outputs into shipping features, and distills the repeatable tactics that any team can borrow.
1. Notion AI: Shipping a Minimum Lovable Model
In November 2022 Notion unveiled a private alpha of Notion AI, built—literally—in a hotel room during an off‑site sprint. Ten weeks of tightly scoped feedback loops later, the product graduated to general availability for tens of millions of users. The team began with four narrowly defined jobs‑to‑be‑done: unblock writers, draft blog posts, transform text (translate, check grammar), and summarize content. Constraining the problem forced discipline: each feature could be mapped to a single prompt template, wrapped in the familiar Notion interface, and instrumented for telemetry. The lesson is clear—treat the LLM as a flexible back‑end, but package the experience like any other micro‑feature so you can measure, iterate, and, when necessary, discard.
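The "single prompt template per feature" pattern is worth making concrete. Below is a minimal sketch in Python, assuming a hypothetical `call_llm` helper that stands in for whatever model API your team uses; Notion's actual prompts and stack are not public.

```python
# A minimal sketch of "one prompt template per feature."
# call_llm and the template wording are illustrative stand-ins,
# not Notion's actual implementation.

PROMPT_TEMPLATES = {
    "continue_writing": "Continue this draft in the same tone:\n\n{text}",
    "draft_blog_post": "Write a first-draft blog post about: {text}",
    "translate": "Translate the following text into {language}:\n\n{text}",
    "summarize": "Summarize the following content in three sentences:\n\n{text}",
}

def call_llm(prompt: str) -> str:
    """Stand-in for your model provider's completion API."""
    raise NotImplementedError("wire up your provider here")

def run_feature(feature: str, text: str, **kwargs) -> str:
    # Each UI action maps to exactly one named template, so acceptance
    # rates and user edits can be tracked per feature, and templates can
    # be iterated on (or discarded) independently.
    prompt = PROMPT_TEMPLATES[feature].format(text=text, **kwargs)
    return call_llm(prompt)
```

Because every feature reduces to a named template behind one function, telemetry and A/B tests attach cleanly to the feature name.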
2. Stripe: Running a 100‑Person Idea Blitz
Stripe took the opposite route on scope but employed the same bias for action. Days after GPT‑4’s release, leadership paused normal work and asked one hundred engineers to prototype anything that could make Stripe “magically better.” The hackathon produced three production features: (1) automatic summaries of a merchant’s business profile, (2) a chatbot that answers questions about Stripe’s deep technical documentation, and (3) an internal tool that flags potentially malicious accounts in Discord. Each solution replaced a slow, human‑in‑the‑loop process with an on‑demand language model call. The meta‑insight: a brief suspension of business as usual can accelerate institutional learning; once the prototypes prove durable, they can be hardened and budgeted like any other microservice.
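Of the three, the documentation chatbot is the most reusable pattern: retrieve the relevant passages, then force the model to answer from them. Here is a minimal sketch, assuming hypothetical `embed` and `call_llm` helpers (Stripe has not published its implementation); a production version would precompute chunk embeddings in a vector store rather than embedding on every query.

```python
import math

def embed(text: str) -> list[float]:
    """Stand-in for an embedding API."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Stand-in for a completion API."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def answer(question: str, doc_chunks: list[str], k: int = 3) -> str:
    # Rank documentation chunks by similarity to the question.
    # (In production, chunk embeddings would be precomputed and indexed.)
    q = embed(question)
    ranked = sorted(doc_chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    context = "\n---\n".join(ranked[:k])
    prompt = (
        "Answer the question using ONLY the documentation below. "
        "If the answer is not in it, say you don't know.\n\n"
        f"Documentation:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```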
3. Shopify Magic: Treating AI as Horizontal Infrastructure
Shopify packaged its growing suite of generative features under the single label Shopify Magic, signaling to merchants that “AI is everywhere.” Text generation now helps store owners write product descriptions, email subject lines, blog posts, and headlines without leaving the admin panel. Media generation turns raw photos into polished product shots, while synthesized summaries distill hundreds of app‑store reviews into a paragraph. By sprinkling LLM endpoints across every text box in the product, Shopify reframes AI as a default capability, not an add‑on. For founders, the takeaway is to audit every user workflow for text manipulation tasks—each instance is a likely entry point for an intelligent co‑pilot.
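The review-summarization piece hints at a common constraint: hundreds of reviews will not fit in one context window, so the work is typically done in two passes, summarizing batches and then summarizing the summaries. A hypothetical sketch (Shopify has not published its pipeline; `call_llm` is a stand-in):

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a completion API."""
    raise NotImplementedError

def summarize_reviews(reviews: list[str], batch_size: int = 20) -> str:
    # Map step: condense each batch of reviews into a partial summary.
    partials = []
    for i in range(0, len(reviews), batch_size):
        batch = "\n".join(reviews[i : i + batch_size])
        partials.append(call_llm(
            "Summarize the key themes in these app reviews:\n\n" + batch
        ))
    # Reduce step: merge the partial summaries into one paragraph.
    return call_llm(
        "Combine these partial summaries into one balanced paragraph, "
        "covering both praise and complaints:\n\n" + "\n\n".join(partials)
    )
```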
4. CapCut: Compressing the Video‑Editing Funnel
CapCut, ByteDance’s consumer video editor, layers AI into three friction points: adding captions, assembling clips, and extracting short‑form highlights. One‑click subtitle generation eliminates manual transcription. A “text‑to‑video” tool produces draft footage from a short prompt, and a newer feature auto‑splices long videos into TikTok‑ready shorts that can be fine‑tuned on the timeline. CapCut’s pattern is incremental automation: start with a repetitive task, ship the 80‑percent solution, and surface controls so creators can steer the model. Visual products can emulate the same strategy—target the costly edge of the funnel where users churn, save them hours, then gradually expand AI coverage.
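One-click subtitles are mostly glue code around a speech-to-text model: transcribe, then render the timestamps in a standard subtitle format. The sketch below assumes a hypothetical `transcribe` helper that returns timed segments; the SRT rendering itself follows the standard format.

```python
def transcribe(audio_path: str) -> list[dict]:
    """Stand-in for a speech-to-text API that returns
    [{"start": sec, "end": sec, "text": "..."}, ...]."""
    raise NotImplementedError

def _timestamp(seconds: float) -> str:
    # SRT timestamp format: HH:MM:SS,mmm
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    ms = int((seconds - int(seconds)) * 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments: list[dict]) -> str:
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{_timestamp(seg['start'])} --> {_timestamp(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(blocks)
```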
5. Yabble: Converting Qualitative Feedback into Structured Insight
Yabble ingests thousands of customer comments; until recently, analysts had to tag themes by hand. By fine‑tuning GPT‑3, the start‑up now classifies sentiment and topics automatically, then clusters the output into decision‑ready dashboards. Translation, summarization, and thematic extraction form a single pipeline that used to consume days of analyst time. Yabble illustrates that mature LLM tooling often begins with internal analytics before it becomes a public feature—another low‑risk path for start‑ups that handle sensitive data.
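The tagging pipeline reduces to a constrained classification call per comment plus aggregation. A minimal sketch, assuming a hypothetical `call_llm` and an illustrative theme taxonomy (Yabble's fine-tuned models and label set are not public):

```python
from collections import Counter

THEMES = ["pricing", "usability", "performance", "support", "other"]

def call_llm(prompt: str) -> str:
    """Stand-in for a (possibly fine-tuned) classification model."""
    raise NotImplementedError

def classify(comment: str) -> tuple[str, str]:
    prompt = (
        f"Classify this customer comment. Reply as 'theme,sentiment' where "
        f"theme is one of {THEMES} and sentiment is positive/negative/neutral."
        f"\n\nComment: {comment}"
    )
    theme, sentiment = call_llm(prompt).strip().split(",", 1)
    return theme.strip(), sentiment.strip()

def dashboard_counts(comments: list[str]) -> Counter:
    # Aggregate (theme, sentiment) pairs into decision-ready counts.
    return Counter(classify(c) for c in comments)
```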
Emerging Best Practices
- Ship a thin vertical slice first. Constrain use cases to a handful of prompt templates that can be measured end‑to‑end. Early wins build political capital and surface failure modes quickly.
- Exploit the text surface area. If your product asks the user to type, there is an opportunity to autocomplete, rewrite, translate, categorize, or summarize that text. These are high‑ROI, low‑latency calls that modern models handle with near‑human quality.
- Instrument everything. Track acceptance rates, user edits, latency, and token costs. The telemetry will reveal which prompts need refinement and whether retrieval augmentation or fine‑tuning is worth the spend.
- Keep humans in the loop. Every case study retained an “edit” affordance; AI drafts, humans approve. This reduces liability, maintains trust, and supplies data for continuous improvement.
- Treat AI like any other microservice. Model choice, prompt orchestration, and guardrails should live behind a versioned API so they can be swapped without rewriting the UI (see the sketch after this list).
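The last two practices reinforce each other: if the model call lives behind one versioned function, telemetry comes almost for free. A minimal sketch, with `call_llm` and `log_event` as hypothetical stand-ins for your provider and analytics pipeline:

```python
import time

PROMPT_VERSION = "summarize-v3"  # bump whenever the template changes
MODEL = "your-model-id"          # swappable without touching any caller

def call_llm(model: str, prompt: str) -> tuple[str, int]:
    """Stand-in returning (completion, tokens_used)."""
    raise NotImplementedError

def log_event(**fields) -> None:
    print(fields)  # replace with your analytics pipeline

def complete(user_text: str) -> str:
    prompt = f"Summarize the following:\n\n{user_text}"
    start = time.monotonic()
    output, tokens = call_llm(MODEL, prompt)
    # Log latency and token cost per prompt version so regressions and
    # expensive prompts surface immediately.
    log_event(
        prompt_version=PROMPT_VERSION,
        model=MODEL,
        latency_ms=int((time.monotonic() - start) * 1000),
        tokens=tokens,
    )
    return output
```

Acceptance rates and user edits are logged from the UI side, keyed on the same prompt_version, which is what makes prompt A/B tests and painless model swaps possible.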
Companies gain an edge not just by using large language models but by operationalizing them: quick, measurable pilots, focus on text bottlenecks, and a product architecture that treats intelligence as an interchangeable layer. Follow those principles and LLMs become less a moon‑shot and more a routine feature release—exactly how great software is built.

Nataraj is a Senior Product Manager at Microsoft Azure and the author of Startup Project, which features insights about building the next generation of enterprise technology products and businesses.
Listen to the latest insights from leaders building next‑generation products on Spotify, Apple, Substack, and YouTube.