How Would You Answer "Build vs. Buy AI Capability" When 90% of Code Is Already AI-Generated?
A Product Strategy Question
Dear readers,
Thank you for being part of our growing community. Here’s what’s new today.
AI Product Management Interview Question:
Q: Your company needs a document understanding capability for its enterprise product. Foundation model APIs exist that could cover 80% of the use case today, and the engineering team says they could build something comparable in six weeks using AI coding tools. Should you build or buy?
Note: This post is for our paid subscribers. If you haven’t subscribed yet, please consider subscribing.
Step 1: Ask Clarifying Questions
Before jumping into the answer, I want to make sure I understand what is actually being decided here, because the framing of the question shapes the entire strategic analysis.
Q: When you say “document understanding capability,” are we talking about raw document parsing (OCR, text extraction, layout detection), or a higher-order capability like understanding document meaning, extracting structured data, or learning from user corrections over time?
Let us assume it is a higher-order capability: extracting structured data from unstructured documents and improving accuracy based on user feedback over time.
Q: Is this capability on the core differentiation surface of our product, or is it table-stakes infrastructure that enables the core experience?
Let us say it is adjacent to the core. Our product differentiates on the workflow built around document understanding, but the accuracy and learning from user corrections are what make the workflow defensible.
Q: What is the competitive landscape? Are our direct competitors already offering this capability, and if so, are they building or buying it?
Two competitors have shipped similar features in the past six months. At least one appears to be using a third-party API based on publicly available integration documentation.
Q: Do we have proprietary training data, specifically labeled examples from our own users, that could improve a model beyond what a general-purpose API provides out of the box?
Yes. We have 18 months of user correction data across 200,000 documents.
Q: What is the team’s current AI/ML capability? Do we have engineers who can fine-tune models and maintain inference infrastructure, or would we need to hire?
We have two ML engineers and strong full-stack developers who are already using Cursor and GitHub Copilot extensively.
Interview Tip: The second clarifying question, about whether the capability sits on the core differentiation surface, is the one the interviewer is waiting for. That single judgment call tells them almost everything about your strategic maturity. A candidate who dives straight into cost analysis without first establishing whether the capability is core or commodity has signaled that they think like a project manager, not a product strategist.
Step 2: Establish the New Strategic Context
Before proposing an answer, I need to name the structural shift that makes this question different in 2026 than it was in 2022, because the interviewer is testing whether I understand it.
The Build-Cost Compression
AI-assisted development tools have fundamentally altered the cost side of the build-vs-buy equation. GitHub Copilot now has over 20 million users globally and is deployed at 90% of Fortune 100 companies. Developers using these tools complete coding tasks up to 55% faster in controlled experiments, and AI-generated code now accounts for roughly 46% of all code written by active users. Pull request cycle times have dropped from an average of 9.6 days to 2.4 days in enterprise deployments. The practical effect: what used to take four months of engineering can now often be shipped in six weeks.
This compression does not make “build” the default answer. It makes the old framework for deciding between build and buy obsolete. When building was expensive and slow, buying was the safe default for anything outside the core product. When building is fast and relatively cheap, the decision shifts from an economic question to a strategic identity question: what capabilities define who you are as a product, and which are interchangeable commodities?
The API Commoditization Trap
Simultaneously, foundation model APIs have created a new risk that the old framework did not account for. When you buy access to a general-purpose AI capability from any major provider, every competitor can access the same capability at the same price. The API itself offers zero differentiation. A PM candidate who notes that the API cost is lower than six weeks of engineering time has done arithmetic. A PM candidate who recognizes that buying means their product capability is identical to every competitor using the same API has done strategy.
The Data Flywheel Asymmetry
The most powerful argument for building is not cost or control. It is the opportunity to generate proprietary training signal. Every user interaction with a capability you built in-house is a data point that can improve your model. Every user interaction with a capability you bought generates that same data for the vendor, not for you. This asymmetry compounds over time in ways that most candidates never articulate.
The key insight: The build-vs-buy question in 2026 has three dimensions, not two. It is no longer “build or buy.” It is “build, buy, or fine-tune.” The middle path, where you buy a base model and fine-tune it on your proprietary data, changes the strategic conversation entirely. Mentioning this option and articulating when it makes more sense than either pure endpoint tells the interviewer you understand the actual texture of modern AI product development.
Step 3: Apply the Strategic Capability Mapping Framework
I will organize my answer using a framework I call Strategic Capability Mapping, which has three components delivered in sequence: capability classification, differentiation horizon analysis, and reversibility assessment.
Component 1: Capability Classification
1. Sort the capability into one of three buckets
Commodity infrastructure: Raw document parsing, OCR, basic text extraction. Every major cloud provider offers this. There is no competitive advantage in building it yourself. Buy without guilt and move fast.
Competitive table stakes: Structured data extraction from common document types (invoices, contracts, receipts). Multiple vendors offer this. Your competitors likely have it. You need it to be in the market, but it alone does not win deals. Lean toward buying unless your accuracy requirements are significantly higher than what off-the-shelf provides.
Core differentiation: Domain-specific extraction that learns from your users’ correction patterns and improves over time on their specific document types. No vendor will ever optimize this capability for your specific users as well as you can, because no vendor has your proprietary feedback loops. Build this.
2. Apply to our scenario
Based on the clarifying questions, our capability sits at the boundary between table stakes and core differentiation. The raw extraction is table stakes. The learning-from-corrections layer that uses our 200,000-document feedback dataset is core differentiation. This means the answer is not a binary build-or-buy. It is a hybrid: buy the base extraction capability (or use a foundation model API for it), and build the learning layer on top of it using our proprietary data.
Interview Tip: This hybrid answer is exactly what interviewers at AI-native companies want to hear. It demonstrates that you understand the spectrum between pure build and pure buy, and that you can place a capability precisely on that spectrum rather than defaulting to one end. At companies like Anthropic, interviewers have been reported to probe specifically for whether candidates distinguish between capability commoditization and data differentiation.
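To make the hybrid concrete, here is a minimal sketch of how the two layers could relate in code. This is illustrative only: the class names, the naive base extractor, and the correction store are all assumptions, standing in for a real vendor API and a real fine-tuning or retrieval pipeline. The point is the shape: the bought layer produces a first-pass extraction, and the built layer turns user corrections into proprietary signal that overrides it.

```python
from dataclasses import dataclass, field


@dataclass
class BaseExtractor:
    """Stand-in for the bought capability: a vendor API that extracts fields."""

    def extract(self, document_text: str) -> dict:
        # In production this would call the vendor's API; here we return
        # a naive first-word guess so the sketch is runnable.
        return {"vendor": document_text.split()[0]}


@dataclass
class LearningLayer:
    """The built capability: layers learned user corrections over vendor output."""

    base: BaseExtractor
    # (doc_key, field_name) -> corrected value; in production this would be
    # a model fine-tuned on the correction dataset, not a lookup table.
    corrections: dict = field(default_factory=dict)

    def extract(self, doc_key: str, document_text: str) -> dict:
        result = self.base.extract(document_text)
        # Apply any corrections recorded for this document pattern.
        for (key, field_name), value in self.corrections.items():
            if key == doc_key:
                result[field_name] = value
        return result

    def record_correction(self, doc_key: str, field_name: str, value: str) -> None:
        # Every user fix becomes proprietary training signal we own.
        self.corrections[(doc_key, field_name)] = value


layer = LearningLayer(BaseExtractor())
print(layer.extract("invoice", "Acme Corp Invoice #42"))  # vendor's first-pass guess
layer.record_correction("invoice", "vendor", "Acme Corp")
print(layer.extract("invoice", "Acme Corp Invoice #42"))  # corrected output
```

The vendor never sees the corrections: they accrue only to the layer we own, which is the data flywheel asymmetry in miniature.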
Component 2: Differentiation Horizon Analysis
3. Ask the temporal question
Will this capability still be a differentiator in eighteen months, or will it be fully commoditized by then? The answer determines how much you should invest in building.
Raw document extraction is already commoditized. Investing engineering cycles in building custom OCR in 2026 would be repeating the mistake several companies made before multimodal models made their custom pipelines obsolete within a quarter.
However, domain-specific learning from user corrections is not on a commoditization trajectory. It depends on proprietary data that only you have. The more your model learns from your users, the wider the gap becomes between your capability and what any API can provide. This is a compounding advantage, not a depreciating one.
The decision principle: invest build effort only in capabilities where the differentiation gap widens over time, not narrows.
Component 3: Reversibility Assessment
4. Map the switching costs in both directions
If we buy now and need to switch later: What happens if the vendor changes pricing, degrades quality in a model update, or pivots their product direction? Foundation model providers have been known to silently update model behavior, causing downstream product breakages for enterprise customers who built tightly coupled workflows on top of the API. The switching cost depends on how tightly coupled our integration is.
If we build now and the market moves: How much of the engineering investment is salvageable? If we build a modular integration layer with clean abstraction between the base extraction and the learning layer, most of the investment survives even if we swap out the base model underneath.
The mitigation: regardless of whether we build or buy the base extraction, architect the system with a clean abstraction layer so that the base model can be swapped without touching the learning layer. This is not just good engineering. It is strategic optionality that protects the investment either way.
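The abstraction layer itself can be sketched in a few lines. This is a hypothetical illustration, not a prescribed design: the `ExtractionBackend` contract and the two backend classes are invented names showing how the learning layer can depend only on an interface, so the base model swaps with a one-line change at the composition root.

```python
from typing import Protocol


class ExtractionBackend(Protocol):
    """The contract the learning layer depends on. Any vendor API or
    in-house model that satisfies it can be swapped in freely."""

    def extract_fields(self, document_text: str) -> dict: ...


class VendorAExtractor:
    """Wraps a third-party API (stubbed here for the sketch)."""

    def extract_fields(self, document_text: str) -> dict:
        return {"source": "vendor_a", "text_length": len(document_text)}


class InHouseExtractor:
    """Wraps our own fine-tuned model (stubbed here for the sketch)."""

    def extract_fields(self, document_text: str) -> dict:
        return {"source": "in_house", "text_length": len(document_text)}


class DocumentPipeline:
    """The learning layer holds only the abstract contract, never a vendor SDK."""

    def __init__(self, backend: ExtractionBackend):
        self.backend = backend

    def process(self, document_text: str) -> dict:
        return self.backend.extract_fields(document_text)


# Swapping the base model touches exactly one line, and nothing downstream.
pipeline = DocumentPipeline(VendorAExtractor())
pipeline = DocumentPipeline(InHouseExtractor())
```

Because `DocumentPipeline` never imports a vendor SDK directly, the switching cost in either direction collapses to reimplementing one adapter class.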
Step 4: Deliver the Recommendation
Based on the Strategic Capability Mapping framework, my recommendation for this scenario is a hybrid approach:
Buy the base extraction capability through a foundation model API. Raw extraction is already commoditized, and building it in-house would repeat the custom-OCR mistake.
Build the learning layer on top. Our 18 months of correction data across 200,000 documents is a compounding advantage no vendor can replicate, and it is the part of the capability where the differentiation gap widens over time.
Architect a clean abstraction layer between the two, so the base model can be swapped without touching the learning layer. This preserves optionality in both directions: we are protected if the vendor degrades, and most of the build investment survives if the market moves.