My PM Interview® - Preparation for Success

Designing Conversational AI Systems

Technical approaches, tradeoffs, and decision frameworks to build scalable and reliable AI experiences

My PM Interview
Jan 27, 2026

This guide helps product managers choose the right back-end architecture for conversational AI. It explains the three common approaches, shows the tradeoffs that affect cost and quality, and gives clear signals for when to pick each one.

The framework at a glance:

  • Traditional NLU systems - Use when you need strict control, low per-interaction cost, and predictable behavior for a limited set of tasks.

  • Standalone LLM chatbots - Use when you want rapid prototyping, wide coverage, or natural, human-like answers and you can tolerate higher cost and occasional inaccuracies.

  • Hybrid systems - Use when you need a balance: programmatic control for core flows plus generative help for edge cases or complex queries.
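The hybrid option above can be sketched as a simple router: deterministic handling for known intents, generative fallback for everything else. All names here (`KNOWN_INTENTS`, `call_llm`, the keyword classifier) are illustrative stand-ins, not any specific product's API.

```python
# Minimal sketch of a hybrid router: rule-based handling for core flows,
# generative fallback for everything else. `call_llm` is a hypothetical stub.

KNOWN_INTENTS = {
    "check_balance": lambda slots: "Your balance is available in the app.",
    "reset_password": lambda slots: "A reset link has been sent to your email.",
}

def classify_intent(utterance: str):
    """Toy keyword classifier standing in for a real NLU model."""
    keywords = {"balance": "check_balance", "password": "reset_password"}
    for word, intent in keywords.items():
        if word in utterance.lower():
            return intent
    return None

def call_llm(utterance: str) -> str:
    """Placeholder for a generative model call (e.g. an API request)."""
    return f"[LLM drafts a freeform answer to: {utterance!r}]"

def route(utterance: str) -> str:
    intent = classify_intent(utterance)
    if intent in KNOWN_INTENTS:
        return KNOWN_INTENTS[intent]({})   # programmatic, auditable path
    return call_llm(utterance)             # generative fallback path

print(route("What's my balance?"))                    # deterministic template
print(route("Why was I charged twice last month?"))   # falls through to LLM
```

The design point: the auditable path handles high-volume, regulated flows cheaply, while the generative path absorbs the long tail of phrasing you did not author.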


Importance of Architecture

Choosing an architecture is one of the highest-impact technical decisions for a conversational product. It is not only a technology choice but a product economics and risk choice.

Here are the core ways your architecture affects outcomes:

  1. Cost at scale

  • Some approaches charge per model call and can become expensive when usage grows. Others require heavier engineering upfront but are cheaper per interaction.

  • Example: a simple intent-answer bot can be cheap to run for millions of requests. A general-purpose LLM that generates text every turn will have a much higher per-conversation bill.

  2. Quality and predictability

  • Architectures that use hand-designed flows give predictable answers and make compliance easier.

  • Generative models can handle unexpected phrasing and broad topics, but may produce plausible-sounding errors.

  3. Time to market and iteration speed

  • LLM prototypes are fast to ship because you often only need to write prompts and a little glue code.

  • NLU systems take time to design and test all dialogs, but once built they are stable and maintainable.

  4. Security, privacy, and compliance

  • Systems with explicit programmatic control are easier to audit and restrict. They are often required in regulated domains.

  • LLMs introduce new attack surfaces such as prompt injection and model memorization, so extra guardrails are needed.

  5. User experience and brand fit

  • If your product needs a friendly, conversational voice that adapts to lots of topics, generative models can add value.

  • If your product must always be accurate and concise, a rules-based or hybrid approach may be better.
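The cost-at-scale point can be made concrete with back-of-envelope arithmetic. Every number below is an illustrative assumption, not a real vendor rate; the shape of the comparison is what matters: a templated bot pays roughly fixed compute per turn, while a generative bot is billed per token.

```python
# Back-of-envelope cost comparison. All unit prices are illustrative
# assumptions, not real vendor rates.

MONTHLY_CONVERSATIONS = 1_000_000
TURNS_PER_CONVERSATION = 5

# Assumed unit economics
NLU_COST_PER_TURN = 0.0001        # amortized compute for classify + template
LLM_TOKENS_PER_TURN = 700         # prompt + completion tokens per turn
LLM_COST_PER_1K_TOKENS = 0.002    # hypothetical per-token price

turns = MONTHLY_CONVERSATIONS * TURNS_PER_CONVERSATION
nlu_monthly = turns * NLU_COST_PER_TURN
llm_monthly = turns * LLM_TOKENS_PER_TURN / 1000 * LLM_COST_PER_1K_TOKENS

print(f"NLU bot: ${nlu_monthly:,.0f}/month")   # $500/month
print(f"LLM bot: ${llm_monthly:,.0f}/month")   # $7,000/month
```

Under these assumptions the generative bot is roughly an order of magnitude more expensive per month, which is why teams often reserve LLM calls for the turns that need them.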


Common mistakes product teams make

  • Thinking generative models remove the need for design work. You still must design conversational flows, error states, and handoffs.

  • Believing that a single approach will be ideal forever. Products evolve and you might prototype with LLMs, then move to a hybrid or more controlled system as scale and requirements change.

  • Overlooking operational costs. Monitoring, retraining, and data pipelines create ongoing expenses beyond the obvious model bill.


Glossary of core terms

  • Natural Language Understanding (NLU)
    A rules-plus-model approach that classifies user intent and extracts specific data points (entities). Example: identifying that “book a flight to Mumbai next Monday” is a booking intent and extracting “Mumbai” and “next Monday”.

  • Large Language Model (LLM)
    A single, large neural model that can read prompts, track short-term context, and generate freeform text. Example: ChatGPT or Claude answering a user question in natural language.

  • Hybrid system
    An architecture that uses NLU or rules for structured steps and uses an LLM for freeform or fallback responses. Example: validate identity via rules, then let LLM craft nuanced explanations.

  • Retrieval-Augmented Generation (RAG)
    A technique that fetches relevant documents or data and includes them in the prompt to the LLM so answers are grounded in up-to-date facts. Example: retrieving a product manual paragraph to answer a troubleshooting query.

  • Fine-tuning
    Training an existing model further on domain-specific examples so it generates replies in a particular style or with specialized knowledge. Example: adapting a model to write in your brand voice.

  • ASR (automated speech recognition)
    Converts spoken audio to text so the conversational stack can interpret voice input.

  • TTS (text to speech)
    Converts text responses into natural-sounding audio for voice interfaces.

  • Intent
    The action the user wants to perform, like “check balance” or “schedule appointment”.

  • Entity
    A piece of structured information inside the user’s utterance, such as a date, location, or product name.

  • Dialog state
    The record of what has happened in the conversation so far, used to keep context across turns.

  • Prompt
    The input given to an LLM that instructs its behavior. It often includes a system-level instruction followed by the user query and any retrieved facts.

  • Hallucination
    When a generative model confidently provides incorrect or fabricated information. This is a key safety risk to monitor.
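The Prompt and RAG entries above can be tied together in one short sketch: retrieve candidate passages, then assemble the system instruction, retrieved facts, and user query into a single prompt. The retrieval here is naive keyword-overlap scoring (a real system would use embedding search), and all document text and function names are invented for illustration.

```python
# Minimal RAG sketch: keyword-overlap retrieval plus prompt assembly.
# A production system would use embedding search; everything here is a toy.

DOCS = [
    "To reset the router, hold the power button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
    "Refunds are processed within five business days.",
]

def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Rank documents by count of words shared with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Assemble system instruction + retrieved facts + user query."""
    facts = "\n".join(retrieve(query, DOCS))
    return ("System: Answer using ONLY the facts below. If the facts are "
            "insufficient, say you don't know.\n"
            f"Facts:\n{facts}\n"
            f"User: {query}")

print(build_prompt("How do I reset the router?"))
```

Grounding the model in retrieved text, plus the explicit "say you don't know" instruction, is the standard first line of defense against hallucination.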


Architectural Patterns

This section breaks down the three core architectures you will choose between.

Traditional NLU systems


A structured, pipeline approach where separate components handle speech-to-text, intent classification, entity extraction, dialog state, business logic, and response rendering. Conversation flows are mostly authored and validated up front.

Key components

  • ASR (if voice): audio to text.

  • NLU: intent classifier and entity extractor.

  • Dialog manager: state tracking, turn logic, slot filling.

  • Business connectors: APIs to backend systems.

  • Response renderer: templated replies or simple text generation.

  • TTS (if voice): text to audio.
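The component list above can be made concrete with a toy slot-filling turn loop: extract entities, update dialog state, re-prompt for missing slots, then render a templated reply. The regexes and slot names are illustrative assumptions, not any particular framework's API.

```python
import re

# Toy slot-filling dialog manager for an appointment-booking intent.
# Entity extraction is a regex stand-in for a real NLU model.

REQUIRED_SLOTS = ["date", "time"]

def extract_entities(utterance: str) -> dict:
    """Pull date and time entities out of the utterance (toy patterns)."""
    entities = {}
    m = re.search(r"\b(monday|tuesday|wednesday|thursday|friday)\b",
                  utterance, re.I)
    if m:
        entities["date"] = m.group(1)
    m = re.search(r"\b(\d{1,2}\s?(?:am|pm))\b", utterance, re.I)
    if m:
        entities["time"] = m.group(1)
    return entities

def handle_turn(utterance: str, state: dict) -> str:
    """Update dialog state with new entities, then respond or re-prompt."""
    state.update(extract_entities(utterance))
    missing = [s for s in REQUIRED_SLOTS if s not in state]
    if missing:
        return f"What {missing[0]} works for you?"        # templated re-prompt
    return f"Booked for {state['date']} at {state['time']}."  # business logic

state = {}
print(handle_turn("Book me an appointment on Friday", state))  # asks for time
print(handle_turn("3pm please", state))                        # completes booking
```

Note how the dialog state (`state`) carries the date across turns; that persistence across turns is exactly what the dialog manager component provides.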

Strengths

  • Predictable, auditable behavior.

  • Low per-interaction compute cost after build.

  • Easier to certify for regulated workflows.

  • Simple failure modes and clear fallbacks.

Weaknesses

  • High upfront design and maintenance effort.

  • Brittle with unexpected phrasing or new user flows.

  • Hard to scale across many topics without lots of authoring.

Examples

  • Bank IVR for balance checks and payments.

  • Appointment booking where steps are fixed.

  • High-volume FAQs with stable answers.


Standalone LLM chatbots
