My PM Interview® - Preparation for Success


AI and Machine Learning Concepts - Part 3

AI Product Management: Generative AI, Large Language Models, Agentic AI, NLP, Model Optimization, and AI Safety

My PM Interview
Mar 11, 2026
∙ Paid

Dear readers,

Thank you for being part of our growing community. Here’s what’s new today:

AI Product Management:

AI and Machine Learning Concepts - Part 3 (Generative AI, Large Language Models, Agentic AI, NLP, Model Optimization, and AI Safety)

Note: This post is for our Paid Subscribers. If you haven’t subscribed yet:

Claim Exclusive Discount & Unlock Access

Table of Contents

  1. Generative AI Foundations

    1. Large Language Models (LLMs)

    2. Small Language Models (SLMs)

    3. Foundation Models

    4. Diffusion Models

    5. Multimodal AI

  2. LLM Core Concepts

    1. Context Window

    2. Temperature

    3. Inference and Latency

    4. Grounding and Groundedness

  3. AI Alignment and Safety

    1. RLHF (Reinforcement Learning from Human Feedback)

    2. DPO (Direct Preference Optimization)

    3. Constitutional AI

    4. AI Guardrails

    5. AI Red Teaming

    6. Human-in-the-Loop (HITL)

  4. Agentic AI and AI Agents

    1. Tool Use and Function Calling

    2. Chain-of-Thought (CoT) Reasoning

    3. Multi-Agent Systems

    4. ReAct (Reasoning + Acting)

  5. Natural Language Processing (NLP) Concepts

  6. Model Optimization and Efficiency

    1. Quantization

    2. Knowledge Distillation

    3. LoRA and PEFT (Parameter-Efficient Fine-Tuning)

    4. Model Pruning

    5. Mixture of Experts (MoE)

  7. Data Concepts in AI

  8. AI Ethics, Governance, and Regulation

  9. AI Infrastructure and Deployment

  10. Emerging and Frontier AI Concepts

  11. Rapid-Reference Glossary



Generative AI Foundations

Generative AI is the branch of artificial intelligence focused on creating new content, whether text, images, audio, video, or code, rather than just analyzing or classifying existing data. While Part 2 covered the neural network architectures that power GenAI (Transformers, GANs, Autoencoders), this section covers the ecosystem of concepts, models, and techniques that define the modern generative AI landscape.

1. Large Language Models (LLMs)

LLMs are massive neural networks (typically Transformer-based) trained on enormous text datasets to understand and generate human language. They have billions or even trillions of parameters. Examples include GPT-5, Claude, Gemini, LLaMA, and Mistral. LLMs are the engine behind most modern AI products, from chatbots to code assistants to search engines.

Example: Building AI Features

When your product roadmap includes ‘Add AI-powered summarization,’ you are essentially choosing to integrate an LLM. As a PM, you need to decide: Use an external API (OpenAI, Anthropic) or host an open-source model (LLaMA, Mistral)? API calls are simpler but create vendor dependency and per-token costs. Self-hosted models require infrastructure but give you control over data privacy, latency, and customization.
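
The API-vs-self-hosted decision often comes down to arithmetic. Here is a back-of-envelope sketch of that comparison; all prices and volumes are hypothetical placeholders, not real vendor rates.

```python
# Rough cost model: per-token API pricing scales with usage,
# self-hosting is roughly flat (you pay for capacity, not traffic).
# Every number below is an illustrative assumption.

def monthly_api_cost(requests_per_month, tokens_per_request, price_per_1k_tokens):
    """Pay-per-token API cost grows linearly with usage."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1000 * price_per_1k_tokens

def monthly_selfhost_cost(gpu_instances, price_per_instance):
    """Self-hosted cost is fixed per provisioned GPU instance."""
    return gpu_instances * price_per_instance

api = monthly_api_cost(requests_per_month=2_000_000, tokens_per_request=1_500,
                       price_per_1k_tokens=0.002)   # hypothetical rate
hosted = monthly_selfhost_cost(gpu_instances=2, price_per_instance=2_500)

print(f"API: ${api:,.0f}/mo  vs  self-hosted: ${hosted:,.0f}/mo")
```

The crossover point (where flat hosting beats per-token pricing) shifts with volume, which is why this decision usually gets revisited as usage grows.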

2. Small Language Models (SLMs)

SLMs are compact language models (typically under 3 billion parameters) designed to run efficiently on edge devices, mobile phones, or low-cost servers. Examples include Phi-4, Gemma, and TinyLlama. They sacrifice some capability for dramatic reductions in cost, latency, and hardware requirements.

Example

Your mobile app needs an on-device AI feature that works offline (e.g., grammar checking in a note-taking app). An SLM running locally on the phone provides instant responses without any network latency or API costs, and user data never leaves their device. The tradeoff is that the model handles simpler tasks well but cannot match the reasoning depth of a full-sized LLM.

3. Foundation Models

A Foundation Model is a large AI model trained on broad data at scale that can be adapted (via fine-tuning, prompting, or RAG) to a wide range of downstream tasks. The term emphasizes that one base model serves as the foundation for many applications. GPT-4, Claude, and Gemini are foundation models. So are image models like Stable Diffusion and multimodal models like GPT-4o.

Example

Instead of building separate ML models for customer support, content generation, data analysis, and code review, your team uses a single foundation model adapted for each use case through different system prompts and RAG configurations. One model, four products. This dramatically reduces your ML infrastructure complexity.

4. Diffusion Models

Diffusion Models generate data by learning to reverse a gradual noising process. During training, noise is added to the data step by step until it becomes pure noise (a fixed forward process), and the model learns to undo that corruption, starting from noise and progressively refining it into a clean output. This is the technology behind image generators like Stable Diffusion, DALL-E 3, and Midjourney, and video generators like Sora.

Everyday Analogy

Imagine a sculptor who learns by watching a statue slowly dissolve into a pile of dust (forward process). Once they understand how each detail erodes at each stage, they can reverse the process: start with a pile of dust and reconstruct the statue layer by layer (reverse process). Diffusion models do exactly this with pixels.
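
The forward (dust-making) process has a simple closed form in the standard DDPM-style parameterization. This toy sketch noises a single value; it is illustrative only and omits everything about the learned reverse model.

```python
import math
import random

def forward_noise(x0, alpha_bar):
    """Closed-form DDPM forward step: x_t = sqrt(a)*x0 + sqrt(1-a)*noise.
    alpha_bar near 1 -> barely noised; alpha_bar near 0 -> almost pure noise."""
    noise = random.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * noise

random.seed(0)
pixel = 0.8  # a single "pixel" value, stand-in for a whole image
for alpha_bar in (0.99, 0.5, 0.01):
    print(f"alpha_bar={alpha_bar}: x_t={forward_noise(pixel, alpha_bar):+.3f}")
```

The model's training job is the reverse direction: given a noised `x_t` and the step, predict the noise that was added, so generation can start from pure noise and subtract it away step by step.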

Example: AI-Generated Marketing Assets

Your marketing team needs 50 product lifestyle images for a campaign. A photographer shoot costs $15,000 and takes 2 weeks. Using a diffusion model fine-tuned on your brand assets, the team generates photorealistic images in hours for a fraction of the cost. As PM, you evaluate the quality-vs-cost tradeoff and build a human review step for brand consistency.

5. Multimodal AI

Multimodal AI systems can process and generate content across multiple data types (modalities) simultaneously: text, images, audio, video, and code. Examples include GPT-4o (text + images + audio), Gemini (text + images + video + code), and Claude (text + images + code). This contrasts with unimodal models that handle only one type of data.

Example

Your customer support tool receives a screenshot of an error message, a voice note describing the issue, and a text description. A multimodal AI processes all three inputs together, cross-referencing the visual error code with the spoken context and written details to generate a comprehensive diagnosis and solution. No need to build three separate pipelines.


LLM Core Concepts

1. Context Window

The context window is the maximum amount of text (measured in tokens) that an LLM can process in a single interaction. It includes everything: the system prompt, conversation history, any retrieved documents, the user’s question, and the model’s response. Once you exceed the context window, the model literally cannot see the information.

Example

Your AI document analysis feature lets users upload contracts for review. The model’s context window is 128,000 tokens (roughly 200 pages). A 50-page contract fits easily, but if the user also uploads 10 supporting documents totaling 300 pages, you exceed the window. As PM, you design chunking strategies, prioritization logic, or select a model with a larger context window.
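
A first line of defense is a simple token-budget check before anything is sent to the model. The sketch below uses the common rule of thumb of roughly 4 characters per token; the reserved budget for the system prompt and response is an illustrative figure.

```python
# Token-budget check for a document-analysis feature.
# The 4-chars-per-token ratio is a rough heuristic, not an exact tokenizer.

CONTEXT_WINDOW = 128_000
RESERVED = 4_000  # system prompt + expected response (illustrative)

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_context(documents: list[str]) -> bool:
    used = sum(estimate_tokens(d) for d in documents)
    return used <= CONTEXT_WINDOW - RESERVED

contract = "x" * 400_000          # ~100k tokens: fits on its own
extras = ["y" * 200_000] * 3      # ~150k more tokens: blows the budget

print(fits_in_context([contract]))           # True
print(fits_in_context([contract] + extras))  # False
```

When the check fails, the product falls back to chunking, prioritization, or a larger-context model, exactly the design decisions described above.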

2. Temperature

Temperature is a parameter that controls randomness in an LLM’s output. A temperature of 0 makes the model deterministic (always choosing the most probable next token). Higher temperatures (0.7 to 1.0) increase creativity and variety. Very high temperatures (above 1.5) produce chaotic, often incoherent output.

Example

For your product’s AI copywriting feature, you set temperature to 0.8 for creative brainstorming (diverse, surprising ideas). For the contract summarization feature, you set it to 0.1 (precise, predictable, factual). For code generation, you use 0.2 (correct and consistent). Temperature is one of the most impactful product configuration choices you can make.
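
Mechanically, temperature divides the model's raw next-token scores (logits) before the softmax. This self-contained sketch shows the effect on a made-up three-token distribution: low temperature concentrates probability on the top token, high temperature flattens it.

```python
import math

def sample_distribution(logits, temperature):
    """Scale logits by 1/T, then softmax. Low T sharpens the distribution
    toward the top token; high T spreads probability across tokens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up next-token scores
for t in (0.1, 0.8, 1.5):
    probs = sample_distribution(logits, t)
    print(f"T={t}: top-token probability = {probs[0]:.2f}")
```

At temperature near 0 the top token dominates (near-deterministic output); as temperature rises, lower-ranked tokens get sampled more often, which is what users experience as creativity or, at extremes, incoherence.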

3. Inference and Latency

Inference is the process of using a trained model to generate predictions or outputs on new data. Latency is the time delay between sending a request and receiving the response. For LLMs, two metrics matter most: tokens per second (how fast the model generates output) and time to first token (TTFT), which is how long users wait before seeing any response.

Example

Your AI chatbot takes 8 seconds to start responding (high TTFT). Users perceive this as broken and leave. You switch to streaming (tokens appear as they are generated) and the TTFT drops to 0.5 seconds. Users now see the response building in real time, even though the total generation time is the same. Streaming is a PM decision, not just an engineering one.
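
The perceived-latency gap can be shown with a toy simulation. The pretend model below "generates" one token per fixed interval; the timings are simulated constants chosen to match the 8-second example, not measurements of any real model.

```python
# Toy illustration of why streaming improves perceived latency.
# 160 tokens at 0.05s/token = 8s total either way; only TTFT differs.

def generate_tokens(n_tokens, seconds_per_token=0.05):
    """Pretend streaming model: yields (token, elapsed_seconds) pairs."""
    for i in range(n_tokens):
        yield f"tok{i}", (i + 1) * seconds_per_token

def time_to_first_token(stream):
    """With streaming, the user sees output after the first token."""
    _, elapsed = next(stream)
    return elapsed

def blocking_wait(n_tokens, seconds_per_token=0.05):
    """Without streaming, the user waits for the full response."""
    return n_tokens * seconds_per_token

n = 160
print(f"TTFT (streaming): {time_to_first_token(generate_tokens(n)):.2f}s")
print(f"TTFT (blocking):  {blocking_wait(n):.2f}s")
```

Total generation time is identical in both cases; streaming only changes when the user first sees something, which is usually the metric that drives perceived quality.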

4. Grounding and Groundedness

Grounding means connecting an AI model’s output to verifiable sources of truth (your database, documents, knowledge base, or real-time data). Groundedness measures whether the model’s response is supported by the provided context rather than fabricated. RAG (covered in Part 2) is the primary grounding technique. Grounding is the main defense against hallucination.

Example

Your enterprise AI assistant answers questions about company policies. Without grounding, the model might hallucinate a vacation policy that does not exist. With grounding (RAG pulling from your official HR documents), every answer is traceable to a source document. You can even show users the exact paragraph the answer came from, building trust.
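
The grounding pattern can be sketched in a few lines: retrieve the most relevant snippet, then build a prompt that pins the answer to it. The policy documents, IDs, and keyword-overlap "retriever" below are illustrative stand-ins; a real system would use embeddings and a vector store as described in the RAG coverage in Part 2.

```python
# Minimal grounding sketch: retrieve a source snippet, cite it in the prompt.
# POLICY_DOCS contents and document IDs are hypothetical examples.

POLICY_DOCS = {
    "vacation": "Employees accrue 1.5 vacation days per month (HR-101, §3).",
    "expenses": "Meal expenses are reimbursed up to $50/day (FIN-204, §2).",
}

def retrieve(question: str) -> tuple[str, str]:
    """Toy retriever: keyword overlap instead of embedding similarity."""
    scores = {key: sum(word in question.lower() for word in key.split())
              for key in POLICY_DOCS}
    best = max(scores, key=scores.get)
    return best, POLICY_DOCS[best]

def grounded_prompt(question: str) -> str:
    """Instruct the model to answer only from the retrieved source."""
    source_id, snippet = retrieve(question)
    return (f"Answer ONLY from this source [{source_id}]:\n{snippet}\n"
            f"Question: {question}")

print(grounded_prompt("How many vacation days do I get?"))
```

Because the prompt carries the source ID, the product can surface "the exact paragraph the answer came from" alongside the response, which is where the trust benefit comes from.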


AI Alignment and Safety

Alignment is the challenge of ensuring AI systems behave in ways that are helpful, honest, and harmless, consistent with human values and intentions. As AI systems become more capable, alignment becomes one of the most critical fields in AI.

1. RLHF (Reinforcement Learning from Human Feedback)

RLHF is a training technique where human evaluators rank different model outputs, and a reward model is trained on these rankings. The LLM is then fine-tuned using reinforcement learning to maximize the reward model’s score. This is the primary method used to make LLMs helpful, safe, and aligned with human preferences. It is the technique that transformed GPT-3 (a raw text predictor) into ChatGPT (a helpful assistant).

Everyday Analogy

Imagine training a new employee. Instead of giving them a rulebook for every possible situation, you have experienced managers review their work samples and rank them from best to worst. Over time, the employee internalizes what ‘good work’ looks like based on these rankings. RLHF works the same way: human preferences shape the model’s behavior.
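
The ranking step has a standard mathematical form: reward-model training typically uses the Bradley-Terry model, where the probability that output A is preferred over output B is a sigmoid of their reward difference. The reward scores below are hypothetical.

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry model used in RLHF reward training:
    P(chosen preferred) = sigmoid(r_chosen - r_rejected)."""
    return 1 / (1 + math.exp(-(reward_chosen - reward_rejected)))

def reward_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of the human ranking; training drives
    this toward zero by widening the reward gap on preferred outputs."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# Hypothetical reward scores for two answers to the same prompt.
print(f"P(prefer A): {preference_probability(2.1, 0.4):.2f}")
print(f"loss:        {reward_loss(2.1, 0.4):.3f}")
```

Once the reward model is trained this way, the RL step fine-tunes the LLM to produce outputs the reward model scores highly.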

2. DPO (Direct Preference Optimization)

DPO is a simpler, more efficient alternative to RLHF. Instead of training a separate reward model and then using RL, DPO directly optimizes the language model using pairs of preferred and rejected outputs. It achieves comparable alignment quality with significantly less computational complexity and training instability.

Example

Your team is fine-tuning a model for customer-facing responses. RLHF requires training a reward model plus RL optimization (complex, expensive). DPO lets you collect pairs of ‘better response’ vs. ‘worse response’ and directly train the model on these preferences. Same quality alignment, 40% less compute cost, and simpler implementation for your ML team.
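
The "directly train on preferences" idea corresponds to a concrete loss: DPO compares the policy's log-probabilities on the chosen and rejected responses against a frozen reference model. The log-probability values below are hypothetical, and this single-pair sketch omits batching and gradients.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO objective for one preference pair: the implicit reward is the
    log-probability ratio against the reference model; the loss is
    -log sigmoid of the scaled reward margin."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))

# Hypothetical log-probs: the policy already favors the chosen response.
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0, beta=0.1)
print(f"DPO loss: {loss:.3f}")
```

Note there is no separate reward model anywhere in the computation; that is exactly where the savings in complexity and compute come from.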

3. Constitutional AI

Constitutional AI (developed by Anthropic) gives the model a set of principles (a ‘constitution’) and trains it to self-critique and revise its own outputs according to those principles. Instead of relying entirely on human feedback for every edge case, the model learns to evaluate whether its responses align with stated values like helpfulness, honesty, and harmlessness.

Example

You are deploying an AI assistant for a healthcare platform. Constitutional AI principles might include: ‘Always recommend consulting a doctor for medical decisions,’ ‘Never provide specific dosage recommendations,’ and ‘If uncertain, clearly state the limitation.’ The model internalizes these constraints and self-corrects before responding, reducing the need for post-hoc content filtering.

4. AI Guardrails

Guardrails are safety mechanisms built around AI systems to prevent harmful, off-topic, or undesirable outputs. They can be input guardrails (filtering dangerous prompts before they reach the model), output guardrails (checking responses before showing them to users), or topic guardrails (keeping the AI within its defined scope).

Example

Your financial advisory chatbot should never give specific stock picks. You implement: (1) an input filter that detects ‘should I buy X stock’ patterns, (2) a system prompt that instructs the model to provide educational information only, and (3) an output filter that scans responses for specific ticker recommendations. Three layers of guardrails ensure regulatory compliance.
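
The three layers compose naturally in code. This sketch wires an input filter, a stubbed model, and an output filter together; the regex patterns, refusal messages, and the stand-in model are illustrative placeholders, not a production rule set.

```python
import re

# Layered guardrails for a financial chatbot. All patterns and messages
# are simplified examples for illustration.

STOCK_PATTERN = re.compile(r"\b(should i buy|which stock|buy \w+ stock)", re.I)
TICKER_PATTERN = re.compile(r"\bbuy [A-Z]{1,5}\b")

def input_guardrail(prompt: str) -> bool:
    """Layer 1: block stock-pick requests before they reach the model."""
    return not STOCK_PATTERN.search(prompt)

def output_guardrail(response: str) -> str:
    """Layer 3: scan responses for specific ticker recommendations."""
    if TICKER_PATTERN.search(response):
        return "I can share educational information, but not specific picks."
    return response

def answer(prompt: str, model=lambda p: "Diversification spreads risk.") -> str:
    """Layer 2 (the system prompt) lives inside `model`, stubbed out here."""
    if not input_guardrail(prompt):
        return "I can't recommend specific investments."
    return output_guardrail(model(prompt))

print(answer("Should I buy NVDA stock?"))
print(answer("What is diversification?"))
```

The output filter matters even with a good input filter and system prompt: it is the last line of defense when the model slips a recommendation through anyway.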

5. AI Red Teaming

Red teaming is the practice of deliberately attempting to break, mislead, or extract harmful outputs from an AI system. Teams of testers act as adversaries, probing the model with edge cases, adversarial prompts, and creative attacks to find vulnerabilities before users do. It is the AI equivalent of penetration testing in cybersecurity.

Example

Before launching your AI customer service agent, your red team tests: ‘Can it be tricked into revealing internal pricing strategies?’ ‘Can prompt injection make it ignore its system instructions?’ ‘Does it handle offensive language gracefully?’ Every vulnerability found during red teaming is one fewer crisis in production.

6. Human-in-the-Loop (HITL)

HITL is a design pattern where humans remain part of the AI decision-making process, reviewing, approving, or correcting AI outputs before they take effect. It is essential for high-stakes applications where AI errors carry significant consequences.

Example

Your AI drafts responses to customer complaints, but a human agent reviews and edits each response before it is sent. Over time, as the model improves and confidence scores rise, you gradually automate more responses (moving from ‘human reviews all’ to ‘human reviews flagged ones’). This staged autonomy approach manages risk while capturing efficiency gains.
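
Staged autonomy usually comes down to a confidence threshold on a routing function. The threshold and confidence values in this sketch are illustrative; real systems would calibrate them against measured error rates.

```python
# Confidence-threshold routing for staged autonomy. Raising the threshold
# keeps more humans in the loop; lowering it automates more responses.

def route(draft: str, confidence: float, auto_threshold: float = 0.9) -> str:
    """Send high-confidence drafts automatically; queue the rest for
    human review. Moving from 'human reviews all' to 'human reviews
    flagged ones' is just lowering auto_threshold over time."""
    return "auto_send" if confidence >= auto_threshold else "human_review"

for conf in (0.95, 0.72):
    print(f"confidence={conf}: {route('Sorry about the delay...', conf)}")
```

The PM-level decision is where to set the threshold per use case: a billing dispute might stay at human review indefinitely while a shipping-status reply automates early.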


Agentic AI and AI Agents

