My PM Interview® - Preparation for Success

What are the latest techniques to customize or fine-tune LLMs?

AI Product Management Interview Question - What are the latest techniques to customize/fine-tune LLMs? Explain one in detail to your leader, who knows very little about ML/AI.

My PM Interview
Feb 21, 2026

Dear readers,

Thank you for being part of our growing community. Here’s what’s new today:

AI Product Management - What are the latest techniques to customize or fine-tune LLMs?


Modern large language models are powerful but not one-size-fits-all. You can either change the model itself or give it smarter ways to use your company’s knowledge. The fastest, most cost-effective approach for most product teams is to start with retrieval-based methods that connect a general model to your documents, then add lightweight tuning if you need a persistent style or behavior change.

  • If your priority is accurate, up-to-date answers from company data, start with Retrieval-Augmented Generation (RAG); a minimal sketch appears below, after this list.

  • If your priority is a consistent voice, structured outputs, or specialized reasoning that must live inside the model, consider a parameter-efficient fine-tuning method like LoRA or adapters.

This approach helps with:

  • Faster time-to-value: You can build useful experiences without months of model training.

  • Lower engineering cost: Most teams can deliver a helpful product with retrieval, lightweight tuning, and iterative improvements.

  • Safer rollouts: Grounded answers and clear provenance make compliance and product trust easier.
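
To make the retrieval-first advice concrete, here is a minimal RAG sketch in Python. It assumes the open-source sentence-transformers library for embeddings; the model name, sample documents, and helper names are all illustrative, and the grounded prompt would be sent to whatever LLM your product already uses.

```python
# Minimal RAG sketch: embed documents once, retrieve the best match for a
# query, and ground the prompt in it. Everything here is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

docs = [
    "Refunds are issued within 5 business days of approval.",
    "Enterprise plans include SSO and a dedicated support channel.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # dot product equals cosine on unit vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def build_grounded_prompt(query: str) -> str:
    """Assemble a prompt that forces the model to answer from the context."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using only the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\nQuestion: {query}"
    )

# Send this prompt to the LLM your product already uses.
print(build_grounded_prompt("How fast are refunds processed?"))
```

The key product property: to change what the assistant knows, you update the documents, not the model.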


Goals and Decision Criteria

Before you pick a technique, align on the business goals and constraints. The right choice depends on what you value most: freshness of information, tone and behavior, cost, latency, or regulation.

Key decision criteria:

  • Business outcome

    • What do we need the model to do for users?

      Different approaches change either the model’s knowledge, its behavior, or both. Be concrete: should it answer support tickets, draft legal text, summarize calls, or autocomplete code?

  • Freshness of knowledge

    • Does the content change often (pricing, legal rules, product docs)?

      If knowledge changes frequently, RAG or live retrieval is usually best because you can update content without retraining.

  • Required style and behavior

    • Do responses need a persistent tone, strict templates, or highly structured outputs?

      Fine-tuning or PEFT methods are better when you need the model to always write in a particular way.

  • Accuracy versus creativity

    • Which is worse for users: a wrong answer or a bland, overly cautious one?

      Grounding with retrieval reduces hallucinations; reinforcement learning or instruction tuning can make the model better at following desired formats.

  • Latency and user experience

    • Do we need instant answers, or is a 1-2 second backend fetch acceptable?

      Retrieval adds a lookup step, and therefore latency, but reduces downstream risk; heavy model customization can also affect inference performance.

  • Cost and engineering effort

    • What budget and team skills are available (ML engineers, infra, legal)?

      Full model fine-tuning is resource heavy. PEFT methods and RAG reduce compute and infra burden.

  • Privacy and compliance

    • Are we exposing or moving sensitive data?

      Retrieval pipelines must have strict access control and audit trails. For some regulated uses, you may need on-prem or isolated infra.


Modern Techniques for Customizing and Fine-Tuning LLMs

When we talk about “customizing an LLM,” we’re really answering one core question:

Are we changing the model’s brain, or are we changing how it accesses and uses information?

Different techniques solve different product problems. Below is a detailed, PM-friendly explanation of each modern approach, including when to use it, risks, cost profile, and practical product implications.



1. Full Fine-Tuning

Full fine-tuning retrains all model parameters using your domain-specific data. You are literally updating the model’s internal weights so it internalizes new knowledge or behavior.

Think of it as re-educating the entire brain.

What it changes

  • Knowledge representation

  • Reasoning patterns

  • Style and tone

  • Task-specific behavior

When it’s useful

  • You need deep domain reasoning baked into the model.

  • Your use case requires consistent and persistent behavior.

  • You have large volumes of high-quality labeled data.

  • You can afford serious compute and ML engineering resources.

Example

A legal-tech company trains a model on tens of thousands of case documents so it reasons like a specialist lawyer, not just retrieves information.

Trade-offs

  • Expensive in compute and infrastructure.

  • Long iteration cycles.

  • Risk of overfitting.

  • Harder rollback if something goes wrong.

  • Requires ML engineering maturity.

This is powerful but heavy. Most startups and internal teams should not start here.
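
For intuition on the engineering weight involved, here is a compressed sketch of full fine-tuning with the Hugging Face Transformers Trainer. The base model, dataset file, and hyperparameters are stand-ins; a real run needs GPU infrastructure, evaluation sets, and checkpointing.

```python
# Full fine-tuning sketch: every weight in the model receives gradients.
# Model name, data file, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in; a real project would pick a larger base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)  # ALL weights trainable

# "domain_corpus.txt" is a hypothetical file of domain text, one sample per line.
data = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = data["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # updates every parameter: powerful, but slow and costly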


2. LoRA (Low-Rank Adaptation)

LoRA freezes the original model weights and trains small, low-rank update matrices alongside them, instead of changing all parameters.

Instead of retraining the entire brain, you insert small “behavior adjustment modules.”
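
A quick back-of-the-envelope sketch, in plain numpy with illustrative numbers, shows why “low rank” is cheap: instead of updating a d x d weight matrix, LoRA learns two thin matrices whose product is the update.

```python
# Why low rank is cheap: a d x d update costs d*d parameters, while the
# rank-r product B @ A costs only 2*d*r. Numbers are illustrative.
import numpy as np

d, r = 4096, 8                    # hidden size and LoRA rank
W = np.zeros((d, d))              # frozen pretrained weight (stand-in)
A = np.random.randn(r, d) * 0.01  # trainable, shape (r, d)
B = np.zeros((d, r))              # trainable, shape (d, r), zero-initialized

W_effective = W + B @ A           # what the layer actually uses

print(f"full update: {d * d:,} params")      # 16,777,216
print(f"LoRA update: {2 * d * r:,} params")  # 65,536 (about 0.4% of full)
```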

Why it matters

  • Much cheaper than full fine-tuning.

  • Faster to train.

  • Requires less compute.

  • Easier to deploy and experiment with.

What it’s good for

  • Changing tone or response style.

  • Improving performance on a specific domain task.

  • Structured output behavior.

  • Improving formatting consistency.

Example

You want your support assistant to:

  • Always respond in bullet points.

  • Keep answers under 150 words.

  • Follow a strict template.

LoRA can make this consistent across interactions.

Trade-offs

  • Does not add new live knowledge.

  • Still requires ML infra.

  • Slight increase in inference complexity.

LoRA is often the practical middle ground if you need durable behavior changes without enterprise-level ML budgets.
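
In practice, most teams reach for Hugging Face’s peft library rather than wiring this up by hand. Here is a minimal sketch, assuming a GPT-2-style base model; the rank, target modules, and model name are illustrative and architecture-specific.

```python
# Minimal LoRA sketch using the peft library. Config values are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

lora_cfg = LoraConfig(
    r=8,                        # rank of the update matrices (the "low rank")
    lora_alpha=16,              # scaling factor applied to the LoRA update
    target_modules=["c_attn"],  # attention projections to adapt (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# Only a small fraction of weights are trainable; the frozen base weights W
# stay unchanged, and the learned update is the low-rank product B @ A.
```

Training then proceeds like ordinary fine-tuning, except gradients flow only into the adapter matrices, which is why LoRA runs fit on far more modest hardware.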


3. PEFT (Parameter-Efficient Fine-Tuning)
