What are the latest techniques to customize or fine-tune LLMs?
AI Product Management Interview Question - What are the latest techniques to customize/fine-tune LLMs? Explain one in detail to your leader, who knows very little about ML/AI.
Dear readers,
Thank you for being part of our growing community. Here's what's new today:
AI Product Management - What are the latest techniques to customize or fine-tune LLMs?
Modern large language models are powerful but not one-size-fits-all. You can either change the model itself or give it smarter ways to use your company's knowledge. The fastest, most cost-effective approach for most product teams is to start with retrieval-based methods that connect a general model to your documents, then add lightweight tuning if you need a persistent style or behavior change.
If your priority is accurate, up-to-date answers from company data, start with Retrieval-Augmented Generation (RAG); a minimal code sketch follows the list below.
If your priority is a consistent voice, structured outputs, or specialized reasoning that must live inside the model, consider a parameter-efficient fine-tuning method like LoRA or adapters.
Starting this way helps in three ways:
Faster time-to-value: You can build useful experiences without months of model training.
Lower engineering cost: Most teams can deliver a helpful product with retrieval, smaller tuning, and iterative improvements.
Safer rollouts: Grounded answers and clear provenance make compliance and product trust easier.
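To make the retrieval recommendation concrete, here is a minimal RAG sketch in Python. It is illustrative rather than production-ready: TF-IDF from scikit-learn stands in for a real embedding model, the documents are invented, and call_llm is a hypothetical placeholder for whichever model API your team uses.

```python
# Minimal RAG sketch: retrieve the most relevant documents,
# then ground the model's prompt in them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented example documents; in practice these come from your knowledge base.
documents = [
    "Refunds are processed within 5 business days.",
    "Enterprise plans include SSO and audit logs.",
    "The API rate limit is 100 requests per minute.",
]

# TF-IDF stands in for a production embedding model here.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: swap in your model provider's API call.
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return call_llm(prompt)

print(answer("How long do refunds take?"))
```

The key product property: updating the knowledge base updates the answers immediately, with no retraining.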
Goals and Decision Criteria
Before you pick a technique, align on the business goals and constraints. The right choice depends on what you value most: freshness of information, tone and behavior, cost, latency, or regulation.
Key decision criteria:
Business outcome
What do we need the model to do for users?
Different approaches change the model's knowledge, its behavior, or both. Be concrete: should it answer support tickets, draft legal text, summarize calls, or autocomplete code?
Freshness of knowledge
Does the content change often (pricing, legal rules, product docs)?
If knowledge changes frequently, RAG or live retrieval is usually best because you can update content without retraining.
Required style and behavior
Do responses need a persistent tone, strict templates, or highly structured outputs?
Fine-tuning or PEFT methods are better when you need the model to always write in a particular way.
Accuracy versus creativity
Is it worse for the model to be wrong or to be blunt/boring?
Grounding with retrieval reduces hallucinations; reinforcement learning or instruction tuning can make the model better at following desired formats.
Latency and user experience
Do we need instant answers, or is a 1-2 second backend fetch acceptable?
Retrieval adds a lookup step to every request but reduces downstream risk; heavily modified models can also be slower or costlier to serve.
Cost and engineering effort
What budget and team skills are available (ML engineers, infra, legal)?
Full model fine-tuning is resource heavy. PEFT methods and RAG reduce compute and infra burden.
Privacy and compliance
Are we exposing or moving sensitive data?
Retrieval pipelines must have strict access control and audit trails. For some regulated uses, you may need on-prem or isolated infra.
Modern Techniques for Customizing and Fine-Tuning LLMs
When we talk about “customising an LLM,” we’re really answering one core question:
Are we changing the model’s brain, or are we changing how it accesses and uses information?
Different techniques solve different product problems. Below is a detailed, PM-friendly explanation of each modern approach, including when to use it, risks, cost profile, and practical product implications.
1. Full Fine-Tuning
Full fine-tuning retrains all model parameters on your domain-specific data. You are literally updating the model's internal weights so it internalizes new knowledge or behavior.
Think of it as re-educating the entire brain.
What it changes
Knowledge representation
Reasoning patterns
Style and tone
Task-specific behavior
When it’s useful
You need deep domain reasoning baked into the model.
Your use case requires consistent and persistent behavior.
You have large volumes of high-quality labeled data.
You can afford serious compute and ML engineering resources.
Example
A legal-tech company trains a model on tens of thousands of case documents so it reasons like a specialist lawyer rather than just retrieving information.
Trade-offs
Expensive in compute and infrastructure.
Long iteration cycles.
Risk of overfitting.
Harder rollback if something goes wrong.
Requires ML engineering maturity.
This is powerful but heavy. Most startups and internal teams should not start here.
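For teams that do go this route, the training loop itself is the easy part; the cost lives in data quality, compute, and iteration. Here is a minimal sketch assuming the Hugging Face transformers and datasets libraries; the small base model, the legal_cases.txt file, and the hyperparameters are illustrative assumptions, not recommendations.

```python
# Minimal full fine-tuning sketch: every parameter in the model
# receives gradient updates, which is what makes this approach heavy.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # tiny stand-in; real projects use far larger base models
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical file of domain text, one training example per line.
dataset = load_dataset("text", data_files={"train": "legal_cases.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# The collator pads batches and sets next-token labels for causal LM training.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="full-finetune",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=2e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    data_collator=collator,
).train()
```

Even this toy run rewrites every weight in the network, which is why versioning, evaluation, and rollback discipline matter so much at real scale.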
2. LoRA (Low-Rank Adaptation)
LoRA freezes the model's original weights and trains only small, newly added low-rank matrices, instead of changing all parameters.
Instead of retraining the entire brain, you insert small “behavior adjustment modules.”
Why it matters
Much cheaper than full fine-tuning.
Faster to train.
Requires less compute.
Easier to deploy and experiment with.
What it’s good for
Changing tone or response style.
Improving performance on a specific domain task.
Structured output behavior.
Improving formatting consistency.
Example
You want your support assistant to:
Always respond in bullet points.
Keep answers under 150 words.
Follow a strict template.
LoRA can make this consistent across interactions.
Trade-offs
Does not add new live knowledge.
Still requires ML infra.
Slight increase in inference complexity.
LoRA is often the practical middle ground if you need durable behavior changes without enterprise-level ML budgets.
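Concretely, LoRA freezes each targeted weight matrix W and learns two small matrices B and A whose product BA is added to W, so only a tiny fraction of parameters ever receives gradients. Here is a minimal sketch assuming the Hugging Face peft library; the GPT-2 base model and the config values are illustrative assumptions.

```python
# Minimal LoRA sketch: freeze the base model and train only small
# low-rank adapter matrices injected into the attention layers.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the update matrices; small on purpose
    lora_alpha=16,              # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's combined attention projection
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Prints the trainable fraction, typically well under 1% of all parameters.
# Training then proceeds exactly like the full fine-tuning sketch above,
# but gradients flow only through the adapter matrices.
```

Because the base weights are untouched, you can keep separate adapters for different behaviors and swap or roll them back at deployment time.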