Instruction Tuning

Pretraining teaches a language model to predict text. It does not directly teach the model to follow user instructions, answer safely, maintain dialogue structure, or format outputs in a useful way.

A pretrained model may continue text well but still behave poorly in interactive settings. For example, it may ignore instructions, generate irrelevant continuations, produce unsafe content, or imitate undesirable patterns from the training corpus.

Instruction tuning adapts a pretrained language model into a system that responds to tasks expressed in natural language.

The core idea is simple: instead of training on generic text continuation, we train the model on pairs of instructions and desired responses.

A typical example looks like:

| Instruction | Response |
|---|---|
| “Translate this sentence into French.” | Correct French translation |
| “Summarize the following article.” | Summary |
| “Write a Python function for binary search.” | Python code |
| “Explain gradient descent.” | Educational explanation |

Instruction tuning changes the model’s behavior from generic next-token continuation toward task-oriented response generation.

From Language Modeling to Task Following

A pretrained autoregressive model learns

$$ p_\theta(x_t \mid x_{<t}). $$

The model predicts the next token from previous tokens. During pretraining, the corpus may contain instructions, answers, conversations, code, essays, and many unrelated text types mixed together.

Instruction tuning reorganizes the training distribution. Instead of arbitrary web text, the model receives structured examples:

$$ (\text{instruction}, \text{response}). $$

The model then learns the conditional distribution

$$ p_\theta(\text{response} \mid \text{instruction}). $$

This appears superficially similar to pretraining, since the model still predicts tokens autoregressively. The difference is the structure of the data distribution.
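To make the connection explicit, the conditional distribution can be written with the same autoregressive factorization used in pretraining. Here $x$ denotes the instruction tokens and $y_1, \dots, y_T$ the response tokens; the length $T$ is notation introduced for this illustration:

$$ p_\theta(y \mid x) = \prod_{t=1}^{T} p_\theta(y_t \mid x, y_{<t}). $$

Taking the negative logarithm turns the product into a sum of per-token terms, so the training objective keeps the same form as next-token prediction. Only the data, and which tokens count as targets, change.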

Pretraining teaches language structure broadly. Instruction tuning teaches cooperative task behavior.

Supervised Fine-Tuning

Instruction tuning is usually implemented as supervised fine-tuning, often abbreviated SFT.

The dataset contains demonstrations written by humans, synthetic systems, or mixtures of both. Each example includes:

| Field | Purpose |
|---|---|
| System prompt | Defines global behavior |
| User prompt | Contains the instruction |
| Assistant response | Desired output |

A training sample may look like:

<system>
You are a helpful assistant.

<user>
Explain backpropagation in simple terms.

<assistant>
Backpropagation computes gradients by applying the chain rule ...

The model is trained to predict the assistant tokens conditioned on all previous tokens.
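As a rough illustration of how such a sample might be assembled, the sketch below joins role-tagged messages into one training string. The `ROLE_MARKERS` table and the `render_sample` helper are hypothetical names chosen for this example; real model families define their own special tokens.

```python
# Hypothetical role markers, mirroring the sample above; real model families
# define their own special tokens.
ROLE_MARKERS = {"system": "<system>", "user": "<user>", "assistant": "<assistant>"}


def render_sample(messages):
    """Join (role, text) pairs into one training string with role markers."""
    blocks = []
    for role, text in messages:
        blocks.append(f"{ROLE_MARKERS[role]}\n{text}")
    return "\n\n".join(blocks)


sample = [
    ("system", "You are a helpful assistant."),
    ("user", "Explain backpropagation in simple terms."),
    ("assistant", "Backpropagation computes gradients by applying the chain rule ..."),
]

text = render_sample(sample)
print(text)
```

The rendered string is then tokenized, and the whole sequence is fed to the model as a single example.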

The supervised loss is standard cross-entropy:

$$ \mathcal{L} = -\sum_{t} \log p_\theta(y_t \mid x, y_{<t}), $$

where:

| Symbol | Meaning |
|---|---|
| $x$ | Prompt or instruction |
| $y_t$ | Target response token |
| $y_{<t}$ | Previous response tokens |

Only assistant tokens usually contribute to the loss. User and system tokens provide conditioning context but are not prediction targets.
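One way this masking might be prepared is sketched below, assuming each token carries a role tag. The `build_labels` helper and the toy token ids are made up for illustration. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # value that PyTorch's cross_entropy skips by default


def build_labels(token_ids, roles):
    """Copy assistant token ids into the labels and mask everything else."""
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy token ids (made up for illustration) with a role tag per token.
token_ids = [11, 12, 21, 22, 23, 31, 32]
roles = ["system", "system", "user", "user", "user", "assistant", "assistant"]

labels = build_labels(token_ids, roles)
print(labels)  # [-100, -100, -100, -100, -100, 31, 32]
```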

Why Instruction Tuning Works

Instruction tuning works because pretrained models already contain broad latent capabilities. Pretraining exposes the model to many tasks indirectly through text. The model may already contain useful representations for translation, reasoning, summarization, coding, and dialogue.

Instruction tuning teaches the model when and how to use those capabilities.

This is often described as eliciting latent knowledge rather than creating entirely new knowledge.

The model learns patterns such as:

| Behavior | Example |
|---|---|
| Obeying instructions | Following formatting requests |
| Maintaining dialogue roles | Responding as assistant rather than continuing user text |
| Producing concise answers | Avoiding irrelevant continuation |
| Refusing unsafe requests | Safety alignment |
| Using chain-of-thought style reasoning | Stepwise solutions |
| Formatting outputs | Markdown, JSON, code blocks |

A relatively small instruction dataset can significantly change model behavior because the pretrained model already contains strong language representations.

Prompt Formatting and Chat Templates

Modern instruction-tuned models usually rely on structured prompt templates.

A dialogue is converted into a token sequence with role markers:

<system>
You are a concise assistant.

<user>
What is overfitting?

<assistant>

The model generates the assistant continuation.

Different model families use different formatting conventions:

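| Model family | Example format |
|---|---|
| ChatML-style | Role-tagged turns, shown here in simplified form as `<system>`, `<user>`, `<assistant>` |
| Instruction-style | `### Instruction:` |
| Llama-style chat | `[INST] ... [/INST]` |
| XML-style | `<instruction>` tags |
| JSON-style | Structured objects |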

The formatting matters because the model learns statistical associations between role markers and behavior.

Prompting a model with a template other than the one it was tuned on can degrade performance substantially.
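The sketch below renders the same dialogue under three layouts to show how much the token sequence can differ. These are simplified illustrations, not the exact strings used by any particular model, and the function names are hypothetical. In practice the template should come from the model's own tokenizer or documentation.

```python
def render_role_tags(system, user):
    """Role-tag layout, as in the example above."""
    return f"<system>\n{system}\n\n<user>\n{user}\n\n<assistant>\n"


def render_instruction_style(system, user):
    """Simplified '### Instruction:' layout."""
    return f"{system}\n\n### Instruction:\n{user}\n\n### Response:\n"


def render_inst_brackets(system, user):
    """Simplified [INST] ... [/INST] layout."""
    return f"[INST] {system}\n\n{user} [/INST]"


system = "You are a concise assistant."
user = "What is overfitting?"

prompts = {
    "role_tags": render_role_tags(system, user),
    "instruction": render_instruction_style(system, user),
    "inst_brackets": render_inst_brackets(system, user),
}
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```

The content is identical in all three, but the model sees different marker tokens around it, which is why a mismatch with the training template can matter.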

Multi-Task Instruction Tuning

Instruction datasets often combine many tasks:

| Task type | Example |
|---|---|
| Question answering | Factual responses |
| Summarization | Compress documents |
| Translation | Convert languages |
| Coding | Generate programs |
| Classification | Assign labels |
| Dialogue | Multi-turn interaction |
| Reasoning | Solve structured problems |
| Tool use | Call APIs or functions |

The model learns a unified interface: natural language instructions.

Instead of separate models for each task, one instruction-tuned model learns many conditional behaviors.

This unification is one reason large language models are flexible. The instruction itself acts as part of the program specification.
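A minimal sketch of this unified interface: records from different task-specific datasets are mapped onto the same instruction/response shape. The `to_instruction_pair` helper, the record fields, and the example data are assumptions made for illustration.

```python
def to_instruction_pair(task, record):
    """Map a task-specific record to a uniform instruction/response pair."""
    if task == "classification":
        instruction = (
            "Classify this review as positive or negative:\n" + record["text"]
        )
        response = record["label"]
    elif task == "translation":
        instruction = "Translate this sentence into French:\n" + record["source"]
        response = record["target"]
    else:
        raise ValueError(f"unsupported task: {task}")
    return {"instruction": instruction, "response": response}


raw_data = [
    ("classification", {"text": "The battery life is excellent.", "label": "positive"}),
    ("translation", {"source": "Good morning.", "target": "Bonjour."}),
]
pairs = [to_instruction_pair(task, record) for task, record in raw_data]
for pair in pairs:
    print(pair)
```

After conversion, the training loop no longer needs to know which task an example came from.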

Zero-Shot and Few-Shot Generalization

Instruction tuning improves zero-shot generalization. A zero-shot task is one where the model receives only the instruction, without examples.

Example:

Classify this review as positive or negative:
"The battery life is excellent."

The model may perform the task correctly even without task-specific training examples in the prompt.

Few-shot prompting provides demonstrations inside the prompt itself:

Input: "Amazing product."
Label: Positive

Input: "Very disappointing."
Label: Negative

Input: "Battery life is excellent."
Label:

Instruction tuning improves the model’s ability to interpret such prompts consistently.
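The prompt layout above can be produced programmatically. The sketch below uses a hypothetical `build_few_shot_prompt` helper. With an empty demonstration list it produces a single `Input`/`Label` block with no demonstrations.

```python
def build_few_shot_prompt(demonstrations, query):
    """Format (text, label) demonstrations followed by an unlabeled query."""
    blocks = [f'Input: "{text}"\nLabel: {label}' for text, label in demonstrations]
    blocks.append(f'Input: "{query}"\nLabel:')
    return "\n\n".join(blocks)


demos = [("Amazing product.", "Positive"), ("Very disappointing.", "Negative")]
prompt = build_few_shot_prompt(demos, "Battery life is excellent.")
print(prompt)
```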

Pretraining alone may give weak task following. Instruction tuning calibrates the model toward cooperative interaction.

Chain-of-Thought Supervision

Some instruction datasets include intermediate reasoning steps rather than only final answers.

Example:

Question: If a train travels 60 km in 2 hours, what is its average speed?

Reasoning:
Speed = distance / time
= 60 / 2
= 30 km/h

Answer: 30 km/h

Training on reasoning traces can improve performance on multi-step reasoning tasks.

The model learns statistical patterns associated with decomposition, intermediate computation, verification, and explanation.

This is called chain-of-thought supervision.
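As an illustration, a reasoning trace can be stored as structured steps plus a final answer and rendered into the target text at training time. The helpers below are hypothetical, and the `Answer:` marker convention is an assumption taken from the example above. Extracting the final answer separately is useful when grading outputs.

```python
def format_cot_target(steps, answer):
    """Render reasoning steps and a final answer as one training target."""
    lines = ["Reasoning:", *steps, "", f"Answer: {answer}"]
    return "\n".join(lines)


def extract_answer(target):
    """Return the text after the last 'Answer:' marker, or None if absent."""
    marker = "Answer:"
    index = target.rfind(marker)
    if index == -1:
        return None
    return target[index + len(marker):].strip()


target = format_cot_target(
    ["Speed = distance / time", "= 60 / 2", "= 30 km/h"],
    "30 km/h",
)
print(target)
print(extract_answer(target))  # 30 km/h
```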

However, chain-of-thought introduces several concerns:

| Concern | Description |
|---|---|
| Verbosity | Longer outputs increase cost |
| Faithfulness | Reasoning text may not reflect internal computation |
| Data contamination | Public reasoning datasets may leak benchmarks |
| Safety | Hidden reasoning may expose unsafe internal content |

Some systems therefore separate visible reasoning from internal latent reasoning.

Instruction Diversity

An instruction-tuned model must generalize across many instruction styles.

If the dataset is too narrow, the model may overfit to specific phrasing. High-quality instruction tuning datasets therefore vary:

| Variation | Example |
|---|---|
| Wording | “Summarize” versus “Give a short overview” |
| Tone | Formal versus conversational |
| Format | JSON, markdown, prose |
| Difficulty | Simple and complex tasks |
| Domain | Science, law, code, dialogue |
| Language | Multilingual prompts |

Diversity improves robustness.

The model learns abstract task semantics rather than memorizing exact templates.
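One simple way to introduce such variation, sketched here under the assumption that paraphrases are written by hand, is to sample a wording and an output format for each example. The lists and the `vary_instruction` helper are illustrative only.

```python
import random

WORDINGS = [
    "Summarize the following text.",
    "Give a short overview of the text below.",
    "Write a brief summary of this passage.",
]
FORMATS = [
    "Answer in plain prose.",
    "Answer as a markdown bullet list.",
    "Answer as a JSON object with a 'summary' field.",
]


def vary_instruction(document, rng):
    """Pair one document with a randomly chosen wording and output format."""
    wording = rng.choice(WORDINGS)
    output_format = rng.choice(FORMATS)
    return f"{wording} {output_format}\n\n{document}"


rng = random.Random(0)  # fixed seed so the sketch is reproducible
document = "Instruction tuning adapts a pretrained model to follow tasks."
variants = [vary_instruction(document, rng) for _ in range(4)]
for variant in variants:
    print(variant, end="\n\n")
```

The desired response must of course match the sampled format, so in a real dataset the targets are varied together with the instructions.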

Synthetic Instruction Data

Human-written instruction datasets are expensive. Many modern systems therefore generate synthetic instruction data.

A strong model can generate:

| Synthetic component | Example |
|---|---|
| Instructions | “Write a SQL query for…” |
| Responses | High-quality completions |
| Reasoning traces | Stepwise derivations |
| Critiques | Error analysis |
| Preference labels | Ranking candidate answers |

Synthetic data generation creates a recursive training loop:

  1. Train a strong model.
  2. Use the model to generate instruction data.
  3. Filter or rank the outputs.
  4. Train a new model on the expanded dataset.

This process scales data generation beyond purely human annotation.

However, synthetic data can amplify errors, stylistic artifacts, and model biases. Filtering and evaluation become increasingly important.
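The generate-then-filter steps of the loop can be sketched as follows. The generator is a stub returning canned strings in place of a real model call, and the quality heuristic is deliberately crude. Both are assumptions for illustration, not a recommended filter.

```python
def generate_candidates(instruction):
    """Stand-in for a call to a strong model; returns canned responses."""
    return [
        "SELECT name FROM users WHERE age > 30;",
        "SELECT name FROM users WHERE age > 30;",  # exact duplicate
        "I am not sure.",                          # low-quality answer
    ]


def quality_score(response):
    """Toy heuristic: reward responses that look like a SQL query."""
    return 1.0 if response.strip().upper().startswith("SELECT") else 0.0


def build_synthetic_examples(instructions, threshold=0.5):
    """Generate candidates, drop duplicates and low-scoring responses."""
    examples, seen = [], set()
    for instruction in instructions:
        for response in generate_candidates(instruction):
            key = (instruction, response)
            if key in seen or quality_score(response) < threshold:
                continue
            seen.add(key)
            examples.append({"instruction": instruction, "response": response})
    return examples


examples = build_synthetic_examples(["Write a SQL query for users older than 30."])
print(examples)
```

Of the three candidates, one survives: the duplicate and the low-scoring answer are dropped before training.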

Catastrophic Forgetting

Instruction tuning changes the model distribution. If done poorly, it can damage capabilities learned during pretraining.

This is called catastrophic forgetting.

Possible symptoms include:

| Problem | Example |
|---|---|
| Reduced factual recall | Worse knowledge retrieval |
| Lower language diversity | Repetitive responses |
| Reduced multilingual ability | Strong English bias |
| Style collapse | Overly uniform outputs |
| Short-answer bias | Failure on long reasoning tasks |

Instruction tuning datasets are much smaller than pretraining corpora. Aggressive fine-tuning can therefore distort the pretrained representation space.

Several techniques reduce forgetting:

| Technique | Purpose |
|---|---|
| Small learning rates | Preserve pretrained features |
| Mixed training data | Blend instruction and pretraining text |
| Parameter-efficient tuning | Update fewer parameters |
| Regularization | Prevent large parameter drift |
| Replay buffers | Reintroduce older data |

Balancing specialization and preservation is a major practical challenge.
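As one concrete instance of the regularization row, the sketch below adds an L2 penalty on the distance between the current parameters and their pretrained values. This particular penalty is an assumption chosen for illustration, as are the parameter values, the task loss, and the `strength` coefficient.

```python
def drift_penalty(params, pretrained_params, strength):
    """Return strength times the squared distance from the pretrained values."""
    squared = sum((p - p0) ** 2 for p, p0 in zip(params, pretrained_params))
    return strength * squared


pretrained = [0.5, -1.0, 2.0]
fine_tuned = [0.6, -1.0, 1.5]
task_loss = 0.8  # made-up value standing in for the supervised loss

total_loss = task_loss + drift_penalty(fine_tuned, pretrained, strength=0.1)
print(round(total_loss, 3))  # 0.826
```

A larger `strength` keeps the model closer to its pretrained state at the cost of slower adaptation.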

Parameter-Efficient Instruction Tuning

Full fine-tuning updates all parameters. For large models, this is expensive.

Parameter-efficient fine-tuning updates only small subsets of parameters.

Common approaches include:

| Method | Idea |
|---|---|
| LoRA | Low-rank weight updates |
| Adapters | Small trainable modules inserted into layers |
| Prefix tuning | Train virtual prompt vectors |
| Prompt tuning | Learn soft prompts |
| BitFit | Train only bias terms |

For example, LoRA approximates weight updates using low-rank matrices:

$$ \Delta W = AB, $$

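where, for a weight matrix $W \in \mathbb{R}^{d \times k}$, the factors are $A \in \mathbb{R}^{d \times r}$ and $B \in \mathbb{R}^{r \times k}$ with rank $r \ll \min(d, k)$. The pretrained $W$ stays frozen and only $A$ and $B$ are trained.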

This greatly reduces memory and compute requirements while preserving much of the model’s performance.
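A quick count shows the scale of the savings. The sketch assumes a weight matrix of shape $d \times k$ and factors $A$ of shape $d \times r$ and $B$ of shape $r \times k$. The dimensions 4096 and 8 are illustrative values, not taken from any specific model.

```python
def lora_param_counts(d, k, r):
    """Trainable parameters for a full update versus a rank-r update."""
    full = d * k
    low_rank = r * (d + k)
    return full, low_rank


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


full, low_rank = lora_param_counts(d=4096, k=4096, r=8)
print(full, low_rank, full // low_rank)  # 16777216 65536 256

# Tiny rank-1 example: A is 3 x 1, B is 1 x 2, so the update is 3 x 2.
A = [[1.0], [2.0], [3.0]]
B = [[0.5, -1.0]]
delta_W = matmul(A, B)
print(delta_W)  # [[0.5, -1.0], [1.0, -2.0], [1.5, -3.0]]
```

The update has the same shape as the weight matrix, but it is described by far fewer trainable numbers.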

Parameter-efficient tuning is widely used for domain adaptation and open-source fine-tuning.

Instruction Tuning and Alignment

Instruction tuning improves usability, but it does not fully solve alignment.

A model may still:

| Failure mode | Example |
|---|---|
| Hallucinate | Invent facts |
| Follow harmful requests | Unsafe outputs |
| Over-refuse | Reject harmless queries |
| Manipulate users | Social persuasion |
| Leak training data | Memorized content |
| Produce biased responses | Social stereotypes |

Instruction tuning mainly teaches behavioral imitation from demonstrations.

More advanced alignment methods, such as reinforcement learning from human feedback, constitutional training, and preference optimization, further shape the model’s behavior.

PyTorch View of Supervised Fine-Tuning

Suppose a tokenized batch has shape:

[B, T]

where:

| Symbol | Meaning |
|---|---|
| B | Batch size |
| T | Sequence length |

The model produces logits:

[B, T, V]

where $V$ is the vocabulary size.

Instruction tuning usually masks non-assistant tokens from the loss.

Example:

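import torch
import torch.nn.functional as F

# input_ids: [B, T]
# labels:    [B, T], aligned with input_ids; assistant tokens kept,
#            all other positions set to -100
# model:     assumed to return raw logits of shape [B, T, V]

logits = model(input_ids)

# Logits at position t predict the token at position t + 1,
# so drop the last prediction and the first label before the loss.
shift_logits = logits[:, :-1, :]
shift_labels = labels[:, 1:]

loss = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    ignore_index=-100,  # masked positions contribute no loss
)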

The label tensor may look like:

Input tokens:   [SYSTEM USER USER ASSISTANT ASSISTANT]
Labels:         [  -100  -100  -100     y1        y2 ]

Only assistant outputs contribute gradients.
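To see the arithmetic on a toy case, the sketch below computes the masked loss in plain Python. It assumes the labels are aligned with the input positions, so the prediction made at position t is scored against the label at position t + 1, and it averages over unmasked targets only, as PyTorch's default mean reduction does. The probabilities are made up.

```python
import math

IGNORE_INDEX = -100


def masked_next_token_loss(probs, labels):
    """Mean negative log-likelihood over unmasked next-token targets.

    probs[t] is the predicted distribution for the token at position t + 1;
    labels[t] is the token id at position t, or IGNORE_INDEX if masked.
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        target = labels[t + 1]  # prediction at t is scored against t + 1
        if target == IGNORE_INDEX:
            continue
        total -= math.log(probs[t][target])
        count += 1
    return total / count if count else 0.0


# Vocabulary of size 3, sequence of 4 tokens; the last two are assistant tokens.
labels = [IGNORE_INDEX, IGNORE_INDEX, 2, 0]
probs = [
    [0.2, 0.5, 0.3],    # prediction for position 1 (masked, ignored)
    [0.1, 0.1, 0.8],    # prediction for position 2, target id 2
    [0.5, 0.25, 0.25],  # prediction for position 3, target id 0
    [0.3, 0.3, 0.4],    # prediction past the end (unused)
]
loss = masked_next_token_loss(probs, labels)
print(round(loss, 4))  # 0.4581
```

Only the two assistant targets enter the average, so changing the probabilities in the masked row leaves the loss unchanged.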

Data Mixture and Curriculum

Instruction tuning datasets are often mixtures of many sources:

| Source | Example |
|---|---|
| Human annotation | Expert-written prompts |
| Public QA datasets | Reading comprehension |
| Code datasets | Programming tasks |
| Synthetic conversations | Generated dialogues |
| Tool traces | API interaction examples |
| Reasoning datasets | Math and logic problems |

The mixture ratio matters.

Too much conversational data may weaken reasoning. Too much code data may distort natural language style. Too much synthetic data may create repetitive outputs.

Some training pipelines also use curricula:

  1. Easier tasks first.
  2. More complex reasoning later.
  3. Specialized tasks near the end.

Curriculum design can improve stability and convergence.
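Both ideas can be sketched in a few lines. The mixture weights, the source names, and the numeric `difficulty` field are hypothetical. In practice the ratios are tuned empirically and difficulty is rarely available as a clean label.

```python
import random

# Hypothetical mixture weights; in practice these are tuned empirically.
MIXTURE = {"human": 0.4, "code": 0.2, "synthetic": 0.3, "reasoning": 0.1}


def sample_sources(mixture, n, rng):
    """Draw n source names with probability proportional to their weights."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


def curriculum_order(examples):
    """Sort examples from easiest to hardest by a 'difficulty' field."""
    return sorted(examples, key=lambda ex: ex["difficulty"])


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = sample_sources(MIXTURE, 1000, rng)
counts = {name: draws.count(name) for name in MIXTURE}
print(counts)

examples = [
    {"task": "multi-step proof", "difficulty": 3},
    {"task": "sentiment label", "difficulty": 1},
    {"task": "SQL tool call", "difficulty": 2},
]
ordered = [ex["task"] for ex in curriculum_order(examples)]
print(ordered)  # ['sentiment label', 'SQL tool call', 'multi-step proof']
```

Sampling by weight controls how often each source appears per batch, while the ordering controls when in training each kind of example is seen.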

Why Instruction Tuning Changed Language Models

Early large language models were often difficult to control. Users needed carefully engineered prompts to obtain reliable behavior.

Instruction tuning changed the interaction model. Instead of treating the model as a generic text completer, users could treat it as a cooperative assistant.

This shift enabled:

| Capability | Impact |
|---|---|
| Conversational systems | Multi-turn dialogue |
| General-purpose assistants | Broad task coverage |
| Tool integration | API and retrieval systems |
| Coding assistants | Natural language programming |
| Educational tutors | Explanatory interaction |
| Agent systems | Planning and execution loops |

Instruction tuning therefore transformed pretrained language models into usable interactive systems.

Summary

Instruction tuning adapts pretrained language models for task-following behavior using supervised examples of instructions and desired responses.

The model learns conditional generation:

$$ p_\theta(\text{response} \mid \text{instruction}). $$

Instruction tuning improves usability, formatting, dialogue structure, reasoning style, and zero-shot task generalization.

Modern instruction-tuned systems rely on structured prompts, diverse datasets, chain-of-thought supervision, synthetic data generation, and parameter-efficient adaptation methods.

Instruction tuning greatly improves interaction quality, but it does not fully solve factuality, robustness, or safety. Later alignment stages further shape model behavior beyond supervised imitation.