A PyTorch installation must match three things: the Python environment, the operating system, and the available hardware. A CPU-only installation is enough for small examples and early chapters. GPU support becomes important once models, datasets, and training loops grow.
The goal of installation is not only to make import torch work. The goal is to produce a reproducible environment where code runs consistently, dependencies are isolated, and hardware acceleration is available when needed.
Choosing an Environment
Use a virtual environment for each project. This prevents one project’s dependencies from breaking another project.
Common choices are:
| Tool | Typical use |
|---|---|
| venv | Simple Python standard-library environments |
| Conda | Python plus native libraries and CUDA packages |
| uv | Fast Python package management |
| Docker | Reproducible system-level environments |
| Cloud notebooks | Temporary managed environments |
For local learning, venv, Conda, or uv is enough. For production or shared research systems, Docker often gives better reproducibility.
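Whichever tool is chosen, a script can check from inside Python whether it is running in an isolated environment. The following is a minimal sketch, not a complete detector: the function name `environment_kind` is hypothetical, and it assumes that venv-style environments differ in `sys.prefix` and `sys.base_prefix` and that Conda sets the `CONDA_DEFAULT_ENV` variable on activation.

```python
import os
import sys


def environment_kind() -> str:
    """Return a rough label for the active Python environment."""
    # venv and virtualenv point sys.prefix at the environment directory,
    # while sys.base_prefix keeps pointing at the base interpreter.
    if sys.prefix != sys.base_prefix:
        return "venv"
    # Conda exports CONDA_DEFAULT_ENV when an environment is activated
    # (this includes the "base" environment).
    if os.environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return "system"


print(environment_kind(), sys.prefix)
```

If this prints `system`, packages are probably being installed into the global interpreter rather than a project environment.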
A simple venv environment:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
On Windows PowerShell:
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
Installing CPU PyTorch
A CPU-only installation works on most machines:
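pip install torch torchvision torchaudio
This is sufficient for small tensor examples, automatic differentiation, linear models, and small neural networks. On Linux, the default wheel may bundle CUDA libraries, so the download can be large even when no GPU is used; the official installation selector lists a CPU-only index for that case.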
After installation, check that PyTorch imports correctly:
import torch
print(torch.__version__)
print(torch.cuda.is_available())
For a CPU-only installation, torch.cuda.is_available() should return False.
Installing GPU PyTorch
For NVIDIA GPUs, PyTorch uses CUDA. The PyTorch package must be built for a compatible CUDA runtime.
A typical CUDA installation with pip looks like this:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
The exact CUDA index may differ by PyTorch release and system. Use the official PyTorch installation selector when preparing a real machine.
After installation:
import torch
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
A successful GPU installation should print True for CUDA availability and show the GPU name.
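When CUDA is reported as unavailable, it helps to know whether the installed wheel was built with CUDA at all. The sketch below gathers a few build details; the function name `describe_torch_build` and the dictionary keys are hypothetical, and the function is written so that it also runs when PyTorch is not installed.

```python
import importlib.util


def describe_torch_build() -> dict:
    """Collect build details that explain why CUDA is or is not available."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch": torch.__version__,
        # None for CPU-only builds; a version string for CUDA builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


for key, value in describe_torch_build().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, a CPU-only wheel is installed. If it holds a version string but `cuda_available` is `False`, the driver or the hardware is the more likely cause.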
Apple Silicon
On Apple Silicon, PyTorch can use the Metal Performance Shaders backend, called mps.
Check for it with:
import torch
print(torch.backends.mps.is_available())
A simple device selector:
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
print(device)
The mps backend is useful for local experimentation. Some operations may have different performance or support characteristics compared with CUDA.
Verifying Tensor Operations
After installation, run a small tensor test:
import torch
x = torch.randn(4, 3)
w = torch.randn(3, 2)
y = x @ w
print(y)
print(y.shape)
Then test gradients:
x = torch.tensor(2.0, requires_grad=True)
y = x * x + 3 * x + 1
y.backward()
print(x.grad)
The gradient should be tensor(7.).
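The value 7 follows from the derivative 2x + 3 evaluated at x = 2. It can also be cross-checked without autograd. The sketch below uses a central finite difference in plain Python; the helper name `central_difference` and the step size `h` are illustrative choices.

```python
def f(x: float) -> float:
    # Same function as in the autograd example above.
    return x * x + 3 * x + 1


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Approximate the derivative of func at x with a symmetric difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


estimate = central_difference(f, 2.0)
print(estimate)  # expected to be very close to 7
```

Agreement between the numerical estimate and `x.grad` is a quick sign that autograd is behaving as expected.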
If a GPU is available, test device execution:
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
y = x @ x
print(y.device)
This confirms that tensor operations run on the selected device.
Installing Common Libraries
Most PyTorch projects need more than the core package.
A practical learning environment:
pip install torch torchvision torchaudio
pip install numpy pandas matplotlib scikit-learn tqdm
For transformer and NLP work:
pip install transformers datasets tokenizers accelerate
For experiment tracking and configuration:
pip install tensorboard pyyaml rich
For notebooks:
pip install jupyter ipykernel
For graph neural networks, installation depends on the PyTorch and CUDA version. PyTorch Geometric should be installed using its official instructions because it may require version-specific wheels.
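To see which of the libraries listed above are present in the active environment, the standard-library `importlib.metadata` module can be queried by distribution name. This is a small sketch; the function name `missing_packages` and the `wanted` list are illustrative.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


wanted = [
    "torch", "torchvision", "torchaudio",
    "numpy", "pandas", "matplotlib", "scikit-learn", "tqdm",
]
print("missing:", missing_packages(wanted))
```

Note that the lookup uses the name given to pip (for example `scikit-learn`), which can differ from the import name (`sklearn`).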
Reproducibility
A reproducible project records its dependencies.
For a small pip project:
pip freeze > requirements.txt
Install later with:
pip install -r requirements.txt
A more controlled project can use pyproject.toml:
[project]
name = "deep-learning-pytorch"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "torch",
    "torchvision",
    "torchaudio",
    "numpy",
    "matplotlib",
    "scikit-learn",
    "tqdm",
]
Dependency files matter because deep learning libraries change. A training script that works with one version may behave differently with another version.
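One way to notice version drift is to compare pinned entries against what is installed. The sketch below handles only simple `name==version` lines, as produced by `pip freeze` for ordinary packages; it does not parse environment markers, URLs, or version ranges. The function name `check_pins` and the example pins are hypothetical.

```python
from importlib import metadata


def check_pins(requirement_lines):
    """Compare 'name==version' pins against installed versions.

    Returns a list of (name, pinned, installed) tuples for every mismatch.
    `installed` is None when the package is absent.
    """
    mismatches = []
    for line in requirement_lines:
        line = line.strip()
        # Skip blanks, comments, and entries without an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


# Made-up pins for illustration only, not taken from a real project.
example = ["# example pins", "not-a-real-package-xyz==1.0.0", "", "numpy"]
print(check_pins(example))
```

For a real project, the lines would come from reading `requirements.txt`, and an empty result would mean the environment matches the recorded pins.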
Random Seeds
Deep learning uses random numbers for initialization, shuffling, dropout, augmentation, and sampling. Set random seeds when you need repeatable runs.
import random
import numpy as np
import torch
seed = 1234
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
For stricter determinism:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
This can reduce performance. Full determinism is not always possible across all operations, devices, and library versions.
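The seeding calls above are often wrapped in one helper so that every script seeds the same way. The sketch below uses the hypothetical name `seed_everything`, seeds NumPy and PyTorch only when they are installed, and then shows that re-seeding reproduces the same sequence from Python's own generator.

```python
import importlib.util
import random


def seed_everything(seed: int) -> None:
    """Seed Python, and NumPy and PyTorch when they are installed."""
    random.seed(seed)
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np
        np.random.seed(seed)
    if importlib.util.find_spec("torch") is not None:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)


seed_everything(1234)
first = [random.random() for _ in range(3)]

seed_everything(1234)
second = [random.random() for _ in range(3)]

print(first == second)  # True: the same seed yields the same sequence
```

Calling the helper once at the top of a training script, before the model and data loaders are created, keeps initialization and shuffling tied to the same seed.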
Project Layout
A small PyTorch project can start with a few files:
project/
    train.py
    model.py
    data.py
    eval.py
    requirements.txt
    README.md
A larger project benefits from a package layout:
project/
    pyproject.toml
    src/
        dlbook/
            __init__.py
            data.py
            models.py
            train.py
            eval.py
    scripts/
        train_mnist.py
    configs/
        mnist.yaml
    checkpoints/
    runs/
    tests/
Keep generated files such as checkpoints and logs out of source control unless there is a specific reason to track them.
A typical .gitignore includes:
.venv/
__pycache__/
*.pyc
checkpoints/
runs/
data/
Device Configuration in Code
Most examples in this book use a device variable:
import torch
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
Move the model to the device:
model = model.to(device)
Move batches inside the training loop:
for x, y in loader:
    x = x.to(device)
    y = y.to(device)
    logits = model(x)
A common error is creating new tensors on the CPU inside the model while the input is on the GPU.
Poor pattern:
bias = torch.zeros(x.shape[-1])
Better pattern:
bias = torch.zeros(x.shape[-1], device=x.device, dtype=x.dtype)
New tensors created during model computation should usually inherit device and dtype from existing tensors.
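The same idea can be shown inside a module. The layer below is a hypothetical example (`ScaledShift` is not part of PyTorch) that combines two placement strategies: a buffer registered with `register_buffer`, which moves with the module, and a tensor created in `forward` that copies device and dtype from the input.

```python
import torch
from torch import nn


class ScaledShift(nn.Module):
    """Hypothetical layer used only to illustrate tensor placement."""

    def __init__(self, features: int):
        super().__init__()
        # Buffers follow the module when .to(device) is called.
        self.register_buffer("scale", torch.ones(features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Tensors created on the fly copy device and dtype from the input.
        shift = torch.zeros(x.shape[-1], device=x.device, dtype=x.dtype)
        return x * self.scale + shift


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = ScaledShift(3).to(device)
x = torch.randn(2, 3, device=device)
out = layer(x)
print(out.device, out.dtype)
```

Constant tensors that are needed on every call are usually better stored as buffers, while shape-dependent tensors can be created in `forward` with explicit `device` and `dtype` arguments.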
Mixed Precision Configuration
For CUDA training, mixed precision can improve speed and reduce memory usage.
A common pattern:
scaler = torch.amp.GradScaler("cuda")
for x, y in loader:
    x = x.to("cuda")
    y = y.to("cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.amp.autocast("cuda"):
        logits = model(x)
        loss = loss_fn(logits, y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
Mixed precision should be introduced after a full-precision version works. It can change numerical behavior and make debugging harder.
Basic Sanity Check Script
A useful installation check is a complete tiny training script.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
x = torch.randn(1024, 20)
y = (x.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
for epoch in range(5):
    total_loss = 0.0
    for xb, yb in loader:
        xb = xb.to(device)
        yb = yb.to(device)
        logits = model(xb)
        loss = loss_fn(logits, yb)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * xb.size(0)
    avg_loss = total_loss / len(loader.dataset)
    print(f"epoch={epoch} loss={avg_loss:.4f}")
The loss should generally decrease. This verifies tensors, modules, data loading, autograd, optimization, and device placement.
Common Installation Problems
| Symptom | Likely cause | Fix |
|---|---|---|
| ModuleNotFoundError: No module named 'torch' | PyTorch installed in a different environment | Activate the correct environment |
| torch.cuda.is_available() is False | CPU build or driver mismatch | Install a CUDA-compatible PyTorch build or update the NVIDIA driver |
| Device mismatch error | Model and data on different devices | Move both to the same device |
| Out-of-memory error | Batch or model too large | Reduce batch size |
| Import error for vision or audio | Version mismatch | Install matching package versions |
| Very slow training | Running on CPU unexpectedly | Print device and tensor locations |
Most setup problems are environment problems. When debugging, always print the path of the Python interpreter, the PyTorch version, and the device status.
import sys
import torch
print(sys.executable)
print(torch.__version__)
print(torch.cuda.is_available())
Working Style
A reliable workflow is:
- Start with a clean environment.
- Install PyTorch and core dependencies.
- Verify tensor operations.
- Verify gradients.
- Verify device execution.
- Run a tiny training script.
- Add project-specific libraries.
- Freeze or record dependencies.
This avoids debugging several layers of failure at once.
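The workflow above can be sketched as a runner that executes checks in order and stops at the first failure, so that only one layer is debugged at a time. The function `run_checks` and the placeholder checks are hypothetical; in a real project the checks would be the tensor, gradient, device, and training tests from this section.

```python
def run_checks(checks):
    """Run (name, function) pairs in order and stop at the first failure.

    Returns (passed_names, failure_message); failure_message is None
    when every check succeeds.
    """
    passed = []
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            return passed, f"{name}: {type(exc).__name__}: {exc}"
        passed.append(name)
    return passed, None


# Placeholder checks that stand in for the real verification steps.
def check_arithmetic():
    assert 2 + 2 == 4


def check_simulated_failure():
    raise RuntimeError("simulated failure")


def check_never_reached():
    raise AssertionError("should not run")


passed, failure = run_checks([
    ("arithmetic", check_arithmetic),
    ("simulated", check_simulated_failure),
    ("later", check_never_reached),
])
print(passed, failure)
```

Because the runner stops early, the third check never executes, which mirrors the advice to fix the earliest failing layer before moving on.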
Summary
Installing PyTorch means configuring Python, dependencies, and hardware support together. A CPU setup is enough for small examples. A GPU setup requires compatible PyTorch packages, drivers, and device placement.
A good project records its dependencies, isolates its environment, tests gradients, verifies device execution, and uses a consistent project layout. These habits reduce friction before the real work begins: building and training models.