Installing and Configuring PyTorch

A PyTorch installation must match three things: the Python environment, the operating system, and the available hardware. A CPU-only installation is enough for small examples and early chapters. GPU support becomes important once models, datasets, and training loops grow.

The goal of installation is not only to make import torch work. The goal is to produce a reproducible environment where code runs consistently, dependencies are isolated, and hardware acceleration is available when needed.

Choosing an Environment

Use a virtual environment for each project. This prevents one project’s dependencies from breaking another project.

Common choices are:

Tool	Typical use
`venv`	Simple Python standard-library environments
Conda	Python plus native libraries and CUDA packages
`uv`	Fast Python package management
Docker	Reproducible system-level environments
Cloud notebooks	Temporary managed environments

For local learning, venv, Conda, or uv is enough. For production or shared research systems, Docker often gives better reproducibility.

A simple venv environment:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip

On Windows PowerShell:

python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
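
Before installing packages, it can help to confirm that the interpreter really belongs to the project environment. The sketch below is one way to check this from Python. The helper name `in_virtual_environment` is hypothetical, and the `sys.prefix` comparison only detects `venv`-style environments (including those created by `uv`); a Conda environment would need a different check.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtual_environment())
```

If the second line prints False after activation, the shell is probably still using a different Python.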

Installing CPU PyTorch

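The default pip package works on most machines:

pip install torch torchvision torchaudio

Whether this build is CPU-only depends on the platform. On Linux, the default wheels typically bundle CUDA libraries; on macOS and Windows they are typically CPU-only. To request a CPU-only build explicitly, point pip at the CPU wheel index:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu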

This is sufficient for small tensor examples, automatic differentiation, linear models, and small neural networks.

After installation, check that PyTorch imports correctly:

import torch

print(torch.__version__)
print(torch.cuda.is_available())

For a CPU-only installation, torch.cuda.is_available() should return False.

Installing GPU PyTorch

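For NVIDIA GPUs, PyTorch uses CUDA. The pip wheels bundle the CUDA runtime libraries they were built against, so the machine mainly needs an NVIDIA driver recent enough for that CUDA version; a separate CUDA toolkit installation is usually not required.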

A typical CUDA installation with pip looks like this:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

The cu121 suffix selects wheels built for CUDA 12.1. The available suffixes change between PyTorch releases, so use the official PyTorch installation selector when preparing a real machine.

After installation:

import torch

print(torch.__version__)
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

A successful GPU installation should print True for CUDA availability and show the GPU name.
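
When CUDA availability prints False, it helps to know whether the installed package lacks CUDA support or whether the build is fine but no GPU or driver can be used. The following is a minimal sketch of that distinction. The function name `diagnose_cuda` is hypothetical; `torch.version.cuda` is the CUDA version the package was built with and is None for builds without CUDA.

```python
import torch


def diagnose_cuda() -> str:
    """Explain the CUDA status of the current PyTorch installation."""
    if torch.cuda.is_available():
        return f"CUDA is usable with {torch.cuda.device_count()} GPU(s)"
    if torch.version.cuda is None:
        # No CUDA version recorded in the build: a CPU-only (or non-CUDA) package.
        return "this PyTorch build has no CUDA support"
    # The package was built for CUDA, but no device or driver could be initialized.
    return f"built for CUDA {torch.version.cuda}, but no usable GPU or driver was found"


print(diagnose_cuda())
```

The second message points to reinstalling PyTorch from a CUDA index; the third points to the driver or the hardware.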

Apple Silicon

On Apple Silicon, PyTorch can use the Metal Performance Shaders backend, called mps.

Check for it with:

import torch

print(torch.backends.mps.is_available())

A simple device selector:

device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

print(device)

The mps backend is useful for local experimentation. Compared with CUDA, some operations are slower on mps and some are not implemented at all.
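
Because backends can differ, one cheap precaution is to compare a small computation on the selected device with the same computation on the CPU. The sketch below wraps the selector above in a function and adds such a comparison. The names `select_device` and `matches_cpu` and the tolerance `atol=1e-4` are assumptions chosen for illustration.

```python
import torch


def select_device() -> torch.device:
    """Prefer CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def matches_cpu(device: torch.device, atol: float = 1e-4) -> bool:
    """Run one matrix product on `device` and compare it with the CPU result."""
    generator = torch.Generator().manual_seed(0)  # fixed inputs for a repeatable check
    a = torch.randn(64, 32, generator=generator)
    b = torch.randn(32, 16, generator=generator)
    expected = a @ b
    actual = (a.to(device) @ b.to(device)).cpu()
    return torch.allclose(actual, expected, atol=atol)


device = select_device()
print(device, "agrees with CPU:", matches_cpu(device))
```

Small floating-point differences between devices are normal, which is why the comparison uses a tolerance instead of exact equality.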

Verifying Tensor Operations

After installation, run a small tensor test:

import torch

x = torch.randn(4, 3)
w = torch.randn(3, 2)

y = x @ w

print(y)
print(y.shape)

Then test gradients:

x = torch.tensor(2.0, requires_grad=True)

y = x * x + 3 * x + 1
y.backward()

print(x.grad)

The gradient should be tensor(7.), because dy/dx = 2x + 3, which equals 7 at x = 2.
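
The same check can be made more explicit by comparing autograd with the analytic derivative and with a numerical estimate. This is a small sketch, not a general gradient checker; the function name `f` and the step size `h` are illustrative choices, and float64 is used so that rounding error stays small.

```python
import torch


def f(x):
    """The polynomial from the gradient test; works on floats and tensors."""
    return x * x + 3 * x + 1


x0 = 2.0

# Autograd: build the graph in float64 and backpropagate.
x = torch.tensor(x0, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()

# Analytic derivative: d/dx (x^2 + 3x + 1) = 2x + 3.
analytic_grad = 2 * x0 + 3

# Central finite difference on plain Python floats.
h = 1e-4
numeric_grad = (f(x0 + h) - f(x0 - h)) / (2 * h)

print(autograd_grad, analytic_grad, numeric_grad)
```

The first two values should agree exactly, and the finite-difference estimate should agree up to rounding error.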

If a GPU is available, test device execution:

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(1024, 1024, device=device)
y = x @ x

print(y.device)

This confirms that tensor operations run on the selected device.

Installing Common Libraries

Most PyTorch projects need more than the core package.

A practical learning environment:

pip install torch torchvision torchaudio
pip install numpy pandas matplotlib scikit-learn tqdm

For transformer and NLP work:

pip install transformers datasets tokenizers accelerate

For experiment tracking and configuration:

pip install tensorboard pyyaml rich

For notebooks:

pip install jupyter ipykernel

For graph neural networks, installation depends on the PyTorch and CUDA version. PyTorch Geometric should be installed using its official instructions because it may require version-specific wheels.

Reproducibility

A reproducible project records its dependencies.

For a small pip project:

pip freeze > requirements.txt

Install later with:

pip install -r requirements.txt

A more controlled project can use pyproject.toml:

[project]
name = "deep-learning-pytorch"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "torch",
    "torchvision",
    "torchaudio",
    "numpy",
    "matplotlib",
    "scikit-learn",
    "tqdm",
]

Dependency files matter because deep learning libraries change. A training script that works with one version may behave differently with another version.
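
To see exactly which versions an environment contains, the standard library can query installed package metadata. The sketch below builds pinned `name==version` lines for a chosen set of packages; the helper name `pinned_requirements` and the package list are assumptions for illustration, not a replacement for `pip freeze`.

```python
from importlib import metadata


def pinned_requirements(packages: list[str]) -> list[str]:
    """Return 'name==version' lines for the packages that are installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: report it instead of guessing.
            lines.append(f"# {name} is not installed")
    return lines


for line in pinned_requirements(["torch", "torchvision", "numpy"]):
    print(line)
```

PyTorch versions installed from a CUDA index may carry a local suffix such as +cu121; a pin like that usually resolves only when pip is pointed at the same index again.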

Random Seeds

Deep learning uses random numbers for initialization, shuffling, dropout, augmentation, and sampling. Set random seeds when you need repeatable runs.

import random
import numpy as np
import torch

seed = 1234

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

For stricter determinism:

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

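These cuDNN flags can reduce performance, and they cover cuDNN only. torch.use_deterministic_algorithms(True) goes further: it makes PyTorch raise an error when an operation has no deterministic implementation. Even so, full determinism is not always possible across all operations, devices, and library versions.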
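
Seeding is easiest to keep consistent when it lives in one function that is called at the start of every run. The following sketch wraps the seeding calls above and shows that reseeding reproduces the same draws. The names `seed_everything` and `sample` are hypothetical helpers, not part of PyTorch.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy, and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and the CUDA devices as well


def sample() -> tuple[float, float, float]:
    """Draw one number from each generator."""
    return random.random(), float(np.random.rand()), torch.rand(1).item()


seed_everything(1234)
first = sample()
seed_everything(1234)
second = sample()

print(first == second)
```

The two draws should be identical. Changing the seed between the calls should produce different values.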

Project Layout

A small PyTorch project can start with a few files:

project/
  train.py
  model.py
  data.py
  eval.py
  requirements.txt
  README.md

A larger project benefits from a package layout:

project/
  pyproject.toml
  src/
    dlbook/
      __init__.py
      data.py
      models.py
      train.py
      eval.py
  scripts/
    train_mnist.py
  configs/
    mnist.yaml
  checkpoints/
  runs/
  tests/

Keep generated files such as checkpoints and logs out of source control unless there is a specific reason to track them.

A typical .gitignore includes:

.venv/
__pycache__/
*.pyc
checkpoints/
runs/
data/
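
The larger layout can also be created by a short script, so that every new project starts from the same skeleton. This is only a sketch: the function name `scaffold` is hypothetical, the directory and file names are copied from the layout above, and the demonstration writes into a temporary directory so that nothing on disk is changed.

```python
import tempfile
from pathlib import Path

# Directories, files, and ignore patterns taken from the layout above.
DIRECTORIES = ["src/dlbook", "scripts", "configs", "checkpoints", "runs", "tests"]
FILES = ["pyproject.toml", "src/dlbook/__init__.py"]
GITIGNORE = [".venv/", "__pycache__/", "*.pyc", "checkpoints/", "runs/", "data/"]


def scaffold(root: Path) -> None:
    """Create the project skeleton under `root` without overwriting existing files."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        (root / file).touch(exist_ok=True)
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("\n".join(GITIGNORE) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    scaffold(project)
    print(sorted(p.relative_to(project).as_posix() for p in project.rglob("*")))
```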

Device Configuration in Code

Most examples in this book use a device variable:

import torch

device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

Move the model to the device:

model = model.to(device)

Move batches inside the training loop:

for x, y in loader:
    x = x.to(device)
    y = y.to(device)

    logits = model(x)

A common error is creating new tensors on the CPU inside the model while the input is on the GPU.

Poor pattern:

bias = torch.zeros(x.shape[-1])

Better pattern:

bias = torch.zeros(x.shape[-1], device=x.device, dtype=x.dtype)

New tensors created during model computation should usually inherit device and dtype from existing tensors.
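
As a concrete illustration, the toy module below creates a tensor inside forward and takes both device and dtype from its input. The class name `AddPositionRamp` is invented for this sketch; the point is only the `device=x.device, dtype=x.dtype` arguments.

```python
import torch
from torch import nn


class AddPositionRamp(nn.Module):
    """Toy module that adds a 0..1 ramp along the last dimension of its input."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Create the ramp on the same device and with the same dtype as the input.
        ramp = torch.linspace(0.0, 1.0, x.shape[-1], device=x.device, dtype=x.dtype)
        return x + ramp


module = AddPositionRamp()
x = torch.zeros(2, 4, dtype=torch.float64)
y = module(x)

print(y.dtype, y.device)
```

For tensors that stay constant across calls, registering them with `register_buffer` is an alternative, because buffers move together with the module when `.to(device)` is called.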

Mixed Precision Configuration

For CUDA training, mixed precision can improve speed and reduce memory usage.

A common pattern:

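# Scales the loss so that small float16 gradients do not underflow.
scaler = torch.amp.GradScaler("cuda")

for x, y in loader:
    x = x.to("cuda")
    y = y.to("cuda")

    optimizer.zero_grad(set_to_none=True)

    # Run the forward pass and the loss in reduced precision where it is safe.
    with torch.amp.autocast("cuda"):
        logits = model(x)
        loss = loss_fn(logits, y)

    scaler.scale(loss).backward()  # backpropagate the scaled loss
    scaler.step(optimizer)         # unscale gradients; skip the step on inf/NaN
    scaler.update()                # adjust the scale factor for the next iteration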

Mixed precision should be introduced after a full-precision version works. It can change numerical behavior and make debugging harder.

Basic Sanity Check Script

A useful installation check is a complete tiny training script.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

x = torch.randn(1024, 20)
y = (x.sum(dim=1) > 0).long()

loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
).to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for epoch in range(5):
    total_loss = 0.0

    for xb, yb in loader:
        xb = xb.to(device)
        yb = yb.to(device)

        logits = model(xb)
        loss = loss_fn(logits, yb)

        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * xb.size(0)

    avg_loss = total_loss / len(loader.dataset)
    print(f"epoch={epoch} loss={avg_loss:.4f}")

The loss should generally decrease. This verifies tensors, modules, data loading, autograd, optimization, and device placement.

Common Installation Problems

Symptom	Likely cause	Fix
`ModuleNotFoundError: No module named 'torch'`	PyTorch installed in a different environment	Activate the correct environment
`torch.cuda.is_available()` is `False`	CPU-only build, or NVIDIA driver missing or too old	Install a CUDA build of PyTorch or update the driver
Device mismatch error	Model and data on different devices	Move both to the same device
Out-of-memory error	Batch or model too large	Reduce batch size
Import error for vision or audio	torchvision or torchaudio built for a different torch version	Install matching package versions
Very slow training	Running on CPU unexpectedly	Print device and tensor locations

Most setup problems are environment problems. Always print the Python interpreter path, PyTorch version, and device status when debugging.

import sys
import torch

print(sys.executable)
print(torch.__version__)
print(torch.cuda.is_available())
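
A slightly fuller report is useful when the failure is that PyTorch cannot be imported at all, because the three-line version would crash before printing anything. The sketch below checks for the package first and imports it only if it is present. The function name `environment_report` and the dictionary keys are hypothetical.

```python
import importlib.util
import platform
import sys


def environment_report() -> dict:
    """Collect basic facts about the interpreter and, if present, PyTorch."""
    report = {
        "executable": sys.executable,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the report works without PyTorch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None for builds without CUDA
        report["cuda_available"] = torch.cuda.is_available()
        report["mps_available"] = torch.backends.mps.is_available()
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output into a bug report or a question usually answers the first round of follow-up questions.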

Working Style

A reliable workflow is:

1. Start with a clean environment.
2. Install PyTorch and core dependencies.
3. Verify tensor operations.
4. Verify gradients.
5. Verify device execution.
6. Run a tiny training script.
7. Add project-specific libraries.
8. Freeze or record dependencies.

This avoids debugging several layers of failure at once.
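
The verification steps in this workflow can be collected into one script that runs them in order and stops at the first failure. The following is a minimal sketch under that assumption; the check functions and the `run_checks` runner are hypothetical names, and each check mirrors one of the small tests from earlier in this section.

```python
import torch


def check_tensor_ops() -> None:
    y = torch.ones(4, 3) @ torch.ones(3, 2)
    assert y.shape == (4, 2)
    assert torch.all(y == 3.0)  # each entry is a sum of three ones


def check_gradients() -> None:
    x = torch.tensor(2.0, requires_grad=True)
    (x * x + 3 * x + 1).backward()
    assert x.grad.item() == 7.0


def check_device_execution() -> None:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(256, 256, device=device)
    assert (x @ x).device.type == device


CHECKS = [check_tensor_ops, check_gradients, check_device_execution]


def run_checks() -> list[str]:
    """Run the checks in order and stop at the first failure."""
    passed = []
    for check in CHECKS:
        try:
            check()
        except Exception as error:
            print(f"FAILED {check.__name__}: {error!r}")
            break
        print(f"ok     {check.__name__}")
        passed.append(check.__name__)
    return passed


run_checks()
```

Stopping at the first failure keeps the output focused on the lowest layer that is broken, which is usually the one to fix first.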

Summary

Installing PyTorch means configuring Python, dependencies, and hardware support together. A CPU setup is enough for small examples. A GPU setup requires compatible PyTorch packages, drivers, and device placement.

A good project records its dependencies, isolates its environment, tests gradients, verifies device execution, and uses a consistent project layout. These habits reduce friction before the real work begins: building and training models.