Lutz Roeder · Dec 27, 2016 · Updated May 20, 2025

PyTorch on Azure

Training neural networks (deep learning) is compute-intensive. Fast GPUs can make those sessions, which sometimes take hours or days, go orders of magnitude faster. Most laptops lack fast GPUs, and maintaining a desktop solely for deep learning tasks adds overhead.

Cloud providers now offer virtual machines (VMs) with GPUs which run in data centers and can be used on an hourly basis. Below is a quick tutorial that walks through setting up a VM in Microsoft Azure with the necessary drivers to train neural networks using PyTorch.

First, if you haven't done so already, create an Azure account, install the Azure CLI, and log in by running az login.

Find a region that offers VM sizes with NVIDIA H100 GPUs:

az vm list-skus --size "Standard_NC40ads_H100_v5" --query '[].locationInfo[].location' -o tsv

Azure manages resources (virtual machines, storage, etc.) via resource groups. Create a resource group in the selected region:

az group create --name rg-cuda --location westus2

Create a key pair to connect to the machine via SSH:

ssh-keygen -f ~/.ssh/azure_cuda_id_rsa -t rsa -b 2048 -C '' -N ''

Next, create the virtual machine using an Ubuntu image:

az vm create --resource-group rg-cuda --name cuda-001 --image Canonical:ubuntu-24_04-lts:server:latest --size Standard_NC40ads_H100_v5 --admin-username cuda --ssh-key-value ~/.ssh/azure_cuda_id_rsa.pub

Once the command completes, it prints the public IP address of the newly created machine:

{
  "publicIpAddress": "127.0.0.1",
  "resourceGroup": "rg-cuda"
}

The VM is now running in a data center (and accruing charges). Use the following commands to deallocate and restart it at any time:

az vm deallocate --resource-group rg-cuda --name cuda-001
az vm start --resource-group rg-cuda --name cuda-001

Connect to the machine via SSH (type 'yes' if asked to continue):

ssh cuda@$(az vm show --show-details --resource-group rg-cuda --name cuda-001 --query "publicIps" --output tsv) -i ~/.ssh/azure_cuda_id_rsa

Install CUDA

Start by installing the packaged NVIDIA drivers for Ubuntu:

sudo apt update
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install

Next, install the CUDA Toolkit:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo apt install -y ./cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-5
rm cuda-keyring_1.1-1_all.deb

Reboot so the new drivers are loaded, then reconnect via SSH:

sudo reboot

Check the status of the GPU(s) by running:

nvidia-smi

Install PyTorch

The final step is to install Python and PyTorch:

sudo apt install -y python3-dev python3-pip python3-venv python-is-python3
python -m venv .venv
source .venv/bin/activate
pip install numpy torch

Launch a Python console and verify that PyTorch can see the GPU:

python
>>> import torch
>>> torch.cuda.is_available()
>>> print(torch.cuda.get_device_name(0))

If everything went well, torch.cuda.is_available() returns True and the name of the GPU is printed.
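As a quick end-to-end check, a short script along these lines runs a small tensor computation on the GPU, falling back to the CPU if CUDA is unavailable (the matrix size here is arbitrary):

```python
import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Multiply two random matrices on the selected device.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.shape)  # torch.Size([1024, 1024])
```

On the VM this should report "Using device: cuda"; the same script also runs unchanged on a machine without a GPU.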

Remember to deallocate the VM when done to avoid incurring charges:

az vm deallocate --resource-group rg-cuda --name cuda-001

Once no longer needed, you can delete the virtual machine by running:

az vm delete --resource-group rg-cuda --name cuda-001
az group delete --name rg-cuda