PyTorch on Azure
Training neural networks (deep learning) is compute-intensive. Fast GPUs can make those sessions, which sometimes take hours or days, go orders of magnitude faster. Most laptops lack fast GPUs, and maintaining a desktop solely for deep learning tasks adds overhead.
Cloud providers now offer virtual machines (VMs) with GPUs that run in their data centers and are billed by the hour. Below is a quick tutorial that walks through setting up a VM in Microsoft Azure with the drivers needed to train neural networks using PyTorch.
First, if you haven't done so already, create an Azure account, install the Azure CLI, and log in by running az login.
Find a region that offers VM sizes with NVIDIA H100 GPUs:
az vm list-skus --size "Standard_NC40ads_H100_v5" --query '[].locationInfo[].location' -o tsv
Azure manages resources (virtual machines, storage etc.) via resource groups. Create a resource group in the selected region:
az group create --name rg-cuda --location westus2
Create a key pair to connect to the machine via SSH:
ssh-keygen -f ~/.ssh/azure_cuda_id_rsa -t rsa -b 2048 -C '' -N ''
Next, create the virtual machine using an Ubuntu image:
az vm create --resource-group rg-cuda --name cuda-001 --image Canonical:ubuntu-24_04-lts:server:latest --size Standard_NC40ads_H100_v5 --admin-username cuda --ssh-key-value ~/.ssh/azure_cuda_id_rsa.pub
Once completed, the IP address for the newly created machine will be shown:
{
  "publicIpAddress": "127.0.0.1",
  "resourceGroup": "rg-cuda"
}
The VM is now running in a data center (and incurring charges). The following commands can be used to deallocate and restart it at any time:
az vm deallocate --resource-group rg-cuda --name cuda-001
az vm start --resource-group rg-cuda --name cuda-001
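To confirm whether the VM is currently running or deallocated, you can also query its power state. A sketch using the same resource group and VM name as above:

```shell
# Print the VM's power state, e.g. "VM running" or "VM deallocated".
az vm get-instance-view \
  --resource-group rg-cuda \
  --name cuda-001 \
  --query "instanceView.statuses[?starts_with(code, 'PowerState')].displayStatus" \
  --output tsv
```

Deallocating stops compute billing, while a stopped-but-not-deallocated VM still incurs charges, so checking for "VM deallocated" is a useful habit.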
Connect to the machine via SSH (type 'yes', if asked to continue):
ssh cuda@$(az vm show --show-details --resource-group rg-cuda --name cuda-001 --query "publicIps" --output tsv) -i ~/.ssh/azure_cuda_id_rsa
Install CUDA
Start by installing the packaged NVIDIA drivers for Ubuntu:
sudo apt update
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
Next, install the CUDA Toolkit:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo apt install -y ./cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-5
rm cuda-keyring_1.1-1_all.deb
Reboot so the new drivers are loaded, then reconnect via SSH:
sudo reboot
Check the status of the GPU(s) by running:
nvidia-smi
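For scripted checks, nvidia-smi also supports a query mode that prints selected fields as CSV. A small sketch (run on the VM, where the driver is installed):

```shell
# Print GPU name, total memory, and driver version as CSV.
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```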
Install PyTorch
The final step is to install Python and PyTorch:
sudo apt install -y python3-dev python3-pip python3-venv python-is-python3
python -m venv .venv
source .venv/bin/activate
pip install numpy torch
Launch a Python console and create a PyTorch session:
python
>>> import torch
>>> torch.cuda.is_available()
>>> print(torch.cuda.get_device_name(0))
If everything went well, torch.cuda.is_available() returns True and the GPU's name is printed.
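Beyond checking the device name, a quick sanity test is to run an actual computation on the GPU. The sketch below (run inside the .venv created above) multiplies two matrices on the GPU, falling back to the CPU if CUDA is unavailable:

```python
import torch

# Use the GPU when available; fall back to the CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two random 1024x1024 matrices, allocated directly on the chosen device.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

# The matrix multiply runs on the GPU when device is "cuda".
c = a @ b

print(c.shape, c.device)
```

If this reports a cuda device, the driver, CUDA toolkit, and PyTorch are all working together.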
Remember to deallocate the VM when done so it stops incurring charges:
az vm deallocate --resource-group rg-cuda --name cuda-001
Once no longer needed, you can delete the virtual machine by running:
az vm delete --resource-group rg-cuda --name cuda-001
az group delete --name rg-cuda