Windows Server 2016 will let you give a virtual machine full access to a physical GPU, using a hardware pass-through feature called Discrete Device Assignment, and because Azure runs on Windows Server, you’ll get it there as well.
“We leverage this technology primarily for GPUs and also for things like NVMe storage,” explained Microsoft’s Chris Huybregts at the Nvidia GTC conference. You can only use the GPU with one virtual machine, but that virtual machine gets access to all the features of the GPU, using the standard graphics driver (rather than the virtual GPU driver that Microsoft supplies for RemoteFX GPU virtualisation).
Discrete Device Assignment is how the new N-Series VMs on Azure get their GPUs. Designed for running applications that need high-performance graphics or that use the GPU for high-performance parallel computing, they were announced last September and are currently in preview.
When they launch there will be two ranges of N-Series VMs, both with a choice of 6, 12 or 24 Xeon E5 CPU cores and one, two or four GPUs. The NV series uses Nvidia Tesla M60 GPUs and is designed for running visualisation and rendering software, while the NC series uses Nvidia K80 GPUs and is for GPU computing.
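The line-up described above can be sketched as a simple lookup table. This is purely illustrative; the SKU names (NV6, NC24 and so on) follow the series-plus-core-count pattern Microsoft has used for the preview, and the pairing of core counts to GPU counts is taken from the article:

```python
# Illustrative sketch of the announced N-Series line-up.
# Core counts pair with GPU counts: 6 cores/1 GPU, 12/2, 24/4.
N_SERIES = {
    "NV": {"gpu": "Nvidia Tesla M60", "use": "visualisation and rendering"},
    "NC": {"gpu": "Nvidia Tesla K80", "use": "GPU computing"},
}
CORES_TO_GPUS = {6: 1, 12: 2, 24: 4}  # Xeon E5 cores -> GPUs

def describe(series: str, cores: int) -> str:
    """Return a one-line description of an N-Series configuration."""
    gpus = CORES_TO_GPUS[cores]
    info = N_SERIES[series]
    return f"{series}{cores}: {cores} cores, {gpus}x {info['gpu']}, for {info['use']}"

print(describe("NC", 24))
# -> NC24: 24 cores, 4x Nvidia Tesla K80, for GPU computing
```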
“That means the entire GPU will be available in the virtual machine; that includes CUDA, OpenGL, OpenCL and DirectX,” Huybregts said. You can run Windows Server or Windows 10 in the VMs, or Linux. “We understand that the world needs to know they can run on Linux – Linux is a first-class citizen for Azure,” he promised.
There will be virtual machine images in the Azure Marketplace that are set up with applications ready to use (similar to the Azure Data Science virtual machines that bundle up useful tools for data science modelling like R Server and Python, for both Windows Server and Linux), or you can upload your own image, including the OS and applications you need.
Huybregts wouldn’t give any details on how Microsoft will use Grid, Nvidia’s own graphics virtualisation technology – something Microsoft has mentioned previously for the N-Series VMs – but he did confirm there’s work going on. “If you could see where we’re going, you’d see we are working with Nvidia and the industry in general – but we’re not talking about that today.”
(Nvidia Grid is what AWS offers in its G2 GPU-compute VMs; these use the older Nvidia Grid K520 graphics cards, whose drivers have to be loaded specifically, making them a little harder to set up.)
He also wouldn’t talk about when the N-Series VMs will come out of preview on Azure, or how much they’re likely to cost. They’re unlikely to be available before Windows Server 2016 is released, which is expected around September 2016.
For comparison, running the Data Science VMs on Azure costs from $0.67 to $9.95 an hour if you use the Xeon E5-based G-Series virtual machines, depending on how many cores you need. Prices for the A-Series VMs with Xeon E5 and high-performance InfiniBand networking start at $1.46 an hour – and AWS G2 virtual machines cost between $0.76 and $2.87 an hour.
Expect N-Series prices to be closer to these ranges than the $0.02 an hour you’ll pay for the cheapest A-Series virtual machines on Azure.
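To put those hourly rates in perspective, here is a quick bit of arithmetic turning them into monthly figures for a VM left running continuously. The 730 hours-per-month figure is an assumption (an average month), not something from Microsoft or AWS:

```python
# Hourly rates quoted above; monthly figures assume ~730 hours of continuous use.
HOURS_PER_MONTH = 730

rates = {
    "Azure A-Series (cheapest)": 0.02,
    "Azure G-Series (low end)": 0.67,
    "AWS G2 (low end)": 0.76,
    "Azure A-Series with InfiniBand (entry)": 1.46,
    "AWS G2 (high end)": 2.87,
    "Azure G-Series (high end)": 9.95,
}

for name, hourly in rates.items():
    monthly = hourly * HOURS_PER_MONTH
    print(f"{name}: ${hourly:.2f}/hr -> ${monthly:,.2f}/month")
```

Even the entry-level GPU-class rates work out to hundreds of dollars a month of continuous use, which is why spinning VMs down when idle matters.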
What you can do with a GPU
Microsoft will be using the new GPU VMs itself, especially for workloads that rely on deep neural networks and machine learning. “The internal image recognition that’s done by Bing, Skype Translator; they will all be done on Azure in the future,” said Karan Batta of the Azure team. That will take advantage of the strength of CNTK, Microsoft’s open source deep learning toolkit and its equivalent of Google’s TensorFlow, which can scale across multiple GPU systems for better performance.
CNTK was used to build the CaptionBot service that provides a caption describing what’s happening in an image. At the moment that runs using standard virtual machines on Azure, but it will get better performance when it’s moved over to the new GPU VMs.
Running CNTK on a virtual machine with four GPUs, Batta was able to process 10,000 samples a second using a single GPU, 19,000 samples a second with two GPUs and 35,000 samples a second with four GPUs, showing that the toolkit can keep taking advantage of more hardware.
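Those throughput figures imply near-linear, though not perfect, scaling. A quick check of the speedup and parallel efficiency behind the numbers quoted above:

```python
# Throughput figures quoted above: samples/second by GPU count.
throughput = {1: 10_000, 2: 19_000, 4: 35_000}

baseline = throughput[1]
for gpus, samples in throughput.items():
    speedup = samples / baseline
    efficiency = speedup / gpus  # fraction of ideal linear scaling
    print(f"{gpus} GPU(s): {speedup:.2f}x speedup, {efficiency:.1%} efficiency")
```

Two GPUs deliver 95% of ideal linear scaling and four GPUs deliver 87.5%, which is what "keeps taking advantage of more hardware" looks like in practice: each added GPU still contributes most of its capacity.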
And the big data you want to use for machine learning is often stored in the cloud already. “We’re trying to close the loop on data. You want to work with data where it is and GPUs help with that,” Batta explained.
All manner of applications
But Batta also predicts that GPU computing will be useful for a lot of high-performance computing tasks in finance and manufacturing as well as in traditional graphics areas like media and rendering. “You could make a Netflix-like offering using public cloud. We’re able to do rendering with V-Ray, Arnold, all these ray tracers, in the cloud; you can spin up the number of GPUs you need to render your frames and then spin them back down again,” he said.
Batta added: “Taking the GPU and directly passing it through to the guest allows us to give you good performance, because we’re essentially giving all the GPU performance to the guest. That allows close to bare metal performance that these workloads can really take advantage of.”
With all that power, you will need to allow some time for N-Series VMs to start up – it will take five to ten minutes for them to spin up and load the operating system and your applications.