Google and Nvidia Take Cloud AI Performance to the Next Level – Data Center Knowledge

Google Cloud and Nvidia said Tuesday that Google would be the first cloud provider to offer Nvidia's latest GPU for machine learning as a service.

The Nvidia A100 Tensor Core GPU, based on the chipmaker's new Ampere architecture, represents the largest intergenerational leap in performance in Nvidia's history. The company said the part performed 20 times better than its previous-generation product.

Related: How Google Cloud Plans to Win Enterprises from AWS and Azure

Another way Ampere differs from its predecessors is that it's designed for both training and inference machine learning workloads. In prior generations, Nvidia designed a different GPU for each of the two types of workload.

And clients can now kick the tires on Ampere in Google Cloud, as part of a new type of cloud instance the provider also announced Tuesday: Accelerator-Optimized VM, or A2.

Related: MLPerf Is Changing the AI Hardware Performance Conversation. Here's How

The beefiest configuration of Google's A2 cloud instance comes with 16 Ampere GPUs, all interconnected by NVSwitch, Nvidia's technology for linking many GPUs into a single computing fabric. That's 640GB of GPU memory, 1.3TB of system memory, and 9.6TB/s of aggregate bandwidth.

Smaller A2 configurations are available as well.
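For readers who want to try the new instances once they open up, provisioning an A2 VM would look roughly like this with the gcloud CLI. This is a minimal sketch, not from the article: the machine type name (`a2-megagpu-16g` for the 16-GPU configuration), zone, and image family are assumptions and may differ depending on your project and region.

```shell
# Sketch: create the largest A2 instance (16 A100 GPUs).
# Machine type, zone, and image family are assumed, not confirmed by the article.
gcloud compute instances create my-a2-vm \
  --zone=us-central1-a \
  --machine-type=a2-megagpu-16g \
  --image-family=common-cu110 \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE
```

Smaller configurations would use the same command with a smaller machine type (for example, a type exposing a single A100 instead of sixteen).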

For now, A100 instances are only available in alpha. Google Cloud was also first to launch Nvidia's older T4 GPUs, in November 2018, also in alpha. T4 beta came about three months later, and general availability was announced after four more months.

Google Cloud may be first to offer Ampere GPUs as a service, but Nvidia had delivered the chips to all the major cloud providers as of mid-May, when it announced the chip publicly, the chipmaker's CEO, Jensen Huang, told Bloomberg at the time. The others (AWS, Azure, Alibaba, Oracle, IBM) will likely roll out their own Ampere cloud infrastructure soon.

Google Cloud was also first to roll out cloud instances powered by AMD's Epyc 2 chips, which beat comparable Intel parts on both performance and price. The instances, according to Google, would be the most powerful VMs available to Google Cloud users.

Epyc 2 is also the CPU in Nvidia's own Ampere-based supercomputer for machine learning, the DGX A100, which it announced along with the A100 GPU in May.

