AI Weekly: AI model training costs on the rise, highlighting need for new solutions – VentureBeat

Posted: October 15, 2021 at 9:05 pm

This week, Microsoft and Nvidia announced that they trained what they claim is one of the largest and most capable AI language models to date: Megatron-Turing Natural Language Generation (MT-NLP). MT-NLP contains 530 billion parameters (the parts of the model learned from historical data) and achieves leading accuracy in a broad set of tasks, including reading comprehension and natural language inference.

But building it didn't come cheap. Training took place across 560 Nvidia DGX A100 servers, each containing 8 Nvidia A100 80GB GPUs. Experts peg the cost in the millions of dollars.

Like other large AI systems, MT-NLP raises questions about the accessibility of cutting-edge research approaches in machine learning. AI training costs dropped 100-fold between 2017 and 2019, but the totals still exceed the compute budgets of most startups, governments, nonprofits, and colleges. The inequity favors corporations and world superpowers with extraordinary access to resources at the expense of smaller players, cementing incumbent advantages.

For example, in early October, researchers at Alibaba detailed M6-10T, a language model containing 10 trillion parameters (roughly 57 times the size of OpenAI's GPT-3) trained across 512 Nvidia V100 GPUs for 10 days. The cheapest V100 plan available through Google Cloud Platform costs $2.28 per hour, which would equate to roughly $280,000 ($2.28 per GPU per hour, multiplied by 512 GPUs and by 24 hours over 10 days), further than most research teams can stretch.
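The rental estimate above is a straight multiplication, and a short sketch makes each factor explicit. It assumes all 512 GPUs are billed at the quoted hourly rate for the full 10 days, with no discounts, storage, or networking charges; the function and variable names are illustrative and do not come from Alibaba or Google. With the GPU count included, the product comes to about $280,000.

```python
# Figures quoted in the article; names are illustrative only.
NUM_GPUS = 512             # Nvidia V100 GPUs used for M6-10T
PRICE_PER_GPU_HOUR = 2.28  # cheapest quoted V100 rate, in US dollars
TRAINING_DAYS = 10


def rental_cost(num_gpus, price_per_gpu_hour, days, hours_per_day=24):
    """Estimate the cloud rental cost of a training run in dollars.

    Assumes every GPU is billed for every hour of the run.
    """
    gpu_hours = num_gpus * days * hours_per_day
    return gpu_hours * price_per_gpu_hour


gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
cost = rental_cost(NUM_GPUS, PRICE_PER_GPU_HOUR, TRAINING_DAYS)

print(f"GPU-hours: {gpu_hours:,}")
print(f"Estimated cost: ${cost:,.2f}")
```

Changing any one input, such as the number of days or the hourly rate, scales the estimate proportionally.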

Google subsidiary DeepMind is estimated to have spent $35 million training a system to learn the Chinese board game Go. And when the company's researchers designed a model to play StarCraft II, they purposefully didn't try multiple ways of architecting a key component because the training cost would have been too high. Similarly, OpenAI didn't fix a mistake when it implemented GPT-3 because the cost of training made retraining the model infeasible.

It's important to keep in mind that training costs can be inflated by factors other than an algorithm's technical aspects. As Yoav Shoham, Stanford University professor emeritus and cofounder of AI startup AI21 Labs, recently told Synced, personal and organizational considerations often contribute to a model's final price tag.

"[A] researcher might be impatient to wait three weeks to do a thorough analysis and their organization may not be able or wish to pay for it," he said. "So for the same task, one could spend $100,000 or $1 million."

Still, the increasing cost of training and storing algorithms like Huawei's PanGu-Alpha, Naver's HyperCLOVA, and the Beijing Academy of Artificial Intelligence's Wu Dao 2.0 is giving rise to a cottage industry of startups aiming to optimize models without degrading accuracy. This week, former Intel exec Naveen Rao launched a new company, Mosaic ML, to offer tools, services, and training methods that improve AI system accuracy while lowering costs and saving time. Mosaic ML, which has raised $37 million in venture capital, competes with Codeplay Software, OctoML, Neural Magic, Deci, CoCoPie, and NeuReality in a market that's expected to grow exponentially in the coming years.

In a sliver of good news, the cost of basic machine learning operations has been falling over the past few years. A 2020 OpenAI survey found that since 2012, the amount of compute needed to train a model to the same performance on classifying images in a popular benchmark, ImageNet, has been decreasing by a factor of two every 16 months.
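To see what a halving every 16 months implies over longer spans, the trend can be written as a reduction factor of 2 raised to (months elapsed / 16). The sketch below simply evaluates that expression; it assumes the trend holds smoothly over the whole period, which is a simplification, and the function name is illustrative.

```python
HALVING_PERIOD_MONTHS = 16  # halving interval quoted in the article


def compute_reduction_factor(months_elapsed, halving_period=HALVING_PERIOD_MONTHS):
    """Return how many times less compute is needed after `months_elapsed`.

    Assumes compute needs halve once per `halving_period` months.
    """
    return 2 ** (months_elapsed / halving_period)


for years in (1, 4, 8):
    factor = compute_reduction_factor(years * 12)
    print(f"After {years} year(s): about {factor:.1f}x less compute")
```

Under this assumption, four years (48 months) is exactly three halvings, or an eightfold reduction.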

Approaches like network pruning prior to training could lead to further gains. Research has shown that parameters pruned after training, a process that decreases the model size, could have been pruned before training without any effect on the network's ability to learn. Called the lottery ticket hypothesis, the idea is that the initial values parameters in a model receive are crucial for determining whether they're important. Parameters kept after pruning receive lucky initial values; the network can train successfully with only those parameters present.
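One commonly described way to look for such "lucky" parameters is to train a network, prune the weights with the smallest magnitudes, and reset the survivors to their original initial values. The sketch below illustrates only the masking and reset steps on a flat list of weights. It is an assumption-laden toy: the "trained" weights are a random perturbation standing in for real training, and the helper names are hypothetical rather than taken from any library.

```python
import random


def magnitude_mask(trained_weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude weights."""
    num_to_prune = int(len(trained_weights) * prune_fraction)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(trained_weights)),
                   key=lambda i: abs(trained_weights[i]))
    pruned = set(order[:num_to_prune])
    return [0 if i in pruned else 1 for i in range(len(trained_weights))]


def winning_ticket(initial_weights, mask):
    """Keep surviving weights at their initial values; zero out the rest."""
    return [w * m for w, m in zip(initial_weights, mask)]


random.seed(0)
initial = [random.uniform(-1, 1) for _ in range(10)]
# Stand-in for training: hypothetical final weights, not a real optimizer.
trained = [w + random.gauss(0, 0.5) for w in initial]

mask = magnitude_mask(trained, prune_fraction=0.8)
ticket = winning_ticket(initial, mask)

print("mask:  ", mask)
print("ticket:", [round(w, 3) for w in ticket])
```

In a real experiment, the masked network would then be retrained from `ticket` and compared against the full model.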

Network pruning is far from a solved science, however. New ways of pruning that work before or in early training will have to be developed, as most current methods apply only retroactively. And when parameters are pruned, the resulting structures aren't always a fit for the training hardware (e.g., GPUs), meaning that pruning 90% of parameters won't necessarily reduce the cost of training a model by 90%.

Whether through pruning, novel AI accelerator hardware, or techniques like meta-learning and neural architecture search, the need for alternatives to unattainably large models is quickly becoming clear. A University of Massachusetts Amherst study showed that using 2019-era approaches, training an image recognition model with a 5% error rate would cost $100 billion and produce as many carbon emissions as New York City does in a month. As IEEE Spectrum's editorial team wrote in a recent piece, "we must either adapt how we do deep learning or face a future of much slower progress."

For AI coverage, send news tips to Kyle Wiggers, and be sure to subscribe to the AI Weekly newsletter and bookmark our AI channel, The Machine.

Thanks for reading,

Kyle Wiggers

AI Staff Writer
