If somebody told you that a refurbished laptop could eclipse the performance of an NVIDIA A100 GPU when training a 200 million-parameter neural network, you'd want to know the secret. Running AI routines on CPUs is supposed to be slow, which is why GPUs are in high demand and NVIDIA shareholders are celebrating. But maybe it's not that simple.
Part of the issue is that the development and availability of GPUs, which can massively parallelize matrix multiplications, has made it possible to brute-force progress in AI. Bigger is better when it comes to both the amount of data used to train neural networks and the size of the models, reflected in the number of parameters.
Considering state-of-the-art large language models (LLMs) such as OpenAI's GPT-4, the number of parameters is now measured in the billions. And training what is, in effect, a vast, multi-layered equation by first specifying model weights at random and then refining those parameters through backpropagation and gradient descent is now firmly GPU territory.
Nobody runs high-performance AI routines on CPUs, or at least that's the majority view. The growth in model size, driven by the gains in accuracy, has led users to overwhelmingly favor much faster GPUs to churn through the billions of calculations involved, forward and backward.
But the scale of the latest generative AI models is putting this brute-force GPU approach to the test. And many developers no longer have the time, money, or computing resources to compete in fine-tuning the billions of artificial neurons that make up these many-layered networks.
Experts in the field are asking if there's another, more efficient way of training neural networks to perform tasks such as image recognition, product recommendation, and natural language processing (NLP) search.
Artificial neural networks are often compared to the workings of the human brain. But the comparison is a loose one: the human brain runs on roughly the power of a dim light bulb, whereas state-of-the-art AI models consume vast amounts of power, have worryingly large carbon footprints, and require large amounts of cooling.
That being said, the human brain does consume a considerable amount of energy compared with other organs in the body. But its orders-of-magnitude efficiency advantage over GPUs stems from the fact that the brain's chemistry recruits only the neurons it needs rather than performing calculations in bulk.
AI developers are trying to mimic those brain-like efficiencies in computing hardware by engineering architectures known as spiking neural networks, in which neurons behave more like accumulators and fire only when repeatedly prompted. But it's a work in progress.
However, it's long been known that training AI algorithms could be made much more efficient. Matrix multiplications assume dense computation, but researchers showed a decade ago that picking just the top ten percent of neuron activations still produces high-quality results.
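As a rough illustration of that finding, here is a minimal NumPy sketch of keeping only the top ten percent of activations; the layer size, random weights, and cutoff are assumptions made for the demo rather than figures from any particular study.

```python
# A minimal sketch of the top-ten-percent observation; the layer size
# and random weights are illustrative assumptions, not from a paper.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4096, 256))        # one hidden layer's weights
x = rng.normal(size=256)                # a random input

z = W @ x                               # pre-activations for all neurons
dense = np.maximum(z, 0.0)              # brute force: ReLU on everything

k = int(0.10 * z.size)                  # keep only the top ten percent
top = np.argpartition(z, -k)[-k:]       # indices of the k largest values
sparse = np.zeros_like(dense)
sparse[top] = dense[top]

# Even with random weights, the biggest activations carry most of the
# signal; trained networks have far more skewed activations, so the
# sparse output tracks the dense one more closely still.
cos = sparse @ dense / (np.linalg.norm(sparse) * np.linalg.norm(dense))
print(f"cosine similarity, dense vs top-10% outputs: {cos:.3f}")
```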
The issue is that, to identify the top ten percent, you would still have to run all of those sums in bulk, which would remain wasteful. But what if you could simply look up a list of the most active neurons for a given input?
And it's the answer to this question that opens up the path to running AI on CPUs, a potentially game-changing shift, as the opening observation about a refurbished laptop eclipsing an NVIDIA A100 GPU hints.
So what is this magic? At the heart of the approach is the use of hash tables, which famously run in constant time (or thereabouts). In other words, the time taken to look up an entry in a hash table is independent of the number of entries stored. And Google puts this principle to work in its web search.
For example, if you type "Best restaurants in London" into Google Chrome, that query, thanks to hashing (which turns the input into a unique fingerprint), provides the index to a list of topical websites that Google has filed away at that location. And it's why, despite having billions of websites stored in its vast index, Google can deliver search results to users in a matter of milliseconds.
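Google's production systems are vastly more sophisticated, but the underlying lookup principle fits in a few lines of Python; the queries and site labels below are invented for illustration.

```python
# A toy hash-table lookup using Python's built-in dict; the entries
# below are made up.
index = {
    "best restaurants in london": ["site-17", "site-204", "site-9"],
    "cheap flights to paris": ["site-88", "site-3"],
}

# Hashing the query string points straight at the right bucket, so the
# cost of the lookup does not grow with the number of entries filed away.
print(index["best restaurants in london"])  # ['site-17', 'site-204', 'site-9']
```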
And, just as your search query in effect provides a lookup address for Google, a similar approach can be used to identify which artificial neurons are most strongly associated with a piece of training data, such as a picture of a cat.
In neural networks, hash tables can be used to tell the algorithm which activations need to be calculated, dramatically reducing the computational burden to a fraction of that incurred by brute-force methods, which makes it possible to run AI on CPUs.
In fact, the hash functions that turn out to be most useful are dubbed locality-sensitive hash (LSH) functions. Regular hash functions are great for fast memory addressing and duplicate detection, whereas locality-sensitive hash functions provide near-duplicate detection.
LSH functions can be used to hash data points that are near to each other (in other words, similar) into the same buckets with high probability. And this, in terms of deep learning, dramatically improves the sampling performance during model training.
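A classic way to construct such a function is with random hyperplanes, a scheme known as SimHash. In the minimal sketch below, with arbitrary dimensions and bit counts, a near-duplicate vector usually lands in the same bucket as the original, while an unrelated vector usually does not.

```python
# A minimal locality-sensitive hash via random hyperplanes (SimHash);
# the vector dimension and number of bits are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
dim, n_bits = 64, 8

planes = rng.normal(size=(n_bits, dim))  # one random hyperplane per bit

def lsh_bucket(v: np.ndarray) -> int:
    # The sign pattern of the projections is the bucket identifier.
    bits = (planes @ v) > 0
    return int(np.packbits(bits)[0])     # 8 bits fit in a single byte

a = rng.normal(size=dim)
b = a + 0.05 * rng.normal(size=dim)      # a near-duplicate of a
c = rng.normal(size=dim)                 # an unrelated vector

# Similar vectors usually share a bucket; dissimilar ones rarely do.
print(lsh_bucket(a), lsh_bucket(b), lsh_bucket(c))
```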
Hash functions can also be used to improve the user experience once models have been trained. US-based computer scientists at Rice University in Texas and Stanford University in California, together with researchers from Pocket LLM pioneer ThirdAI, have proposed a method dubbed HALOS (Hashing Large Output Space for Cheap Inference), which speeds up the process without compromising model performance.
As the team explains, HALOS reduces inference to sub-linear computation by selectively activating only a small set of likely-to-be-relevant output-layer neurons. "Given a query vector, the computation can be focused on a tiny subset of the large database," write the authors in their conference paper. "Our extensive evaluations show that HALOS matches or even outperforms the accuracy of given models with 21x speed up and 87% energy reduction."
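The general mechanism can be sketched in miniature: hash each output neuron's weight vector into buckets ahead of time, then, at query time, score only the neurons filed in the buckets that the query hashes to. The toy below illustrates that idea under assumed sizes and seeds; it is not the HALOS authors' implementation.

```python
# An illustrative sketch of LSH-guided inference over a large output
# layer (the general idea, not the HALOS authors' code); all sizes,
# seeds, and names are assumptions made for the demo.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
dim, n_out = 64, 100_000                 # input size, output neurons
n_tables, n_bits = 4, 12                 # several tables improve recall

planes = rng.normal(size=(n_tables, n_bits, dim))
W = rng.normal(size=(n_out, dim))        # output-layer weight rows

def bucket_keys(v):
    # One bucket key per table: the sign pattern of random projections.
    return [((planes[t] @ v) > 0).tobytes() for t in range(n_tables)]

# Pre-processing, done once: file each output neuron under the buckets
# that its weight vector hashes to.
tables = [defaultdict(list) for _ in range(n_tables)]
for t in range(n_tables):
    signs = (W @ planes[t].T) > 0        # (n_out, n_bits) sign patterns
    for i, row in enumerate(signs):
        tables[t][row.tobytes()].append(i)

def sparse_scores(x):
    # Score only the neurons filed in the query's buckets.
    cand = set()
    for t, key in enumerate(bucket_keys(x)):
        cand.update(tables[t].get(key, []))
    cand = np.fromiter(cand, dtype=int)
    return cand, W[cand] @ x

x = W[12_345] + 0.05 * rng.normal(size=dim)  # a query near neuron 12345
cand, scores = sparse_scores(x)
print(f"scored {cand.size} of {n_out} output neurons")
if cand.size:
    print("top candidate:", cand[int(np.argmax(scores))])
```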
Commercially, this approach is helping merchants such as Wayfair, an online retailer that enables customers to find millions of products for their homes. Over the years, the firm has worked hard to improve its recommendation engine, noting a study by Amazon which found that even a 100-millisecond delay in serving results can put a noticeable dent in sales.
And, sticking briefly with online shopping habits, more recent findings published by Akamai report that over half of mobile website visitors will leave a page that takes more than three seconds to load: food for thought, as half of consumers are said to browse for products and services on their smartphones.
All of this raises the stakes for claims that clever use of hash functions can enable AI to run on CPUs. But the approach more than lived up to expectations, as Wayfair has confirmed in a blog post. "We were able to train our version three classifier model on commodity CPUs, while at the same time achieve a markedly lower latency rate," commented Weiyi Sun, Associate Director of Machine Learning at the company.
Plus, as the computer scientists described in their study, the use of hash-based processing algorithms accelerated inference too.