What sort of silicon brain do you need for artificial intelligence?

The Raspberry Pi is one of the most exciting developments in hobbyist computing today. Across the world, people are using it to automate beer making, open up the world of robotics and revolutionise STEM education in a world overrun by film students. These are all laudable pursuits. Meanwhile, what is Microsoft doing with it? Creating squirrel-hunting water robots.

Over at the firm's Machine Learning and Optimization group, a researcher saw squirrels stealing flower bulbs and seeds from his bird feeder. The research team trained a computer vision model to detect squirrels, and then put it onto a Raspberry Pi 3 board. Whenever an adventurous rodent happened by, it would turn on the sprinkler system.
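Microsoft built its detector with its own embedded tooling, but the shape of such an edge-inference loop is easy to sketch. The snippet below is an illustration only: it assumes a hypothetical pre-trained, quantised TensorFlow Lite model called squirrel.tflite, a stubbed camera helper, and a sprinkler relay on GPIO pin 18, none of which reflect Microsoft's actual setup.

```python
# Sketch only: assumes a hypothetical quantised model "squirrel.tflite" and a
# sprinkler relay wired to GPIO pin 18. Microsoft's actual toolchain differs.
import time
import numpy as np
import RPi.GPIO as GPIO                              # Raspberry Pi GPIO access
from tflite_runtime.interpreter import Interpreter   # lightweight on-device inference

SPRINKLER_PIN = 18

def capture_frame() -> np.ndarray:
    """Grab one camera frame as an HxWx3 uint8 array (hypothetical helper, not implemented)."""
    raise NotImplementedError("wire this to the Pi camera")

def main():
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(SPRINKLER_PIN, GPIO.OUT)

    interpreter = Interpreter(model_path="squirrel.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    while True:
        frame = capture_frame()
        interpreter.set_tensor(inp["index"], frame[np.newaxis, ...])  # add batch dimension
        interpreter.invoke()                                          # one forward pass on the CPU
        squirrel_score = float(interpreter.get_tensor(out["index"])[0][0])

        if squirrel_score > 0.9:                     # rodent detected: fire the sprinkler
            GPIO.output(SPRINKLER_PIN, GPIO.HIGH)
            time.sleep(5)
            GPIO.output(SPRINKLER_PIN, GPIO.LOW)
        time.sleep(0.5)

if __name__ == "__main__":
    main()
```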

Microsoft's sciurine aversions aren't the point of that story; its shoehorning of a convolutional neural network onto an ARM CPU is. It shows how organizations are pushing hardware further to support AI algorithms. As AI continues to make the headlines, researchers are pushing its capabilities to make it increasingly competent at basic tasks such as vision and speech recognition.

As people expect more of the technology, cramming it into self-flying drones and self-driving cars, the hardware challenges are increasing. Companies are producing custom silicon and computing nodes capable of handling them.

Jeff Orr, research director at analyst firm ABI Research, divides advances in AI hardware into three broad areas: cloud services, on-device, and hybrid. The first focuses on AI processing done online in hyperscale data centre environments like Microsoft's, Amazon's and Google's.

At the other end of the spectrum, he sees more processing happening on devices in the field, where connectivity or latency prohibit sending data back to the cloud.

"It's using maybe a voice input to allow for hands-free operation of a smartphone or a wearable product like smart glasses," he says. "That will continue to grow. There's just not a large number of real-world examples on-device today." He views augmented reality as a key driver here. Or there's always this app, we suppose.

Finally, hybrid efforts marry both platforms to complete AI computations. This is where your phone recognizes what you're asking it but asks cloud-based AI to answer it, for example.
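In code, the hybrid split amounts to little more than a cheap local model deciding whether to bother the cloud at all. A toy sketch of the pattern, with a placeholder endpoint and stand-in local check rather than any vendor's real API:

```python
# Toy hybrid split: a small on-device check gates the round trip to a
# (hypothetical) cloud service; the heavy understanding happens remotely.
import requests  # widely used third-party HTTP client

CLOUD_URL = "https://example.com/assistant/answer"  # placeholder, not a real endpoint

def heard_wake_word(audio_clip: bytes) -> bool:
    """Cheap on-device check; a real device would run a tiny keyword-spotting net here."""
    return len(audio_clip) > 0  # stand-in logic so the sketch runs

def handle_utterance(audio_clip: bytes):
    if not heard_wake_word(audio_clip):
        return None                                  # nothing leaves the device
    resp = requests.post(CLOUD_URL, data=audio_clip, timeout=5.0)
    return resp.json().get("answer")                 # cloud does the heavy lifting
```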

The cloud's importance stems from the way that AI learns. AI models are increasingly moving to deep learning, which uses complex neural networks with many layers to create more accurate AI routines.

There are two aspects to using neural networks. The first is training, where the network analyses lots of data to produce a statistical model. This is effectively the learning phase. The second is inference, where the neural network then interprets new data to generate accurate results. Training these networks chews up vast amounts of computing power, but the training load can be split into many tasks that run concurrently. This is why GPUs, with their double-precision floating point maths and huge core counts, are so good at it.
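The two phases are easy to tell apart in code. Here is a minimal sketch, using PyTorch purely as an illustration (no vendor in this piece is committed to it): training loops over labelled data and updates weights, while inference is a single forward pass with gradients switched off.

```python
# Minimal sketch of the two phases, using PyTorch purely as an illustration.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: iterate over lots of labelled data, adjusting the weights each pass.
def train_step(inputs, labels):
    optimiser.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()          # gradient computation: the expensive, highly parallel part GPUs excel at
    optimiser.step()
    return loss.item()

# Inference: a single forward pass over new data, no gradients needed.
@torch.no_grad()
def infer(inputs):
    return model(inputs).argmax(dim=1)
```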

Nevertheless, neural networks are getting bigger and the challenges are getting greater. Ian Buck, vice president of the Accelerated Computing Group at dominant GPU vendor Nvidia, says that they're doubling in size each year. The company is creating more computationally intense GPU architectures to cope, but it is also changing the way it handles its maths.

"It can be done with some reduced precision," he says. Originally, neural network training all happened in 32-bit floating point, but Nvidia has optimized its newer Volta architecture, announced in May, for 16-bit inputs with 32-bit internal mathematics.

Reducing the precision of the calculation to 16 bits has two benefits, according to Buck.

"One is that you can take advantage of faster compute, because processors tend to have more throughput at lower resolution," he says. Cutting the precision also increases the amount of available bandwidth, because you're fetching smaller amounts of data for each computation.

"The question is, how low can you go?" asks Buck. "If you go too low, it won't train. You'll never achieve the accuracy you need for production, or it will become unstable."
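A numpy toy makes the trade-off concrete: keep the stored values in 16-bit, so each one costs half the bandwidth to fetch, but accumulate in 32-bit so small contributions aren't rounded away. This illustrates the general idea only, not Volta's actual tensor-core arithmetic.

```python
# Toy illustration: half-precision storage halves the bytes moved per value,
# while a 32-bit accumulator keeps small terms from being rounded away.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50_000).astype(np.float16)   # 2 bytes per value instead of 4

# 16-bit storage with 32-bit accumulation: the pattern described for Volta
acc32 = np.float32(0.0)
for v in x:
    acc32 += np.float32(v)

# Accumulating in 16-bit: once the running sum grows, small terms get rounded away
acc16 = np.float16(0.0)
for v in x:
    acc16 = np.float16(acc16 + v)

reference = x.astype(np.float64).sum()               # exact sum of the stored values
print("fp32 accumulate error:", abs(float(acc32) - reference))
print("fp16 accumulate error:", abs(float(acc16) - reference))
```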

While Nvidia refines its architecture, some cloud vendors have been creating their own chips using alternative architectures to GPUs. The first generation of Google's Tensor Processing Unit (TPU) originally focused on 8-bit integers for inference workloads. The newer generation, announced in May, offers floating point precision and can be used for training, too. These chips are application-specific integrated circuits (ASICs). Unlike CPUs and GPUs, they are designed for a specific purpose (you'll often see them used for mining bitcoins these days) and cannot be reprogrammed. Their lack of extraneous logic makes them extremely high in performance and economical in their power usage, but very expensive.
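The 8-bit approach is easy to sketch: scale the trained 32-bit values into small integers, do the multiply-accumulates in integer arithmetic, and scale the result back at the end. The toy below shows the general idea of post-training quantisation; it is not Google's actual scheme.

```python
# Toy symmetric post-training quantisation: float32 weights and activations are
# mapped to int8, the multiply-accumulate runs in integer arithmetic with 32-bit
# accumulation, and the result is scaled back. Not Google's actual TPU scheme.
import numpy as np

def quantize(x):
    scale = np.abs(x).max() / 127.0          # one scale factor per tensor
    return np.round(x / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64)).astype(np.float32)
activations = rng.standard_normal((64,)).astype(np.float32)

qw, sw = quantize(weights)
qa, sa = quantize(activations)

# Integer MAC with 32-bit accumulation, then dequantise back to float
int_result = qw.astype(np.int32) @ qa.astype(np.int32)
approx = int_result.astype(np.float32) * (sw * sa)

exact = weights @ activations
print("max error vs float32:", np.abs(approx - exact).max())
```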

Google's scale is large enough that it can swallow the high non-recurring expenditures (NREs) associated with designing the ASIC in the first place, because of the cost savings it achieves in AI-based data centre operations. It uses them across many operations, ranging from recognizing Street View text to performing RankBrain search queries, and every time a TPU does something instead of a GPU, Google saves power.

"It's going to save them a lot of money," said Karl Freund, senior analyst for high performance computing and deep learning at Moor Insights and Strategy.

He doesn't think that's entirely why Google did it, though. "I think they did it so they would have complete control of the hardware and software stack." If Google is betting the farm on AI, then it makes sense to control it from endpoint applications such as self-driving cars through to software frameworks and the cloud.

When it isn't drowning squirrels, Microsoft is rolling out field programmable gate arrays (FPGAs) in its own data centre revamp. These are similar to ASICs but reprogrammable, so that their algorithms can be updated. They handle networking tasks within Azure, but Microsoft has also unleashed them on AI workloads such as machine translation. Intel wants a part of the AI industry, wherever it happens to be running, and that includes the cloud. To date, its Xeon Phi high-performance CPUs have tackled general purpose machine learning, and the latest version, codenamed Knights Mill, ships this year.

The company also has a trio of accelerators for more specific AI tasks, though. For training deep learning neural networks, Intel is pinning its hopes on Lake Crest, which comes from its Nervana acquisition. This is a coprocessor that the firm says overcomes data transfer performance ceilings using a type of memory called HBM2, which is around 12 times faster than DDR4.

While these big players jockey for position with systems built around GPUs, FPGAs and ASICs, others are attempting to rewrite AI architectures from the ground up.

Knuedge is reportedly prepping 256-core chips designed for cloud-based operations but isn't saying much.

UK-based Graphcore, due to release its technology in 2017, has said a little more. It wants its Intelligence Processing Unit (IPU) to use graph-based processing rather than the vectors used by GPUs or the scalar processing in CPUs. The company hopes that this will enable it to fit the training and inference workloads onto a single processor. One interesting thing about its technology is that its graph-based processing is supposed to mitigate one of the biggest problems in AI processing: getting data from memory to the processing unit. Dell has been the firm's perennial backer.

Wave Computing is also focusing on a different kind of processing, using what it calls its data flow architecture. It has a training appliance designed for operation in the data centre that it says can hit 2.9 PetaOPs/sec.

Whereas cloud-based systems can handle both neural network training and inference, client-side devices from phones to drones focus mainly on the latter. Their main considerations are energy efficiency and low-latency computation.

"You can't rely on the cloud for your car to drive itself," says Nvidia's Buck. A vehicle can't wait for a crummy connection when making a split-second decision on who to avoid, and long tunnels might also be a problem. "So all of the computing has to happen in the vehicle." He touts the Nvidia P4 self-driving car platform for autonomous in-car smarts.

FPGAs are also making great strides on the device side. Intel has Arria, an FPGA coprocessor designed for low-energy inference tasks, while over at startup KRTKL, CEO Ryan Cousens and his team have bolted a low-energy dual-core ARM CPU to an FPGA that handles neural networking tasks. It is crowdsourcing its platform, called Snickerdoodle, for makers and researchers that want wireless I/O and computer vision capabilities. "You could run that on the ARM core and only send to the FPGA high-intensity mathematical operations," he says.

AI is squeezing into even smaller devices like the phone in your pocket. Some processor vendors are making general purpose improvements to their architectures that also serve AI well. For example, ARM is shipping CPUs with increasingly capable GPU areas on the die that should be able to better handle machine learning tasks.

Qualcomm's Snapdragon processors now feature a neural processing engine that decides which bit of tailored logic machine learning and neural inference tasks should run on (voice detection in a digital signal processor and image detection on a built-in GPU, say). It supports the convolutional neural networks used in image recognition, too. Apple is reportedly planning its own neural processor, continuing its tradition of offloading phone processes onto dedicated silicon.
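Conceptually, such an engine is a dispatcher: look at the workload and route it to whichever block on the chip handles it most efficiently. The toy sketch below illustrates only that routing idea; the backend functions are placeholders, not Qualcomm's API.

```python
# Toy dispatcher in the spirit of a heterogeneous SoC: route each workload type
# to the block best suited to it. Backends are placeholders, not a real API.
from typing import Callable, Dict

def run_on_dsp(task: str) -> str:
    return f"DSP (low-power, always-on) handled: {task}"   # e.g. voice detection

def run_on_gpu(task: str) -> str:
    return f"GPU (parallel maths) handled: {task}"         # e.g. CNN image recognition

def run_on_cpu(task: str) -> str:
    return f"CPU (general purpose) handled: {task}"        # fallback for everything else

ROUTES: Dict[str, Callable[[str], str]] = {
    "voice_detection": run_on_dsp,
    "image_detection": run_on_gpu,
}

def dispatch(task_type: str, task: str) -> str:
    """Pick the most efficient block for this kind of work, defaulting to the CPU."""
    return ROUTES.get(task_type, run_on_cpu)(task)

print(dispatch("voice_detection", "wake word check"))
print(dispatch("image_detection", "photo tagging"))
print(dispatch("spreadsheet", "recalculate"))
```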

This all makes sense to ABI's Orr, who says that while most of the activity has been in cloud-based AI processors of late, this will shift over the next few years as device capabilities balance them out. In addition to areas like AR, this may show up in more intelligent-seeming artificial assistants. Orr believes that they could do better at understanding what we mean.

"They can't take action based on a really large dictionary of what possibly can be said," he says. "Natural language processing can become more personalised and train the system rather than training the user."

This can only happen using silicon that allows more processing at given times to infer context and intent, "by being able to unload and switch through these different dictionaries that allow for tuning and personalization for all the things that a specific individual might say."

Research will continue in this space as teams focus on driving new efficiencies into inference architectures. Vivienne Sze, professor at MIT's Energy-Efficient Multimedia Systems Group, says that in deep neural network inferencing, it isn't the computing that slurps most of the power. "The dominant source of energy consumption is the act of moving the input data from the memory to the MAC [multiply and accumulate] hardware and then moving the data from the MAC hardware back to memory," she says.
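Some back-of-the-envelope arithmetic shows why. The per-operation energies below are illustrative orders of magnitude of the sort quoted in the computer architecture literature (assumed for this example, not taken from Sze's work), applied to a single fully connected layer with no on-chip reuse at all.

```python
# Back-of-the-envelope energy estimate for one fully connected layer.
# Per-operation energies are illustrative orders of magnitude (assumptions),
# not measurements from the Eyeriss project.
MAC_ENERGY_PJ = 4.0        # one 32-bit multiply-accumulate
DRAM_ACCESS_PJ = 640.0     # fetching one 32-bit word from off-chip DRAM

inputs, outputs = 4096, 4096
macs = inputs * outputs                              # one MAC per weight
words_moved = inputs + outputs + inputs * outputs    # activations in/out plus every weight

compute_energy = macs * MAC_ENERGY_PJ
memory_energy = words_moved * DRAM_ACCESS_PJ         # worst case: no on-chip reuse at all

print(f"compute: {compute_energy / 1e6:.1f} uJ, memory: {memory_energy / 1e6:.1f} uJ")
print(f"memory/compute ratio: {memory_energy / compute_energy:.0f}x")
```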

Prof Sze works on a project called Eyeriss that hopes to solve that problem. "In Eyeriss, we developed an optimized data flow (called row stationary), which reduces the amount of data movement, particularly from large memories," she continues.
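The full row-stationary dataflow is tied to Eyeriss's array of processing elements, but the underlying reuse idea, fetch a value once and apply it to every partial result that needs it, can be shown with a toy 1D convolution. This is an illustration of data reuse in general, not Eyeriss's actual dataflow.

```python
# Toy illustration of the data-reuse idea behind dataflows like row stationary:
# each input fetched once is reused across every output that needs it, instead
# of being re-fetched per multiply. Not Eyeriss's actual dataflow.
import numpy as np

def conv1d_naive(x, w):
    """Re-fetches inputs for every output: len(w) reads of x per output element."""
    out = np.zeros(len(x) - len(w) + 1)
    for i in range(len(out)):
        for k in range(len(w)):
            out[i] += x[i + k] * w[k]         # x[i+k] fetched fresh each time
    return out

def conv1d_reuse(x, w):
    """Streams each input once, scattering it into every partial sum it touches."""
    out = np.zeros(len(x) - len(w) + 1)
    for i, xi in enumerate(x):                # each x value read exactly once
        for k in range(len(w)):
            j = i - k
            if 0 <= j < len(out):
                out[j] += xi * w[k]           # reuse xi across len(w) partial sums
    return out

x = np.arange(16, dtype=float)
w = np.array([1.0, 0.5, 0.25])
assert np.allclose(conv1d_naive(x, w), conv1d_reuse(x, w))
```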

There are many more research projects and startups developing processor architectures for AI. While we don't deny that marketing types like to sprinkle a little AI dust where it isn't always warranted, there's clearly enough of a belief in the technology that people are piling dollars into silicon.

As cloud-based hardware continues to evolve, expect hardware that supports AI locally in drones, phones and automobiles to develop alongside it.

In the meantime, Microsoft's researchers are apparently hoping to squeeze their squirrel-hunting code still further, this time onto the 0.007mm² Cortex M0 chip. That will call for a machine learning model 1/10,000th the size of the one it put on the Pi. They must be nuts.

We'll be covering machine learning, AI and analytics and specialist hardware at MCubed London in October. Full details, including early bird tickets, right here.
