Simple ways to find out what AI can do – Fast Company

Aside from drawing photo-realistic images and holding seemingly sentient conversations, AI has failed to deliver on many of its promises. The resulting rise in AI skepticism leaves us with a choice: We can grow cynical and watch from the sidelines as the winners emerge, or we can find a way to filter out the noise, identify commercial breakthroughs early, and participate in a historic economic opportunity.

There's a simple framework for differentiating near-term reality from science fiction. It relies on the single most important measure of maturity in any technology: its ability to manage unforeseen events, commonly known as edge cases. As a technology hardens, it becomes more adept at handling increasingly infrequent edge cases and, as a result, gradually unlocks new applications.

Edge-case reliability is measured differently for different technologies. For a cloud service, uptime is one way to assess reliability. For AI, a better measure is accuracy. When an AI fails to handle an edge case, it produces either a false positive or a false negative. Precision measures how well a model avoids false positives, and recall measures how well it avoids false negatives.

Here's an important insight: Today's AI can achieve very high performance if it focuses on either precision or recall. In other words, it optimizes one at the expense of the other (fewer false positives in exchange for more false negatives, and vice versa). But when it comes to achieving high performance on both simultaneously, AI models struggle. Solving this remains the holy grail of AI.
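To make that tradeoff concrete, here is a minimal sketch, not from the article and using made-up scores and labels, that shows how moving a classifier's decision threshold exchanges false positives for false negatives:

# Minimal sketch: precision vs. recall at two decision thresholds.
# The scores and labels below are illustrative, not real model output.
def precision_recall(scores, labels, threshold):
    predictions = [score >= threshold for score in scores]
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum((not p) and l for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, False, True, True, False, False, False]

# A strict threshold avoids false positives (high precision, lower recall);
# a lenient threshold avoids false negatives (high recall, lower precision).
print(precision_recall(scores, labels, threshold=0.85))  # (1.0, 0.5)
print(precision_recall(scores, labels, threshold=0.35))  # (~0.67, 1.0)

The same model lands at very different points on the precision-recall curve depending on where the threshold is set, which is why a single model rarely excels at both.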

Based on the above, we can categorize AI into two classes: low-fidelity and high-fidelity. An AI with either high precision or high recall, but not both, is lo-fi. One with both high precision and high recall is hi-fi. Today, the AI models used in image recognition, content personalization, and spam filtering are lo-fi. The models required by robo-taxis, however, have to be hi-fi.

There are a few important insights about lo-fi and hi-fi AI worth noting:

A popular metric for evaluating AI reliability is the F1 score, the harmonic mean of precision and recall, which accounts for both false positives and false negatives. An F1 of 100% represents a perfectly error-free AI that handles all edge cases. By our estimate, some of the best AI models today reach about 99%, though a score above 90% is generally considered high.

Let's calculate the F1 score for two applications:
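The sketch below works through the two calculations. The precision and recall values are illustrative assumptions, not figures from the article, chosen to show how a forgiving use case lands around 65% while a robo-taxi-style use case needs six nines on both metrics:

# Illustrative sketch only: these precision/recall values are assumptions,
# chosen to show how the F1 arithmetic works out.
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# A forgiving use case (e.g., a recommendation-style task): modest precision
# and recall still yield an F1 around 65%.
print(round(f1(0.55, 0.80), 3))   # 0.652

# A robo-taxi-style use case: both precision and recall must hit six nines
# for the F1 score to reach six nines as well.
print(f1(0.999999, 0.999999))     # 0.999999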

It is clear from the above examples that an F1 of 65% is easily achievable by today's AI, but how far away are we from an F1 of six nines?

As discussed earlier, the maturity and market readiness of any technology are tied to how well it handles edge cases. For AI, the F1 score can be a useful approximation of maturity. Similarly, for previous waves of digital innovation such as the web and the cloud, we can use uptime as a signal of maturity.

As a 30-year-old technology, the web is one of the most reliable digital experiences. The most mature sites, such as Google and Gmail, aim for 99.999% uptime (five nines), meaning the service is unavailable for no more than about five minutes per year. This target is sometimes missed by a wide margin, such as YouTube's 62-minute disruption in 2018 or Gmail's six-hour outage in 2020.

At roughly half the web's age, the cloud is less reliable. Most services offered by Amazon Web Services have an uptime SLA of 99.99%, or four nines. That is an order of magnitude less than Gmail's target, but still very high.
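For reference, here is a quick sketch, not from the article, that converts an uptime percentage into the downtime it allows per year, which is where figures like "about five minutes for five nines" come from:

# Quick reference: convert an uptime percentage into the maximum downtime
# it allows over one 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60

for nines, uptime in [("three nines", 0.999),
                      ("four nines", 0.9999),
                      ("five nines", 0.99999)]:
    downtime_minutes = (1 - uptime) * MINUTES_PER_YEAR
    print(f"{nines} ({uptime:.3%}): ~{downtime_minutes:.0f} minutes of downtime per year")

Each additional nine cuts the allowed downtime by a factor of ten, which is why four nines versus five nines is an order-of-magnitude difference in reliability.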

A few observations:

Google engineers who left the company's self-driving car team to start their own companies had a common thesis: Narrowly defined applications of autonomy will be easier to commercialize than general self-driving. In 2017, Aurora was founded to move goods via long-haul trucks on highways. Around the same time, Nuro was founded to move goods in small cars at slower speeds.

Our team shared this thesis when we started out inside Postmates (also in 2017). Our focus has also been on moving goods, but unlike the others, we chose to leave cars behind and instead focus on smaller robots that operate off the street: Autonomous Mobile Robots (AMRs). These are widely adopted in controlled environments such as factory floors and warehouses.

Consider red-light detection for delivery robots. While they should never cross on red, given the risk of collision with vehicles, conservatively stopping on green introduces no safety risk. Therefore, a recall similar to robo-taxis' (99.9999%) along with modest precision (80%) would be adequate for this AI use case. This results in an F1 of roughly 90% (one nine), which is easy to achieve. By moving from the street to the sidewalk and from a full-size car to a small robot, the required AI accuracy drops from six nines to one.
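As a sanity check on that arithmetic, here is a small sketch that plugs the paragraph's figures (80% precision, 99.9999% recall) into the same F1 formula used earlier:

# Sanity check on the red-light example above: high recall (99.9999%) with
# modest precision (80%) yields an F1 of roughly 90%, i.e., one nine.
precision = 0.80
recall = 0.999999

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")   # ~0.889, roughly 90%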

Delivery AMRs are the first application of urban autonomy to commercialize, while robo-taxis still await hi-fi AI performance that remains out of reach. The rate of progress in this industry, as well as our experience over the past five years, has strengthened our view that the best way to commercialize AI is to focus on narrower applications enabled by lo-fi AI and to use human intervention to achieve hi-fi performance when needed. In this model, lo-fi AI leads to early commercialization, and incremental improvements afterward help drive business KPIs.

By targeting more forgiving use cases, businesses can use lo-fi AI to achieve commercial success early, while maintaining a realistic view of the multi-year timeline for achieving hi-fi capabilities. After all, sci-fi has no place in business planning.

Ali Kashani is the cofounder and CEO of Serve Robotics.
