Elon Musk and Silicon Valley’s Overreliance on Artificial Intelligence

When the richest man in the world is being sued by one of the most popular social media companies, it’s news. But while most of the conversation about Elon Musk’s attempt to cancel his $44 billion contract to buy Twitter has focused on the legal, social, and business components, we need to keep an eye on how the discussion relates to one of the tech industry’s buzziest products: artificial intelligence.

The lawsuit shines a light on one of the most essential issues for the industry to tackle: What can and can’t AI do, and what should and shouldn’t AI do? The Twitter v. Musk contretemps reveals a lot about how AI is thought about in tech and startup land, and raises questions about how we understand the deployment of the technology in areas ranging from credit checks to policing.

At the core of Musk’s claim for why he should be allowed out of his contract with Twitter is an allegation that the platform has done a poor job of identifying and removing spam accounts. Twitter has consistently claimed in quarterly filings that less than 5% of its active accounts are spam; Musk thinks it’s much higher than that. From a legal standpoint, it probably doesn’t matter much if Twitter’s spam estimate is off by a few percent, and Twitter has been clear that its estimate is subjective and that others could come to different estimates with the same data. That’s presumably why Musk’s legal team lost a hearing on July 19, when they asked for more time to perform detailed discovery on Twitter’s spam-fighting efforts, suggesting that this likely isn’t the question on which the trial will turn.

Regardless of the legal merits, it’s important to scrutinise the statistical and technical thinking from Musk and his allies. Musk’s position is best summarised in his filing from July 15, which states: “In a May 6 meeting with Twitter executives, Musk was flabbergasted to learn just how meager Twitter’s process was. Namely: Human reviewers randomly sampled 100 accounts per day (less than 0.00005% of daily users) and applied unidentified standards to somehow conclude every quarter for nearly three years that fewer than 5% of Twitter users were false or spam.” The filing goes on to express Musk’s flabbergastedness by adding, “That’s it. No automation, no AI, no machine learning.”

Perhaps the most prominent endorsement of Musk’s argument here came from venture capitalist David Sacks, who quoted it while declaring, “Twitter is toast.” But there’s an irony in Musk’s complaint: If Twitter were using machine learning for the audit, as he seems to think it should, and thereby only labeling spam that was similar to old spam, it would actually produce a lower, less accurate estimate than it has now.

There are three components to Musk’s assertion that deserve examination: his basic statistical claim about what a representative sample looks like, his claim that the spam-level auditing process should be automated or use AI or machine learning, and an implicit claim about what AI can actually do.

On the statistical question, this is something any professional anywhere near the machine learning space should be able to answer (as can many high school students). Twitter uses a daily sampling of accounts to scrutinise a total of 9,000 accounts per quarter (averaging about 100 per calendar day) to arrive at its under-5% spam estimate. Though that sample of 9,000 users per quarter is, as Musk notes, a very small portion of the 229 million active users the company reported in early 2022, a statistics professor (or student) would tell you that that’s very much not the point. Statistical confidence isn’t determined by what percentage of the population is sampled, but simply by the actual size of the sample in question. As Facebook whistleblower Sophie Zhang put it, you can make the comparison to soup: “It doesn’t matter if you have a small or giant pot of soup; if it’s evenly mixed, you just need a spoonful to taste-test.”

The whole point of statistical sampling is that you can learn most of what you need to know about the variety of a larger population by studying a much smaller but decently sized portion of it. Whether the person drawing the sample is a scientist studying bacteria, a factory quality inspector checking canned vegetables, or a pollster asking about political preferences, the question isn’t “what percentage of the overall whole am I checking?” but rather “how much should I expect my sample to look like the overall population for the characteristics I’m studying?” If you had to crack open a large percentage of your cans of tomatoes to check their quality, you’d have a hard time making a profit, so you want to check the fewest possible that still gets you within a reasonable range of confidence in your findings.
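
That trade-off has a standard back-of-the-envelope calculation. Here is a minimal sketch in Python (the 5% defect rate and the half-percentage-point target are assumptions chosen for illustration) using the usual normal-approximation formula for the sample size needed to hit a desired margin of error:

```python
# Sketch: how many cans must you open to estimate a defect rate to within
# a desired margin of error? (Normal-approximation formula; all inputs
# below are assumed values for illustration.)
import math

def sample_size(expected_rate: float, margin: float, z: float = 1.96) -> int:
    """Smallest n so a 95% confidence interval is roughly +/- margin."""
    return math.ceil(z**2 * expected_rate * (1 - expected_rate) / margin**2)

# Assuming ~5% defective cans and wanting an estimate good to +/- 0.5 points:
print(sample_size(0.05, 0.005))  # about 7,300 cans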


While this thinking does go against the grain of certain impulses (there’s a reason so many people make this mistake), there is a way to make this approach to sampling more intuitive. Think of the goal in setting sample size as getting a reasonable answer to the question, “If I draw another sample of the same size, how different would I expect it to be?” A classic way to explain the problem is to imagine you’ve bought a great mass of marbles that are supposed to come in a specific ratio: 95% purple marbles and 5% yellow marbles. You want to do a quality inspection to ensure the delivery is good, so you load them into one of those bingo game hoppers, turn the crank, and start counting the marbles you draw in each color. Let’s say your first sample of 20 marbles has 19 purple and one yellow; should you be confident that you got the right mix from your vendor? You can probably intuitively understand that the next 20 random marbles you draw could end up looking very different, with zero yellows, or seven. But what if you draw 1,000 marbles, around the size of the typical political poll? What if you draw 9,000 marbles? The more marbles you draw, the more you’d expect the next drawing to look similar, because it’s harder to hide random fluctuations in larger samples.
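
You can check that intuition directly. Here is a minimal simulation sketch in Python with numpy (the 5% yellow rate and the 10,000 repetitions are assumptions for illustration) that draws each sample size many times and measures how much the estimates wobble:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
TRUE_YELLOW_RATE = 0.05  # the vendor's promised 5% yellow marbles

for sample_size in (20, 1_000, 9_000):
    # Draw 10,000 independent samples of this size and record the
    # fraction of yellows seen in each one.
    yellows = rng.binomial(n=sample_size, p=TRUE_YELLOW_RATE, size=10_000)
    estimates = yellows / sample_size
    print(
        f"n={sample_size:>5}: middle 95% of estimates runs from "
        f"{np.percentile(estimates, 2.5):.2%} to {np.percentile(estimates, 97.5):.2%}"
    )
```

With samples of 20, the estimates swing between 0% and 15%; with 9,000, they cluster within about half a percentage point of the true 5%.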

There are online calculators that let you run the numbers yourself. If you only draw 20 marbles and get one yellow, you can have 95% confidence that the yellows make up between 0.13% and 24.9% of the total; not very exact. If you draw 1,000 marbles and get 50 yellows, you can have 95% confidence that yellows are between 3.7% and 6.5% of the total; closer, but perhaps not something you’d sign your name to in a quarterly filing. At 9,000 marbles with 450 yellow, you can have 95% confidence the yellows are between 4.56% and 5.47%; you’re now accurate to within less than half a percentage point in either direction, and at that point Twitter’s lawyers presumably told them they’d done enough for their public disclosure.
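
Those figures appear to match exact (Clopper-Pearson) binomial confidence intervals, so a short sketch, assuming the statsmodels library is available, should reproduce them:

```python
# A sketch reproducing the article's intervals, on the assumption that they
# are exact (Clopper-Pearson) binomial confidence intervals.
from statsmodels.stats.proportion import proportion_confint

audits = [
    (1, 20),       # 1 yellow out of 20 marbles
    (50, 1_000),   # 50 out of 1,000
    (450, 9_000),  # 450 out of 9,000, the size of Twitter's quarterly audit
]

for count, n in audits:
    low, high = proportion_confint(count, n, alpha=0.05, method="beta")
    print(f"{count}/{n}: 95% CI {low:.2%} to {high:.2%}")
```

Note that the interval narrows with the absolute count of sampled accounts; nothing in the calculation asks what fraction of the 229 million users the 9,000 represent.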


This reality, that statistical sampling can tell us about large populations based on much smaller samples, underpins every area where statistics is used, from checking the quality of the concrete in the building you’re currently sitting in, to ensuring the reliable flow of internet traffic to the screen you’re reading this on.

It’s also what drives all current approaches to artificial intelligence. Specialists in the field almost never use the term “artificial intelligence” to describe their work, preferring “machine learning.” Another common way to describe the entire field as it currently stands is “applied statistics.” Machine learning today isn’t really computers thinking in anything like the way we assume humans do (to the degree we even understand how humans think, which isn’t a great degree); it’s mostly pattern-matching and pattern identification, based on statistical optimisation. If you feed a convolutional neural network thousands of images of dogs and cats and then ask the resulting model to determine whether the next image is of a dog or a cat, it’ll probably do a good job, but you can’t ask it to explain what makes a cat different from a dog on any broader level; it’s just recognising the patterns in pictures, using a layering of statistical formulas.
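
For a sense of what that layering looks like in practice, here is a minimal sketch, assuming PyTorch, of such a classifier; the layer sizes and the stand-in data are invented for illustration:

```python
# A minimal sketch (assuming PyTorch) of a cat-vs-dog classifier of the
# kind described above. All shapes and data here are invented stand-ins.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learn local edge patterns
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combine edges into shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),                   # scores for two classes: cat, dog
)

# Stand-in batch: 8 random 64x64 RGB "images" with random cat/dog labels.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))

# One step of the statistical optimisation the passage refers to: compute
# how wrong the stacked formulas are, and the direction to nudge them.
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()
```

Nothing in this model “knows” what a cat is; training just adjusts the parameters of the stacked formulas until their outputs match the labeled examples more often.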

Stack up statistical formulas in specific ways, and you can build a machine learning algorithm that, fed enough pictures, will gradually build up a statistical representation of edges, shapes, and larger forms until it recognises a cat, based on its similarity to the thousands of other images of cats it was fed. There’s also a way in which statistical sampling plays a role: You don’t need pictures of all the dogs and cats in the world, just enough to form a representative sample, and then your algorithm can infer what it needs to about all the other pictures of dogs and cats. And the same goes for every other machine learning effort, whether it’s an attempt to predict someone’s salary from everything else you know about them, using a boosted-trees algorithm, or to break a list of customers into distinct groups, using a clustering algorithm like k-means.
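
A toy sketch of those last two workflows, assuming scikit-learn, with data fabricated purely for illustration:

```python
# A toy sketch (assuming scikit-learn) of the two workflows above,
# with fabricated data purely for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)

# Invented features: years of experience, education level, city size index.
X = rng.normal(size=(500, 3))
salary = 50_000 + 8_000 * X[:, 0] + 4_000 * X[:, 1] + rng.normal(scale=5_000, size=500)

# Supervised pattern-matching: fit salary as a statistical function of the features.
reg = GradientBoostingRegressor().fit(X, salary)
print("Predicted salary for a new person:", reg.predict(X[:1]))

# Unsupervised pattern-finding: split "customers" into 3 groups by similarity.
customers = rng.normal(size=(500, 4))
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(customers)
print("First ten group assignments:", groups[:10])
```

In both cases the algorithm is fitting statistical structure to a sample and generalising from it, not reasoning about salaries or customers.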

You don’t absolutely have to understand statistics as well as a student who’s recently taken a class in order to understand machine learning, but it helps, which is why the statistical illiteracy paraded by Musk and his acolytes here is at least somewhat surprising.

More important, in order to have any basis for overseeing the creation of a machine learning product, or a rationale for investing in a machine learning company, it’s hard to see how one could be successful without a decent grounding in the rudiments of machine learning and in where and how it is best applied to solve a problem. And yet team Musk here is suggesting that they lack exactly that knowledge.

Once you understand that all machine learning today is essentially pattern-matching, it becomes clear why you wouldn’t rely on it to conduct an audit like the one Twitter performs to check the proportion of spam accounts. “They’re hand-validating so that they ensure it’s high-quality data,” explained security professional Leigh Honeywell, who’s been a leader at firms like Slack and Heroku, in an interview. She added that any data you pull from your machine learning efforts will by necessity be less validated than those hand-checked results. If you rely only on the patterns of spam you’ve already identified in the past, and already engineered into your spam-detection tools, to find out how much spam there is on your platform, you’ll only recognise old spam patterns and fail to uncover new ones.
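
A small simulation sketch makes the circularity concrete. All the numbers here are invented: assume 5% of accounts are truly spam, and that a third of that spam uses new patterns the existing detector has never seen:

```python
# An invented-numbers sketch of why auditing with yesterday's spam
# classifier undercounts: it only recognises patterns it was built on.
import numpy as np

rng = np.random.default_rng(seed=0)
N = 1_000_000

# Assumed ground truth: 5% spam overall, a third of it using new patterns.
is_spam = rng.random(N) < 0.05
is_new_pattern = is_spam & (rng.random(N) < 1 / 3)

# "ML audit": flag only spam matching previously engineered patterns.
ml_estimate = np.mean(is_spam & ~is_new_pattern)

# Human audit: hand-review a random sample of 9,000 accounts, catching
# old and new patterns alike.
sample = rng.choice(N, size=9_000, replace=False)
human_estimate = np.mean(is_spam[sample])

print(f"ML-based estimate:    {ml_estimate:.2%}")    # about 3.3%
print(f"Human-audit estimate: {human_estimate:.2%}") # about 5%, +/- sampling error
```

The pattern-matching estimate lands around 3.3% no matter how many accounts it scans, while the small hand-checked sample centres on the true 5%.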


Where Twitter should be using automation and machine learning to identify and remove spam is outside of this audit function, and the company seems to do so. It wouldn’t otherwise be possible to suspend half a million accounts every day and lock millions of accounts each week, as CEO Parag Agrawal claims. In conversations I’ve had with cybersecurity workers, it’s quite clear that a large amount of automation is used at Twitter (though machine learning specifically is relatively rare in the field, because the results often aren’t as good as those of other methods, whatever the marketing claims of allegedly AI-based security firms).

At least in public claims related to this lawsuit, prominent Silicon Valley figures are suggesting they have a different understanding of what machine learning can do and of when it is and isn’t useful. This disconnect between how many nontechnical leaders in that world talk about AI and what it actually is has significant implications for how we will ultimately come to understand and use the technology.

The general disconnect between the actual work of machine learning and how it’s touted by many company and industry leaders is something data scientists often chalk up to marketing. It’s very common to hear data scientists in conversation among themselves declare that “AI is just a marketing term.” It’s also quite common for companies using no machine learning at all to describe their work as “AI” to investors and customers, who rarely know the difference or even seem to care.

This is a basic reality in the world of tech. In my own experience talking with investors who put money into AI technology, it’s often quite clear that they know almost nothing about these basic aspects of how machine learning works. I’ve even spoken to CEOs of rather large companies whose products rely at their core on novel machine learning efforts, who also clearly have no understanding of how the work actually gets done.

Not knowing or caring how machine learning works, what it can or can’t do, and where its application can be problematic could lead society into significant peril. If we don’t understand the way machine learning actually works (most often by identifying a pattern in some dataset and applying that pattern to new data), we can be led deep down a path in which machine learning wrongly claims, for example, to measure someone’s face for trustworthiness (when this is entirely based on surveys in which people reveal their own prejudices), or to predict crime (when many hyperlocal crime numbers are highly correlated with more police officers being present in a given area, who then make more arrests there), based almost entirely on biased data or wrong-headed claims.

If we’re going to properly manage the influence of machine learning on our society, on our systems and organisations and our government, we need to make sure these distinctions are clear. It starts with establishing a basic level of statistical literacy, and moves on to recognising that machine learning isn’t magic, and that it isn’t, in any traditional sense of the word, intelligent: that it works by pattern-matching to data, that the data has various biases, and that the overall project can produce many misleading and/or damaging outcomes.

It’s an understanding one might have expected, or at least hoped, to find among some of those investing most of their life, effort, and money into machine-learning-related projects. If even people that deeply invested aren’t making the effort to sort fact from fiction, it’s a poor omen for the rest of us, and for the regulators and other officials who might be charged with keeping them in check.

This article was originally published on Future Tense, a partnership between Slate magazine, Arizona State University, and New America.
