When the richest man in the world is being sued by one of the most popular social media companies, its news. But while most of the conversation about Elon Musks attempt to cancel his $44 billion contract to buy Twitter is focusing on the legal, social, and business components, we need to keep an eye on how the discussion relates to one of tech industrys most buzzy products: artificial intelligence.
The lawsuit shines a light on one of the most essential issues for the industry to tackle: What can and cant AI do, and what should and shouldnt AI do? The Twitter v Musk contretemps reveals a lot about the thinking about AI in tech and startup land and raises issues about how we understand the deployment of the technology in areas ranging from credit checks to policing.
At the core of Musks claim for why he should be allowed out of his contract with Twitter is an allegation that the platform has done a poor job of identifying and removing spam accounts. Twitter has consistently claimed in quarterly filings that less than 5% of its active accounts are spam; Musk thinks its much higher than that. From a legal standpoint, it probably doesnt really matter if Twitters spam estimate is off by a few percent, and Twitters been clear that its estimate is subjective and that others could come to different estimates with the same data. Thats presumably why Musks legal team lost in a hearing on July 19when they asked for more time to perform detailed discovery on Twitters spam-fighting efforts, suggesting that likely isnt the question on which the trial will turn.
Regardless of the legal merits, its important to scrutinise the statistical and technical thinking from Musk and his allies. Musks position is best summarised in his filing from July 15, which states: In a May 6 meeting with Twitter executives, Musk was flabbergasted to learn just how meager Twitters process was. Namely: Human reviewers randomly sampled 100 accounts per day (less than 0.00005% of daily users) and applied unidentified standards to somehow conclude every quarter for nearly three years that fewer than 5% of Twitter users were false or spam. The filing goes on to express the flabbergastedness of Musk by adding, Thats it. No automation, no AI, no machine learning.
Perhaps the most prominent endorsement of Musks argument here came from venture capitalist David Sacks,who quoted it while declaring, Twitter is toast. But theres an irony in Musks complaint here: If Twitter were using machine learning for the audit as he seems to think they should, and only labeling spam that was similar to old spam, it would actually produce a lower, less-accurate estimate than it has now.
There are three components to Musks assertion that deserve examination: his basic statistical claim about what a representative sample looks like, his claim that the spam-level auditing process should automated or use AI or machine learning, and an implicit claim about what AI can actually do.
On the statistical question, this is something any professional anywhere near the machine learning space should be able to answer (so can many high school students). Twitter uses a daily sampling of accounts to scrutinise a total of 9,000 accounts per quarter (averaging about 100 per calendar day) to arrive at its under-5% spam estimate. Though that sample of 9,000 users per quarter is, as Musk notes, a very small portion of the 229 million active users the company reported in early 2022, a statistics professor (or student) would tell you that thats very much not the point. Statistical significance isnt determined by what percentage of the population is sampled but simply by the actual size of the sample in question. As Facebook whistleblower Sophie Zhang put it, you can make the comparison to soup: It doesnt matter if you have a small or giant pot of soup, if its evenly mixed you just need a spoonful to taste-test.
The whole point of statistical sampling is that you can learn most of what you need to know about the variety of a larger population by studying a much-smaller but decently sized portion of it. Whether the person drawing the sample is a scientist studying bacteria, or a factory quality inspector checking canned vegetables, or a pollster asking about political preferences, the question isnt what percentage of the overall whole am I checking, but rather how much should I expect my sample to look like the overall population for the characteristics Im studying? If you had to crack open a large percentage of your cans of tomatoes to check for their quality, youd have a hard time making a profit, so you want to check the fewest possible to get within a reasonable range of confidence in your findings.
Also read: Why Understanding This 60S Sci-Fi Novel Is Key to Understanding Elon Musk
While this thinking does go against the grain of certain impulses (theres a reason why many people make this mistake), there is also a way to make this approach to sampling more intuitive. Think of the goal in setting sample size as getting a reasonable answer to the question, If I draw another sample of the same size, how different would I expect it to be? A classic approach to explaining this problem is to imagine youve bought a great mass of marbles, that are supposed to come in a specific ratio: 95% purple marbles and 5% yellow marbles. You want to do a quality inspection to ensure the delivery is good, so you load them into one of those bingo game hoppers, turn the crank, and start counting the marbles you draw, in each color. Lets say your first sample of 20 marbles has 19 purple and one yellow; should you be confident that you got the right mix from your vendor? You can probably intuitively understand that the next 20 random marbles you draw could end up being very different, with zero yellows or seven. But what if you draw 1,000 marbles, around the same as the typical political poll? What if you draw 9,000 marbles? The more marbles you draw, the more youd expect the next drawing to look similar, because its harder to hide random fluctuations in larger samples.
There are onlinecalculators that can let you run the numbers yourself. If you only draw 20 marbles and get one yellow, you can have 95% confidence that the yellows would be between 0.13% and 24.9% of the total not very exact. If you draw 1,000 marbles and get 50 yellows, you can have 95% confidence that yellows would be between 3.7% and 6.5% of the total; closer, but perhaps not something youd sign your name to in a quarterly filing. At 9,000 marbles with 450 yellow, you can have 95% confidence the yellows are between 4.56% and 5.47%; youre now accurate to within a range of less than half a percent, and at that point Twitters lawyers presumably told them theyd done enough for their public disclosure.
Printed Twitter logos are seen in this picture illustration taken April 28, 2022. Photo: Reuters/Dado Ruvic/Illustration/File Photo
This reality that statistical sampling works to tell us about large populations based on much-smaller samples underpins every area where statistics is used, from checking the quality of the concrete used to make the building youre currently sitting in, to ensuring the reliable flow of internet traffic to the screen youre reading this on.
Its also what drives all current approaches to artificial intelligence today. Specialists in the field almost never use the term artificial intelligence to describe their work, preferring to use machine learning. But another common way to describe the entire field as it currently stands is applied statistics. Machine learning today isnt really computers thinking in anything like what we assume humans do (to the degree we even understand how humans think, which isnt a great degree); its mostly pattern-matching and -identification, based on statistical optimisation. If you feed a convolutional neural network thousands of images of dogs and cats and then ask the resulting model to determine if the next image is of a dog or a cat, itll probably do a good job, but you cant ask it to explain what makes a cat different from a dog on any broader level; its just recognising the patterns in pictures, using a layering of statistical formulas.
Stack up statistical formulas in specific ways, and you can build a machine learning algorithm that, fed enough pictures, will gradually build up a statistical representation of edges, shapes, and larger forms until it recognises a cat, based on the similarity to thousands of other images of cats it was fed. Theres also a way in which statistical sampling plays a role: You dont need pictures of all the dogs and cats, just enough to get a representative sample, and then your algorithm can infer what it needs to about all the other pictures of dogs and cats in the world. And the same goes for every other machine learning effort, whether its an attempt to predict someones salary using everything else you know about them, with a boosted random forests algorithm, or to break down a list of customers into distinct groups, in a clustering algorithm like a support vector machine.
You dont absolutely have to understand statistics as well as a student whos recently taken a class in order to understand machine learning, but it helps. Which is why the statistical illiteracy paraded by Musk and his acolytes here is at least somewhat surprising.
But more important, in order to have any basis for overseeing the creation of a machine-learning product, or to have a rationale for investing in a machine-learning company, its hard to see how one could be successful without a decent grounding in the rudiments of machine learning, and where and how it is best applied to solve a problem. And yet, team Musk here is suggesting they do lack that knowledge.
Once you understand that all machine learning today is essentially pattern-matching, it becomes clear why you wouldnt rely on it to conduct an audit such as the one Twitter performs to check for the proportion of spam accounts. Theyre hand-validating so that they ensure its high-quality data, explained security professional Leigh Honeywell, whos been a leader at firms like Slack and Heroku, in an interview. She added, any data you pull from your machine learning efforts will by necessity be not as validated as those efforts. If you only rely on patterns of spam youve already identified in the past and already engineered into your spam-detection tools, in order to find out how much spam there is on your platform, youll only recognise old spam patterns, and fail to uncover new ones.
Also read: India Versus Twitter Versus Elon Musk Versus Society
Where Twitter should be using automation and machine learning to identify and remove spam is outside of this audit function, which the company seems to do. It wouldnt otherwise be possible tosuspend half a million accountsevery day and lock millions of accounts each week, as CEO Parag Agrawal claims. In conversations Ive had with cybersecurity workers in the field, its quite clear that large amounts of automation is used at Twitter (though machine learning specifically is actually relatively rare in the field because the results often arent as good as other methods, marketing claims by allegedly AI-based security firms to the contrary).
At least in public claims related to this lawsuit, prominent Silicon Valley figures are suggesting they have a different understanding of what machine learning can do, and when it is and isnt useful. This disconnect between how many nontechnical leaders in that world talk about AI, and what it actually is, has significant implications for how we will ultimately come to understand and use the technology.
The general disconnect between the actual work of machine learning and how its touted by many company and industry leaders is something data scientists often chalk up to marketing. Its very common to hear data scientists in conversation among themselves declare that AI is just a marketing term. Its also quite common to have companies using no machine learning at all describe their work as AI to investors and customers, who rarely know the difference or even seem to care.
This is a basic reality in the world of tech. In my own experience talking with investors who make investments in AI technology, its often quite clear that they know almost nothing about these basic aspects of how machine learning works. Ive even spoken to CEOs of rather large companies that rely at their core on novel machine learning efforts to drive their product, who also clearly have no understanding of how the work actually gets done.
Not knowing or caring how machine learning works, what it can or cant do, and where its application can be problematic could lead society to significant peril. If we dont understand the way machine learning actually works most often by identifying a pattern in some dataset and applying that pattern to new data we can be led deep down a path in which machine learning wrongly claims, for example, to measure someones face for trustworthiness (when this is entirely based on surveys in which people reveal their own prejudices), or that crime can be predicted (when many hyperlocal crime numbers are highly correlated with more police officers being present in a given area, who then make more arrests there), based almost entirely on a set of biased data or wrong-headed claims.
If were going to properly manage the influence of machine learning on our society on our systems and organisations and our government we need to make sure these distinctions are clear. It starts with establishing a basic level of statistical literacy, and moves on to recognising that machine learning isnt magicand that it isnt, in any traditional sense of the word, intelligent that it works by pattern-matching to data, that the data has various biases, and that the overall project can produce many misleading and/or damaging outcomes.
Its an understanding one might have expected or at least hoped to find among some of those investing most of their life, effort, and money into machine-learning-related projects. If even people that deep arent making those efforts to sort fact from fiction, its a poor omen for the rest of us, and the regulators and other officials who might be charged with keeping them in check.
This article was originally published on Future Tense, a partnership between Slate magazine, Arizona State University, and New America.
Read more here:
Elon Musk and Silicon Valley's Overreliance on Artificial Intelligence - The Wire
- Classic reasoning systems like Loom and PowerLoom vs. more modern systems based on probalistic networks [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Using Amazon's cloud service for computationally expensive calculations [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Software environments for working on AI projects [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- New version of my NLP toolkit [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Semantic Web: through the back door with HTML and CSS [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Java FastTag part of speech tagger is now released under the LGPL [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Defining AI and Knowledge Engineering [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Great Overview of Knowledge Representation [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Something like Google page rank for semantic web URIs [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- My experiences writing AI software for vehicle control in games and virtual reality systems [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- The URL for this blog has changed [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- I have a new page on Knowledge Management [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- N-GRAM analysis using Ruby [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Good video: Knowledge Representation and the Semantic Web [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Using the PowerLoom reasoning system with JRuby [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Machines Like Us [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- RapidMiner machine learning, data mining, and visualization tool [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- texai.org [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- NLTK: The Natural Language Toolkit [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- My OpenCalais Ruby client library [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Ruby API for accessing Freebase/Metaweb structured data [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Protégé OWL Ontology Editor [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- New version of Numenta software is available [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Very nice: Elsevier IJCAI AI Journal articles now available for free as PDFs [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Verison 2.0 of OpenCyc is available [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- What’s Your Biggest Question about Artificial Intelligence? [Article] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Minimax Search [Knowledge] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Decision Tree [Knowledge] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- More AI Content & Format Preference Poll [Article] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- New Planners Solve Rescue Missions [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Neural Network Learns to Bluff at Poker [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Pushing the Limits of Game AI Technology [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Mining Data for the Netflix Prize [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Interview with Peter Denning on the Principles of Computing [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Decision Making for Medical Support [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Neural Network Creates Music CD [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- jKilavuz - a guide in the polygon soup [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Artificial General Intelligence: Now Is the Time [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Apply AI 2007 Roundtable Report [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- What Would You do With 80 Cores? [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Software Finds Learning Language Child's Play [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Artificial Intelligence in Games [Article] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Artificial Intelligence Resources [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Alan Turing: Mathematical Biologist? [Last Updated On: April 25th, 2012] [Originally Added On: April 25th, 2012]
- BBC Horizon: The Hunt for AI ( Artificial Intelligence ) - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Can computers have true artificial intelligence" Masonic handshake" 3rd-April-2012 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Kevin B. Korb - Interview - Artificial Intelligence and the Singularity p3 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Artificial Intelligence - 6 Month Anniversary - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Science Breakthroughs [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Hitman: Blood Money - Part 49 - Stupid Artificial Intelligence! - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Research Members Turned Off By HAARP Artificial Intelligence - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Artificial Intelligence Lecture No. 5 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- The Artificial Intelligence Laboratory, 2012 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Charlie Rose - Artificial Intelligence - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Expert on artificial intelligence to speak at EPIIC Nights dinner [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- Filipino software engineers complete and best thousands on Stanford’s Artificial Intelligence Course [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- Vodafone xone™ Hackathon Challenges Developers and Entrepreneurs to Build a New Generation of Artificial Intelligence ... [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- Rocket Fuel Packages Up CPG Booster [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- 2 Filipinos finishes among top in Stanford’s Artificial Intelligence course [Last Updated On: May 5th, 2012] [Originally Added On: May 5th, 2012]
- Why Your Brain Isn't A Computer [Last Updated On: May 5th, 2012] [Originally Added On: May 5th, 2012]
- 2 Pinoy software engineers complete Stanford's AI course [Last Updated On: May 7th, 2012] [Originally Added On: May 7th, 2012]
- Percipio Media, LLC Proudly Accepts Partnership With MIT's Prestigious Computer Science And Artificial Intelligence ... [Last Updated On: May 10th, 2012] [Originally Added On: May 10th, 2012]
- Google Driverless Car Ok'd by Nevada [Last Updated On: May 10th, 2012] [Originally Added On: May 10th, 2012]
- Moving Beyond the Marketing Funnel: Rocket Fuel and Forrester Research Announce Free Webinar [Last Updated On: May 10th, 2012] [Originally Added On: May 10th, 2012]
- Rocket Fuel Wins 2012 San Francisco Business Times Tech & Innovation Award [Last Updated On: May 13th, 2012] [Originally Added On: May 13th, 2012]
- Internet Week 2012: Rocket Fuel to Speak at OMMA RTB [Last Updated On: May 16th, 2012] [Originally Added On: May 16th, 2012]
- How to Get the Most Out of Your Facebook Ads -- Rocket Fuel's VP of Products, Eshwar Belani, to Lead MarketingProfs ... [Last Updated On: May 16th, 2012] [Originally Added On: May 16th, 2012]
- The Digital Disruptor To Banking Has Just Gone International [Last Updated On: May 16th, 2012] [Originally Added On: May 16th, 2012]
- Moving Beyond the Marketing Funnel: Rocket Fuel Announce Free Webinar Featuring an Independent Research Firm [Last Updated On: May 23rd, 2012] [Originally Added On: May 23rd, 2012]
- MASA Showcases Latest Version of MASA SWORD for Homeland Security Markets [Last Updated On: May 23rd, 2012] [Originally Added On: May 23rd, 2012]
- Bluesky Launches Drones for Aerial Surveying [Last Updated On: May 23rd, 2012] [Originally Added On: May 23rd, 2012]
- Artificial Intelligence: What happened to the hunt for thinking machines? [Last Updated On: May 25th, 2012] [Originally Added On: May 25th, 2012]
- Bubble Robots Move Using Lasers [VIDEO] [Last Updated On: May 25th, 2012] [Originally Added On: May 25th, 2012]
- UHV assistant professors receive $10,000 summer research grants [Last Updated On: May 27th, 2012] [Originally Added On: May 27th, 2012]
- Artificial intelligence: science fiction or simply science? [Last Updated On: May 28th, 2012] [Originally Added On: May 28th, 2012]
- Exetel taps artificial intelligence [Last Updated On: May 29th, 2012] [Originally Added On: May 29th, 2012]
- Software offers brain on the rain [Last Updated On: May 29th, 2012] [Originally Added On: May 29th, 2012]
- New Dean of Science has high hopes for his faculty [Last Updated On: May 30th, 2012] [Originally Added On: May 30th, 2012]
- Cognitive Code Announces "Silvia For Android" App [Last Updated On: May 31st, 2012] [Originally Added On: May 31st, 2012]
- A Rat is Smarter Than Google [Last Updated On: June 5th, 2012] [Originally Added On: June 5th, 2012]