Executive Summary
In 2020, with high school exams canceled in many countries, the International Baccalaureate Organization (IBO) deployed an AI to determine final grades based on current and historical data. When the results came in, many scores diverged sharply from students' predicted grades, which in previous years had closely matched final results, and many people appealed their grades. Unfortunately, the appeals system had not been changed from previous years and still assumed that students had sat written examinations. Since university offers in many countries are contingent on students achieving their predicted grades, many students have been denied places at their universities of choice, which has caused a great deal of anger. This experience highlights the risks of delegating life-altering decisions to AI without considering how apparently anomalous decisions can be appealed and, if necessary, changed.
How would you feel if an algorithm determined where your child went to college?
This year, Covid-19 locked down millions of high school seniors, and governments around the world canceled year-end graduation exams, forcing examining boards everywhere to find other ways of setting the final grades that would largely determine the future of the class of 2020. One of these boards, the International Baccalaureate Organization (IBO), opted to use artificial intelligence (AI) to help set overall scores for high-school graduates based on students' past work and other historical data. (We use the term AI broadly to mean a computer program that uses data to execute a task that humans typically perform, in this case processing student scores.)
The experiment was not a success, and thousands of unhappy students and parents have since launched a furious protest campaign. So, what went wrong and what does the experience tell us about the challenges that come with AI-enabled solutions?
The IB is a rigorous and prestigious high-school certificate and diploma program taught by some of the world's best schools. It opens doors to the world's leading universities for talented and hard-working students in over 150 countries.
In a normal year, final grades are determined by coursework produced by the students and a final examination administered and marked by the IBO directly. The coursework counts for some 20-30% of the overall final grade, and the exam accounts for the remainder. Before the exam, teachers provide predicted grades, which allow universities to offer places conditional on the candidates' final grades meeting the predictions. The IBO also arranges independent grading of samples of each student's coursework in order to discourage grade inflation by schools.
The process is generally considered a rigorous and well-regarded assessment protocol. The IBO has collected a substantial amount of data about each subject and school: hundreds of thousands of data points, in some cases going back over 50 years. Significantly, the relationship between predicted and final grades has been tight. At leading IB schools, over 90% of grades have matched the predicted grade, and over 95% of total scores have been within one point of the predicted total (total scores are set on a scale of 1 to 45).
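To make the two agreement figures concrete, here is a minimal Python sketch of how such rates could be computed from paired predicted and final scores. The function name `agreement_rates` and the sample totals are hypothetical and purely illustrative; they are not IBO data, and the source does not describe how the IBO itself computes these figures.

```python
from typing import Sequence


def agreement_rates(predicted: Sequence[int], final: Sequence[int]) -> tuple[float, float]:
    """Return (share exactly equal to prediction, share within one point of prediction)."""
    if len(predicted) != len(final) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(predicted)
    # Count pairs where the final score matches the prediction exactly.
    exact = sum(p == f for p, f in zip(predicted, final))
    # Count pairs where the final score is at most one point away from the prediction.
    within_one = sum(abs(p - f) <= 1 for p, f in zip(predicted, final))
    return exact / n, within_one / n


# Illustrative, made-up totals on the 1-45 scale (not real IB results)
predicted = [38, 41, 35, 29, 44]
final = [38, 40, 35, 31, 44]
exact, near = agreement_rates(predicted, final)
print(f"exact: {exact:.0%}, within one point: {near:.0%}")
```

With these illustrative totals, three of the five scores match exactly and four of the five fall within one point, so the script prints `exact: 60%, within one point: 80%`. The same function would work for subject grades as for total scores.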
In the spring of 2020, the IBO had to decide whether to allow the exams to proceed or cancel them and award grades some other way. Allowing exams risked the safety of students and teachers, and could create fairness issues if, for instance, students in some countries were allowed to write the exams at home, while in others they had to sit exams at school.
Canceling the exams raised the question of how to assign grades, and that's when the IBO turned to AI. Using its trove of historical data about students' coursework and predicted grades, as well as data about the actual grades obtained in exams in previous years, the IBO decided to build a model to calculate an overall score for each student, in a sense predicting what the 2020 students would have achieved in the exams. The model-building was outsourced to a subcontractor whose identity had not been disclosed at the time of writing.
A crisis erupted when the results came out in early July 2020. Tens of thousands of students all over the world received grades that not only deviated substantially from their predicted grades but did so in unexplainable ways. Some 24,000 people, a number equal to more than 15% of all 2020 IB diploma recipients, have since signed the protest. The IBO's social media pages are flooded with furious comments. Several governments have also launched formal investigations, and numerous lawsuits are in preparation, some alleging data abuse under the EU's GDPR. What's more, schools, students, and families involved in other high school programs that have also adopted AI solutions are raising very similar concerns, notably in the UK, where A-level results are due out on August 13th, 2020.
As the outrage has spread, one critical and very practical question has been consistently raised by frustrated students and parents: How can they appeal the grades?
In normal years, the appeals process was well-defined and consisted of several levels, from the re-marking of an individual student's exam to a review of coursework marks by subject at a given school. The former means having another look at a student's work, a natural first step when the grades were based on such work. The latter refers to an adjustment that the IBO may apply to a school's grading of coursework should a sample of work independently assessed by the IBO produce substantially different grades, on average, from those awarded by the school. The appeals process was well understood and produced consistent results, but it was not used frequently, largely because, as noted, there were few surprises when the final grades came out.
This year, the IB schools initially treated appeals as requests for re-marks of student work. But this poses a fundamental challenge: the graded papers were not in dispute; it was the AI assessment that was called into question. The AI did not actually mark any papers; it only produced final grades based on the data it was fed, which included teacher-marked coursework and the predicted grades. Since the specifics of the program have not been disclosed, all people can see are the results, many of which were highly anomalous, with final scores in some cases well below the marks of the teacher-graded coursework of the students involved. Unsurprisingly, the IBO's appeals approach has not met with success: it is in no way aligned with the way in which the AI created the grades.
The main lesson from this experience is that any organization that decides to use AI to produce an outcome as critical and sensitive as a high-school grade, one marking 12 years of a student's work, needs to be very clear about how the outcomes are produced and how they can be appealed if they appear anomalous or unexpected. From the outside, it looks as though the IBO may have simply plugged the AI into the IB system to replace the exams and then assumed that the rest of the system, in particular the appeals process, could work as before.
So what sort of appeals process should the IBO have designed? First of all, the overall process of scoring and, more importantly, of appealing the decision should be easy to explain, so that people understand what each next step will be. Note that this is not about explaining the AI black box, as regulators currently call for when arguing the need for explainable AI. That would be almost impossible in many cases, since understanding the programming used in an AI generally requires a high level of technical sophistication. Rather, it is about making sure that people understand what information is used in assessing grades and what the steps in the appeals process are. What the IBO could have done instead was offer appellants the right to a human-led re-evaluation of anomalous grades, specify which input data the appeal committee would focus on in reanalyzing the case, and explain how the problem would be fixed.
How the problem would be fixed would depend on whether it turned out to be student-specific, school-specific, or subject-specific; a single student's appeal might well affect other students, depending on which components of the AI the appeal relates to.
If, for example, a problem with an individual student's grade seems to be driven by school-level data (say, a number of students at that same school received final grades that differed markedly from their predicted grades), then the appeal process would look at the grades of all students in that school. If needed, the AI algorithm itself would be adjusted for the school in question without affecting other schools, so that the scores the AI produces for every other school stay the same and results remain consistent across schools. In contrast, if the problem is linked to factors specific to the student, then the analysis would focus on identifying why the AI produced an anomalous outcome for that student and, if needed, re-scoring that student and any other student whose grades were affected in the same way.
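The triage step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the IBO's procedure: the `Result` record, the `flag_schools` and `triage` helpers, and the thresholds (a gap of 3 or more points affecting more than 25% of a school's students) are all hypothetical choices, since the article gives no numbers. The sketch only decides the scope of a review; it does not touch the grading model itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    student_id: str
    school: str
    predicted: int  # teacher-predicted total score
    final: int      # total score produced by the model


def flag_schools(results: list[Result], gap: int = 3, share: float = 0.25) -> list[str]:
    """Return schools where more than `share` of students missed their prediction by `gap` or more points."""
    by_school: dict[str, list[Result]] = defaultdict(list)
    for r in results:
        by_school[r.school].append(r)
    flagged = []
    for school, rows in by_school.items():
        anomalous = sum(abs(r.final - r.predicted) >= gap for r in rows)
        if anomalous / len(rows) > share:
            flagged.append(school)
    return sorted(flagged)


def triage(appeal: Result, results: list[Result], gap: int = 3, share: float = 0.25) -> tuple[str, list[str]]:
    """Route an appeal to a school-level or student-level review and list the students to re-examine."""
    if appeal.school in flag_schools(results, gap, share):
        # School-level pattern: review every student at that school.
        return "school", [r.student_id for r in results if r.school == appeal.school]
    # No school-wide pattern: review this student first. Finding other students
    # "affected in the same way" would require knowledge of the model's inputs.
    return "student", [appeal.student_id]


# Illustrative, made-up results (not real IB data)
results = [
    Result("a1", "School A", 38, 31),
    Result("a2", "School A", 36, 30),
    Result("a3", "School A", 40, 39),
    Result("b1", "School B", 35, 35),
    Result("b2", "School B", 30, 26),
    Result("b3", "School B", 42, 41),
    Result("b4", "School B", 33, 33),
]
print(triage(results[0], results))  # ('school', ['a1', 'a2', 'a3'])
print(triage(results[4], results))  # ('student', ['b2'])
```

Here two of School A's three students miss their predictions by 3 or more points, so an appeal from `a1` widens to the whole school, while School B has only one such student out of four (exactly 25%, not above the threshold), so `b2`'s appeal stays individual. A subject-specific check could follow the same pattern by grouping on `(school, subject)` instead of `school` alone.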
Of course, much of this would be true of any grading process: one student's anomaly might signal a more systematic failing, whether or not an AI is involved. But the design of the appeals process needs to reflect the different ways in which humans and machines make decisions, the specific design of the AI used, and how its decisions can be corrected.
For example, because AI awards grades on the basis of its model of the relationships between various input data, there should generally be no need to look at the actual work of the students concerned, and corrections could be made for all affected students (those with similar input data characteristics) at once. In fact, in many ways appealing an AI grade could be an easier process than appealing a traditional exam-based grade.
What's more, with an AI system, an appeals process along these lines would enable continuous improvement of the AI. Had the IBO put such a system in place, the results of the appeals would have produced feedback data that could have been used to update the model for future use in the event, say, that examinations are cancelled again next year.
***
The IBO's experience obviously has lessons for deploying AI in many contexts, from approving credit to job searches and policing. Decisions in all these cases can, as with the IB, have life-altering consequences for the people involved. Given the stakes, disputes over the outcomes are inevitable. Including AI in the decision-making process without carefully thinking through an appeals process, and linking that process to the algorithm design itself, will likely lead not only to new crises but potentially to a rejection of AI-enabled solutions in general. And that would deprive us all of the potential for AI, combined with humans, to dramatically improve the quality of decision-making.
Disclosure: One of the authors of this article is the parent of a student completing the IB program this year.
Source: What Happens When AI Is Used to Set Grades? Harvard Business Review