Status on human vs. machines
Are computers beating humans? In simple number crunching, yes, but also in more complex tasks.
Year | Domain | Description |
---|---|---|
2022 | Gran Turismo | A research summary reported “a neural-network algorithm — called GT Sophy — that is capable of winning against the best human players of the video game Gran Turismo.” |
2019 | Question-answering with BoolQ | The Boolean Questions dataset BoolQ is reported to have a human accuracy of 89.0, while the T5-11B model is reported to reach 91.2. |
2017 | Dota 2 1v1 | OpenAI reported “We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules”, August 2017. |
2017 | Poker (heads-up no-limit Texas Hold’em) | According to Andrew Ng, “AI beats top humans”, January 2017. The system was Libratus, a reinforcement learning-based algorithm from Carnegie Mellon University; see Poker pros vs the machines. |
2016 | Lipreading | Lip Reading Sentences in the Wild writes “… we demonstrate lip reading performance that beats a professional lip reader on videos from BBC television.” |
2016 | Conversational speech recognition | Microsoft Research reports performance exceeding that of humans on benchmark datasets in Achieving human parity in conversational speech recognition. |
2016 | Geoguessing | Google’s PlaNet: “In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km” according to Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image. In 2023, university students Michal Skreta, Lukas Haas and Silas Alberti reported a median error of 44 kilometers, beating GeoGuessr expert Rainbolt; see the world’s best ai vs geoguessr pro video. |
2016 | Go | DeepMind’s AlphaGo beat the best European Go player, as reported in January in Mastering the game of Go with deep neural networks and tree search. |
2015 | Closed-world image classification | ImageNet classification by Microsoft Research researchers with a deep neural network, see Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Already in 2014 Google was close to human performance, see ImageNet Large Scale Visual Recognition Challenge. The human error rate on ImageNet has been reported to be 5.1%, and that was Andrej Karpathy, a dedicated human labeler. Microsoft reported 4.94% in February 2015. Google won one of the competitions in 2014 with “GoogLeNet”, which had a classification error of 6.66%. Baidu reported an error rate of 5.98% in January 2015 and later, in February, 5.33%. The initial reports were, however, on the ImageNet dataset with a limited number of classes (1000). A straight out-of-the-box application of Keras-distributed ImageNet-based classifiers does not seem to perform on par with humans, see “Washing machine” in Linking ImageNet WordNet Synsets with Wikidata. |
2015 | Atari game playing | Google DeepMind deep neural network with reinforcement learning, see Human-level control through deep reinforcement learning: “We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games”. See also Playing Atari with Deep Reinforcement Learning |
2014 | Personality judgement | See Computer-based personality judgments are more accurate than those made by humans. The computer used Facebook Likes. |
2014 | Deceptive pain expression detection | See Automatic Decoding of Facial Movements Reveals Deceptive Pain Expressions: “…and after training human observers, we improved accuracy to a modest 55%. However, a computer vision system that automatically measures facial movements and performs pattern recognition on those movements attained 85% accuracy.” |
2013 | Age estimation | Estimation of a person’s age from a photo of the face; see Age Estimation from Face Images: Human vs. Machine Performance. A considerable improvement came with the winner of the ChaLearn LAP 2015 challenge, DEX: Deep EXpectation of apparent age from a single image. |
2013 | Smooth car driving | Google robotic car head Chris Urmson claimed that their self-driving car “is driving more smoothly and more safely than our trained professional drivers.” For general driving, the Google car may not, as of 2014, be better than humans, e.g., because of problems with road obstacles, see Hidden Obstacles for Google’s Self-Driving Cars. |
2011 | Traffic sign reading | Dan Ciresan used a convolutional neural network on the German Traffic Sign Recognition Benchmark to beat the best human. Results are reported in Man vs. Computer: Benchmarking Machine Learning Algorithms for Traffic Sign Recognition. |
2011 | Jeopardy! | In January 2011 the IBM Watson system beat two human contestants in the open-domain question-answering television quiz show. An introduction to the technique in Watson is Introduction to “This is Watson” |
2008 | Poker | Michael Bowling, see the news report Battle of chips: Computer beats human experts at poker. In 2015 a program for heads-up limit hold’em poker was reported to be not just better than humans: the game itself was “essentially weakly solved”, see Heads-up limit hold’em poker is solved. |
2007 | Face recognition | See Face Recognition Algorithms Surpass Humans Matching Faces over Changes in Illumination |
2005 | Single character recognition | See Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs) |
1997 | Chess | See Deep Blue versus Garry Kasparov |
1979 | Backgammon | See Backgammon Computer Program Beats World Champion |
Still waiting…
Year | Domain | Description |
---|---|---|
2014 | University entry examination | A Japanese system was reported to score 95 in 2014 on the English section of the entrance exam to the University of Tokyo. The average for a prospective student was 93.1. See also, e.g., The Most Uncreative Examinee: A First Step toward Wide Coverage Natural Language Math Problem Solving. |
2020 | Conversation/chatting | Machines can hold conversations and might fool humans into thinking the machine is a human, but they might not yet be better at conversing. See, e.g., Bruce Wilcox and A Neural Conversational Model (2015). Meena (2020), described in Towards a Human-like Open-Domain Chatbot, achieves a 79% “Sensibleness and Specificity Average” against a human level of 86%. |
2015 | Music | Most of what I have heard of RNN music is from Bob Sturm. His “Lisl’s Stis” is quite good. It returns only the melody. In 2016 Manuel Araoz showed examples with harmony: Composed by Recurrent Neural Network. These are fairly tedious. |
2016 | Natural speech | Speech samples from DeepMind’s WaveNet are not far from the level of natural speech. |
2017 | Drone flight over fixed course | NASA’s Jet Propulsion Laboratory in Pasadena, California reported that world-class drone pilot Ken Loo won over an AI-controlled drone in November 2017. |
Thanks to Jakob Eg Larsen and Lars Kai Hansen for providing links.
January 7, 2016 at 12:52 am
Really, most people would think that computers being smarter than us would be an “End of the World!” scenario. But even if the end of the world came, I think the machines would be trying to protect us. After all, we are sort of like Darth Vader and Luke when Darth Vader said “LUKE I am your father!” And for us it would be “Machines We are your Fathers and Mothers.” Although I am open to more opinions :)
March 11, 2016 at 7:54 am
Hi Finn. Interesting score board! :) Lisl’s Stis was only the beginning:
https://highnoongmt.wordpress.com/2015/08/15/deep-learning-for-assisting-the-process-of-music-composition-part-4/
https://highnoongmt.wordpress.com/2015/12/20/eight-short-outputs-now-on-youtube/
https://highnoongmt.wordpress.com/2015/12/16/tis-the-season-for-some-deep-carols/
May 14, 2018 at 8:20 am
[…] learning only gained prominence in 2012, and only as recently as 2015 outperformed the human benchmark on the ImageNet Challenge for image classification. The advancements that […]
July 19, 2018 at 4:22 pm
AI can now beat us at Dota 2: https://www.engadget.com/2017/08/12/ai-beats-top-dota-2-players/
August 14, 2018 at 8:23 am
Thanks I have updated it.
December 1, 2022 at 8:27 pm
[…] keep track of Status on human vs. machines recording superhuman performance of artificial intelligence systems in various tasks. ChatGPT has […]