Status on human vs. machines


Are computers beating humans? In simple number crunching, yes, but also in more complex tasks.

Year Domain Description
2022 Gran Turismo A research summary reported “a neural-network algorithm — called GT Sophy — that is capable of winning against the best human players of the video game Gran Turismo.”
2019 Question-answering with BoolQ The Boolean Questions dataset BoolQ is reported to have a human accuracy of 89.0%, while the T5-11B model is reported to reach 91.2%.
2017 Dota 2 1v1 OpenAI reported “We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules”, August 2017.
2017 Poker (heads-up no-limit Texas Hold’em) According to Andrew Ng, “AI beats top humans”, January 2017. The system was Libratus, a reinforcement learning-based algorithm from Carnegie Mellon University; see Poker pros vs the machines.
2016 Lipreading Lip Reading Sentences in the Wild writes “… we demonstrate lip reading performance that beats a professional lip reader on videos from BBC television.”
2016 Conversational speech recognition Microsoft Research reports passing human performance on benchmark datasets in Achieving human parity in conversational speech recognition.
2016 Geoguessing Google’s PlaNet: “In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km”, according to Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image. In 2023, university students Michal Skreta, Lukas Haas and Silas Alberti reported a 44-kilometer median error, beating GeoGuessr expert Rainbolt, see the world’s best AI vs geoguessr pro video. A sketch of how such a median localization error can be computed appears after this table.
2016 Go DeepMind’s AlphaGo beats the best European Go player, as reported in January in Mastering the game of Go with deep neural networks and tree search.
2015 Closed-world image classification ImageNet classification by Microsoft Research researchers with a deep neural network, see Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Already in 2014 Google was close to human performance, see ImageNet Large Scale Visual Recognition Challenge. The human error rate on ImageNet has been reported to be 5.1%, and that was Andrej Karpathy, a dedicated human labeler. Microsoft reported 4.94% in February 2015. Google won one of the competitions in 2014 with “GoogLeNet” having a classification error of 6.66%. Baidu reported an error rate of 5.98% in January 2015 and 5.33% later in February. The initial reports were, however, on the ImageNet dataset with a limited number of classes (1,000). A straight out-of-the-box application of Keras-distributed ImageNet-based classifiers does not seem to perform on par with humans, see “Washing machine” in Linking ImageNet WordNet Synsets with Wikidata (a minimal sketch of such an out-of-the-box test appears after this table).
2015 Atari game playing Google DeepMind deep neural network with reinforcement learning, see Human-level control through deep reinforcement learning: “We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games”. See also Playing Atari with Deep Reinforcement Learning
2014 Personality judgement See Computer-based personality judgments are more accurate than those made by humans. The computer model used Facebook Likes.
2014 Deceptive pain expression detection See Automatic Decoding of Facial Movements Reveals Deceptive Pain Expressions: “…and after training human observers, we improved accuracy to a modest 55%. However, a computer vision system that automatically measures facial movements and performs pattern recognition on those movements attained 85% accuracy.”
2013 Age estimation Estimation of a person’s age from a photo of the face, see Age Estimation from Face Images: Human vs. Machine Performance. A considerable improvement came with the winner of the ChaLearn LAP 2015 challenge: DEX: Deep EXpectation of apparent age from a single image.
2013 Smooth car driving Google robotic car head Chris Urmson claimed that their self-driving car “is driving more smoothly and more safely than our trained professional drivers.” For general car driving the Google car may, as of 2014, not have been better than humans, e.g., because of problems with road obstacles, see Hidden Obstacles for Google’s Self-Driving Cars.
2011 Traffic sign reading Dan Ciresan used a convolutional neural network on the German Traffic Sign Recognition Benchmark to beat the best human. Results are reported in Man vs. Computer: Benchmarking Machine Learning Algorithms for Traffic Sign Recognition.
2011 Jeopardy! In January 2011 the IBM Watson system beat two human contestants in the open-domain question-answering television quiz show. An introduction to the technique in Watson is Introduction to “This is Watson”
2008 Poker Michael Bowling, see the news report Battle of chips: Computer beats human experts at poker. In 2015 heads-up limit hold’em poker was reported to be not just better than humans, but “essentially weakly solved”, see Heads-up limit hold’em poker is solved.
2007 Face recognition See Face Recognition Algorithms Surpass Humans Matching Faces over Changes in Illumination
2005 Single character recognition See Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs)
1997 Chess See Deep Blue versus Garry Kasparov
1979 Backgammon See Backgammon Computer Program Beats World Champion
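
The ImageNet row above notes that a straight out-of-the-box Keras classifier does not seem to reach human-level accuracy. The snippet below is a minimal sketch of how such a test could be set up, assuming a Python environment with TensorFlow/Keras 2.x and its bundled ImageNet weights; the file name washing_machine.jpg is only a placeholder, and this is not the evaluation from the linked Wikidata post.

    # Minimal sketch: classify one photo with a stock ImageNet-trained ResNet50.
    # Assumes TensorFlow/Keras 2.x; "washing_machine.jpg" is a placeholder file name.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights="imagenet")  # downloads the pretrained ImageNet weights

    img = image.load_img("washing_machine.jpg", target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    # Print the top-5 predicted ImageNet classes with their probabilities,
    # which can then be compared against a human labeling of the same photo.
    for _, label, prob in decode_predictions(model.predict(x), top=5)[0]:
        print(f"{label}: {prob:.3f}")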

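The geoguessing row compares systems and humans by their median localization error in kilometres. As a rough sketch, and not the evaluation code of PlaNet or the 2023 student system, the snippet below computes a median great-circle (haversine) error from predicted and true coordinates; the coordinate pairs are made up for illustration.

    # Sketch: median localization error in km from (latitude, longitude) pairs.
    # The predicted/true coordinates below are invented for illustration only.
    from math import radians, sin, cos, asin, sqrt
    from statistics import median

    def haversine_km(p, q):
        """Great-circle distance between two (lat, lon) points in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    predicted = [(55.7, 12.6), (48.9, 2.3), (40.7, -74.0)]   # hypothetical guesses
    truth     = [(55.7, 12.5), (41.9, 12.5), (35.7, 139.7)]  # hypothetical ground truth
    errors = [haversine_km(p, t) for p, t in zip(predicted, truth)]
    print(f"median localization error: {median(errors):.1f} km")
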
Still waiting…

Year Domain Description
2014 University entry examination A Japanese system was reported to score 95 in 2014 on the English section of the entrance exam to the University of Tokyo. The average for a prospective student was 93.1. See also, e.g., The Most Uncreative Examinee: A First Step toward Wide Coverage Natural Language Math Problem Solving.
2020 Conversation/chatting Machines can hold conversations and might fool humans into thinking the machine is a human, but they are not yet better at conversing. See, e.g., Bruce Wilcox and A Neural Conversational Model (2015). Meena (2020), described in Towards a Human-like Open-Domain Chatbot, achieves a 79% “Sensibleness and Specificity Average” against a human level of 86% (a sketch of this metric appears after this table).
2015 Music Most of what I have heard of RNN music is from Bob Sturm. His “Lisl’s Stis” is quite good, though it is melody only. In 2016 Manuel Araoz showed examples with harmony: Composed by Recurrent Neural Network. These are fairly tedious.
2016 Natural speech Speech samples from DeepMind’s WaveNet are not far from the level of natural speech.
2017 Drone flight over a fixed course NASA’s Jet Propulsion Laboratory in Pasadena, California reported that world-class drone pilot Ken Loo won over an AI-controlled drone in November 2017.
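
The Meena entry above cites a “Sensibleness and Specificity Average” (SSA). As a rough sketch of that metric, assuming, as I read the Meena paper, that crowd raters give each model response a binary sensibleness label and a binary specificity label, SSA is the mean of the two average scores; the ratings below are made up for illustration.

    # Sketch: Sensibleness and Specificity Average (SSA) from binary ratings.
    # The example ratings are invented for illustration only.
    def ssa(sensible, specific):
        """Average of mean sensibleness and mean specificity over rated responses."""
        mean_sensible = sum(sensible) / len(sensible)
        mean_specific = sum(specific) / len(specific)
        return (mean_sensible + mean_specific) / 2

    sensible_ratings = [1, 1, 0, 1, 1]   # hypothetical per-response judgments
    specific_ratings = [1, 0, 0, 1, 1]
    print(f"SSA = {ssa(sensible_ratings, specific_ratings):.0%}")   # -> SSA = 70%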

Thanks to Jakob Eg Larsen and Lars Kai Hansen for providing links.

6 thoughts on “Status on human vs. machines”

    techedblogg said:
    January 7, 2016 at 12:52 am

    Really, most people would think of computers being smarter than us as an “End of the World!” scenario. But even if the end of the world came, I think the machines would be trying to protect us. After all, we are sort of like Darth Vader and Luke when Darth Vader said “Luke, I am your father!” And for us it would be “Machines, we are your fathers and mothers.” Although I am open to more opinions :)

    […] learning only gained prominence in 2012, and only as recently as 2015 outperformed the human benchmark on the ImageNet Challenge for image classification. The advancements that […]

    Rob Haswell said:
    July 19, 2018 at 4:22 pm

    […] keep track of Status on human vs. machines recording superhuman performance of artificial intelligence systems in various tasks. ChatGPT has […]
