Status on human vs. machines


Are computers beating humans? In simple number crunching, yes, but also in more complex tasks.

Year Domain Description
2022 Gran Turismo A research summary reported “a neural-network algorithm — called GT Sophy — that is capable of winning against the best human players of the video game Gran Turismo.”
2019 Question-answering with BoolQ The Boolean Questions dataset BoolQ is reported to have a human accuracy of 89.0%, while the T5-11B model is reported to reach 91.2%.
2017 Dota 2 1v1 OpenAI reported “We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules”, August 2017.
2017 Poker (heads-up no-limit Texas Hold’em) According to Andrew Ng, “AI beats top humans”, January 2017. The system was Libratus, a reinforcement learning-based algorithm from Carnegie Mellon University; see Poker pros vs the machines.
2016 Lipreading Lip Reading Sentences in the Wild writes “… we demonstrate lip reading performance that beats a professional lip reader on videos from BBC television.”
2016 Conversational speech recognition Microsoft Research reports passing human performance on benchmark datasets in Achieving human parity in conversational speech recognition.
2016 Geoguessing Google’s PlaNet: “In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km”, according to Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image. In 2023, university students Michal Skreta, Lukas Haas and Silas Alberti reported a 44-kilometer median error, beating GeoGuessr expert Rainbolt, see the world’s best AI vs geoguessr pro video. A sketch of how such a median localization error can be computed appears after this table.
2016 Go DeepMind’s AlphaGo beats the best European Go player, as reported in January in Mastering the game of Go with deep neural networks and tree search.
2015 Closed-world image classification ImageNet classification by Microsoft Research researchers with a deep neural network, see Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Already in 2014 Google was close to human performance, see ImageNet Large Scale Visual Recognition Challenge. The human error rate on ImageNet has been reported to be 5.1%, and that was Andrej Karpathy, a dedicated human labeler. Microsoft reported 4.94% in February 2015. Google won one of the competitions in 2014 with “GoogLeNet” having a classification error of 6.66%. Baidu reported an error rate of 5.98% in January 2015 and 5.33% later in February. The initial reports were, however, on the ImageNet dataset with a limited number of classes (1,000). A straight out-of-the-box application of Keras-distributed ImageNet-based classifiers does not seem to perform on par with humans, see “Washing machine” in Linking ImageNet WordNet Synsets with Wikidata (a minimal sketch of such an out-of-the-box test appears after this table).
2015 Atari game playing Google DeepMind deep neural network with reinforcement learning, see Human-level control through deep reinforcement learning: “We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games”. See also Playing Atari with Deep Reinforcement Learning
2014 Personality judgement See Computer-based personality judgments are more accurate than those made by humans. The computer model used Facebook Likes.
2014 Deceptive pain expression detection See Automatic Decoding of Facial Movements Reveals Deceptive Pain Expressions: “…and after training human observers, we improved accuracy to a modest 55%. However, a computer vision system that automatically measures facial movements and performs pattern recognition on those movements attained 85% accuracy.”
2013 Age estimation Estimation of a person’s age from a photo of the face, see Age Estimation from Face Images: Human vs. Machine Performance. A considerable improvement came with the winner of the ChaLearn LAP 2015 challenge: DEX: Deep EXpectation of apparent age from a single image.
2013 Smooth car driving Google robotic car head Chris Urmson claimed that their self-driving car “is driving more smoothly and more safely than our trained professional drivers.” For general car driving the Google car may, as of 2014, not have been better than humans, e.g., because of problems with road obstacles, see Hidden Obstacles for Google’s Self-Driving Cars.
2011 Traffic sign reading Dan Ciresan used a convolutional neural network on the German Traffic Sign Recognition Benchmark to beat the best human. Results are reported in Man vs. Computer: Benchmarking Machine Learning Algorithms for Traffic Sign Recognition.
2011 Jeopardy! In January 2011 the IBM Watson system beat two human contestants in the open-domain question-answering television quiz show. An introduction to the technique in Watson is Introduction to “This is Watson”
2008 Poker Michael Bowling, see the news report Battle of chips: Computer beats human experts at poker. In 2015 heads-up limit hold’em poker was reported to be not just better than humans, but “essentially weakly solved”, see Heads-up limit hold’em poker is solved.
2007 Face recognition See Face Recognition Algorithms Surpass Humans Matching Faces over Changes in Illumination
2005 Single character recognition See Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs)
1997 Chess See Deep Blue versus Garry Kasparov
1979 Backgammon See Backgammon Computer Program Beats World Champion
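
The ImageNet row above notes that a straight out-of-the-box Keras classifier does not seem to reach human-level accuracy. The snippet below is a minimal sketch of how such a test could be set up, assuming a Python environment with TensorFlow/Keras 2.x and its bundled ImageNet weights; the file name washing_machine.jpg is only a placeholder, and this is not the evaluation from the linked Wikidata post.

    # Minimal sketch: classify one photo with a stock ImageNet-trained ResNet50.
    # Assumes TensorFlow/Keras 2.x; "washing_machine.jpg" is a placeholder file name.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights="imagenet")  # downloads the pretrained ImageNet weights

    img = image.load_img("washing_machine.jpg", target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    # Print the top-5 predicted ImageNet classes with their probabilities,
    # which can then be compared against a human labeling of the same photo.
    for _, label, prob in decode_predictions(model.predict(x), top=5)[0]:
        print(f"{label}: {prob:.3f}")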

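The geoguessing row compares systems and humans by their median localization error in kilometres. As a rough sketch, and not the evaluation code of PlaNet or the 2023 student system, the snippet below computes a median great-circle (haversine) error from predicted and true coordinates; the coordinate pairs are made up for illustration.

    # Sketch: median localization error in km from (latitude, longitude) pairs.
    # The predicted/true coordinates below are invented for illustration only.
    from math import radians, sin, cos, asin, sqrt
    from statistics import median

    def haversine_km(p, q):
        """Great-circle distance between two (lat, lon) points in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    predicted = [(55.7, 12.6), (48.9, 2.3), (40.7, -74.0)]   # hypothetical guesses
    truth     = [(55.7, 12.5), (41.9, 12.5), (35.7, 139.7)]  # hypothetical ground truth
    errors = [haversine_km(p, t) for p, t in zip(predicted, truth)]
    print(f"median localization error: {median(errors):.1f} km")
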
Still waiting…

Year Domain Description
2014 University entry examination A Japanese system was reported to score 95 in 2014 on the English section of the entrance exam to the University of Tokyo. The average for a prospective student was 93.1. See also, e.g., The Most Uncreative Examinee: A First Step toward Wide Coverage Natural Language Math Problem Solving.
2020 Conversation/chatting Machines can hold conversations and might fool humans into thinking the machine is a human, but they are not yet better at conversing. See, e.g., Bruce Wilcox and A Neural Conversational Model (2015). Meena (2020), described in Towards a Human-like Open-Domain Chatbot, achieves a 79% “Sensibleness and Specificity Average” against a human level of 86% (a sketch of this metric appears after this table).
2015 Music Most of what I have heard of RNN music is from Bob Sturm. His “Lisl’s Stis” is quite good, though it is melody only. In 2016 Manuel Araoz showed examples with harmony: Composed by Recurrent Neural Network. These are fairly tedious.
2016 Natural speech Speech samples from DeepMind’s WaveNet are not far from the level of natural speech.
2017 Drone flight over a fixed course NASA’s Jet Propulsion Laboratory in Pasadena, California reported that world-class drone pilot Ken Loo won over an AI-controlled drone in November 2017.
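
The Meena entry above cites a “Sensibleness and Specificity Average” (SSA). As a rough sketch of that metric, assuming, as I read the Meena paper, that crowd raters give each model response a binary sensibleness label and a binary specificity label, SSA is the mean of the two average scores; the ratings below are made up for illustration.

    # Sketch: Sensibleness and Specificity Average (SSA) from binary ratings.
    # The example ratings are invented for illustration only.
    def ssa(sensible, specific):
        """Average of mean sensibleness and mean specificity over rated responses."""
        mean_sensible = sum(sensible) / len(sensible)
        mean_specific = sum(specific) / len(specific)
        return (mean_sensible + mean_specific) / 2

    sensible_ratings = [1, 1, 0, 1, 1]   # hypothetical per-response judgments
    specific_ratings = [1, 0, 0, 1, 1]
    print(f"SSA = {ssa(sensible_ratings, specific_ratings):.0%}")   # -> SSA = 70%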

Thanks to Jakob Eg Larsen and Lars Kai Hansen for providing links.

6 thoughts on “Status on human vs. machines”

    techedblogg said:
    January 7, 2016 at 12:52 am

    Really, most people would think of computers being smarter than us as an “End of the World!” scenario. But even if the end of the world came, I think the machines would be trying to protect us. After all, we are sort of like Darth Vader and Luke when Darth Vader said “Luke, I am your father!” And for us it would be “Machines, we are your fathers and mothers.” Although I am open to more opinions :)

    […] learning only gained prominence in 2012, and only as recently as 2015 outperformed the human benchmark on the ImageNet Challenge for image classification. The advancements that […]

    Rob Haswell said:
    July 19, 2018 at 4:22 pm

    […] keep track of Status on human vs. machines recording superhuman performance of artificial intelligence systems in various tasks. ChatGPT has […]
