Capabilities of state-of-the-art AI, 2024

This page lists some noteworthy capabilities of current state-of-the-art AI, organized by category. Last updated 1/24/2024.

Games

Examples and discussion of Diplomacy gameplay with Cicero

Language

Sample output from GPT-4

Images

GPT-4 output correctly describing an uploaded image
DALL-E 3 output with the prompt “A photo of a painted mural on the side of a building depicting an AI strategy researcher getting distracted from his work by extremely interesting artwork on his laptop screen. A thought bubble above his head says 'Wow, generative AI is quickly improving!'” DALL-E 3 is available for a subscription fee at chat.openAI.com
This person does not exist. A demo of StyleGAN2 trained on human faces is available at this-person-does-not-exist.com

A movie composed of several individual video clips produced by VideoPoet

Audio

Output from Suno.AI, given the prompt “A soulful R&B song that is self-referentially about how the song is an example of AI-generated audio output on a wiki page about the capabilities of state-of-the-art AI systems”

Robotics

A demo of the robot Atlas performing parkour.

Biology

Mathematics

1)
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv. https://doi.org/10.48550/arXiv.1712.01815
2)
Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T., & Silver, D. (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature, 588, 604–609. https://doi.org/10.1038/s41586-020-03051-4
3)
Wang, T. T., Gleave, A., Belrose, N., Tseng, T., Miller, J., Dennis, M. D., Duan, Y., Pogrebniak, V., Levine, S., & Russell, S. (2022). Adversarial Policies Beat Professional-Level Go AIs. arXiv. https://doi.org/10.48550/arXiv.2211.00241
4)
Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., Guo, D., & Blundell, C. (2020). Agent57: Outperforming the Atari Human Benchmark. arXiv. https://doi.org/10.48550/arXiv.2003.13350
5)
Ye, W., Liu, S., Kurutach, T., Abbeel, P., & Gao, Y. (2021). Mastering Atari Games with Limited Data. arXiv. https://doi.org/10.48550/arXiv.2111.00210
6)
Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker. Meta AI. (2019, July 11). Retrieved November 22, 2022, from https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-in-6-player-poker
7)
Wiggers, K. (2019, April 13). OpenAI Five defeats professional Dota 2 team, twice. VentureBeat. Retrieved November 22, 2022, from https://venturebeat.com/ai/openai-five-defeats-a-team-of-professional-dota-2-players
8)
Wiggers, K. (2019, April 22). OpenAI's Dota 2 bot defeated 99.4% of players in public matches. VentureBeat. Retrieved November 22, 2022, from https://venturebeat.com/ai/openais-dota-2-bot-defeated-99-4-of-players-in-public-matches
9)
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning. DeepMind. (2019, October 30). Retrieved November 22, 2022, from https://www.deepmind.com/blog/alphastar-grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning
10)
Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2023). Mastering Diverse Domains through World Models. arXiv. https://doi.org/10.48550/arXiv.2301.04104
11)
Mastering Stratego, the Classic Game of Imperfect Information. DeepMind blog. (2022, December 1). Retrieved December 2, 2022, from https://www.deepmind.com/blog/mastering-stratego-the-classic-game-of-imperfect-information
12)
Cicero. Meta AI. (n.d.). Retrieved November 23, 2022, from https://ai.facebook.com/research/cicero
13), 19)
GPT-4. (2023, March 14). OpenAI. Retrieved May 26, 2023, from https://openai.com/research/gpt-4
14)
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., . . . Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways. arXiv. https://doi.org/10.48550/arXiv.2204.02311
15)
Caswell, I., & Liang, B. (2020, June 8). Recent advances in Google Translate. Google AI Blog. Retrieved November 22, 2022, from https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html
16)
Elicit FAQ. Elicit.org. (2022, April). Retrieved November 22, 2022, from https://elicit.org/faq
17)
Assael, Y., Sommerschield, T., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., & de Freitas, N. (2022). Restoring and attributing ancient texts using deep neural networks. Nature, 603(7900), 280–283. https://doi.org/10.1038/s41586-022-04448-z
18)
Maslej, N., Fattorini, L., Brynjolfsson, E., Etchemendy, J., Ligett, K., Lyons, T., Manyika, J., Ngo, H., Niebles, J. C., Parli, V., Shoham, Y., Wald, R., Clark, J., & Perrault, R. (2023, April). The AI Index 2023 Annual Report. AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA.
20)
Lu, C., & Tang, X. (2014). Surpassing Human-Level Face Verification Performance on LFW with GaussianFace. arXiv. https://doi.org/10.48550/arXiv.1404.3840
21)
Assael, Y. M., Shillingford, B., Whiteson, S., & de Freitas, N. (2016). LipNet: End-to-End Sentence-level Lipreading. arXiv. https://doi.org/10.48550/arXiv.1611.01599
22)
AI Imaging & Diagnostics. Google Health. (n.d.). Retrieved November 22, 2022, from https://health.google/health-research/imaging-and-diagnostics/
23)
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv. https://doi.org/10.48550/arXiv.2103.00020
24)
Betker, J., et al. (2023, October). Improving Image Generation with Better Captions. OpenAI. Retrieved October 31, 2023, from https://cdn.openai.com/papers/dall-e-3.pdf
25)
Muse: Text-To-Image Generation via Masked Generative Transformers. (n.d.). Retrieved January 9, 2023, from https://muse-model.github.io/
26)
Perov, I., Gao, D., Chervoniy, N., Liu, K., Marangonda, S., Umé, C., Dpfks, M., Facenheim, C. S., RP, L., Jiang, J., Zhang, S., Wu, P., Zhou, B., & Zhang, W. (2020). DeepFaceLab: Integrated, flexible and extensible face-swapping framework. arXiv. https://doi.org/10.48550/arXiv.2005.05535
27)
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2019). Analyzing and Improving the Image Quality of StyleGAN. arXiv. https://doi.org/10.48550/arXiv.1912.04958
28)
Transformers of the handwritten word. MBZUAI. (2023, December 25). https://mbzuai.ac.ae/news/transformers-of-the-handwritten-word/
29)
Kondratyuk, D., Yu, L., Gu, X., Lezama, J., Huang, J., Hornung, R., Adam, H., Akbari, H., Alon, Y., Birodkar, V., Cheng, Y., Chiu, M., Dillon, J., Essa, I., Gupta, A., Hahn, M., Hauth, A., Hendon, D., Martinez, A., . . . Jiang, L. (2023). VideoPoet: A Large Language Model for Zero-Shot Video Generation. arXiv. https://arxiv.org/abs/2312.14125
30)
Shen, J., & Pang, R. (2017, December 19). Tacotron 2: Generating human-like speech from text. Google AI Blog. Retrieved November 23, 2022, from https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html
31)
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. Retrieved November 22, 2022, from https://cdn.openai.com/papers/whisper.pdf
32)
AudioLM. Retrieved February 27, 2023, from https://google-research.github.io/seanet/audiolm/examples/
33)
Arik, S. O., Chen, J., Peng, K., Ping, W., & Zhou, Y. (2018). Neural Voice Cloning with a Few Samples. arXiv. https://doi.org/10.48550/arXiv.1802.06006
34)
Koe: Recast. Koe AI. (n.d.). Retrieved November 22, 2022, from https://koe.ai/recast
35)
Suno.ai. Retrieved January 3, 2024, from https://www.suno.ai/
36)
Metz, C., Laffin, B., & Thi, H. D. (2022, November 15). What riding in a self-driving Tesla tells us about the future of autonomy. The New York Times. Retrieved November 22, 2022, from https://www.nytimes.com/interactive/2022/11/14/technology/tesla-self-driving-flaws.html
37)
Edwards, B. (2023, August 31). High-speed AI drone beats world-champion racers for the first time. Ars Technica. Retrieved October 31, 2023, from https://arstechnica.com/information-technology/2023/08/high-speed-ai-drone-beats-world-champion-racers-for-the-first-time/
38)
Akkaya, I., Andrychowicz, M., Chociej, M., Litwin, M., McGrew, B., Petron, A., Paino, A., Plappert, M., Powell, G., Ribas, R., Schneider, J., Tezak, N., Tworek, J., Welinder, P., Weng, L., Yuan, Q., Zaremba, W., & Zhang, L. (2019). Solving Rubik's Cube with a Robot Hand. arXiv. https://doi.org/10.48550/arXiv.1910.07113
39)
Gregory, A. (2022, January 26). Robot successfully performs keyhole surgery on pigs without human help. The Guardian. Retrieved November 22, 2022, from https://www.theguardian.com/technology/2022/jan/26/robot-successfully-performs-keyhole-surgery-on-pigs-without-human-help
40)
Atlas™. Boston Dynamics. (n.d.). Retrieved November 22, 2022, from https://www.bostondynamics.com/atlas
41)
AlphaFold. DeepMind. (n.d.). Retrieved November 22, 2022, from https://www.deepmind.com/research/highlighted-research/alphafold
42)
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2
43)
Urbina, F., Lentzos, F., Invernizzi, C., & Ekins, S. (2022). Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence, 4(3), 189–191. https://doi.org/10.1038/s42256-022-00465-9
44)
Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. MinD-Vis. (n.d.). Retrieved November 22, 2022, from https://mind-vis.github.io
45)
Fawzi, A., Balog, M., Huang, A., Hubert, T., Romera-Paredes, B., Barekatain, M., Novikov, A., Ruiz, F. J. R., Schrittwieser, J., Swirszcz, G., Silver, D., Hassabis, D., & Kohli, P. (2022). Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 610(7930), 47–53. https://doi.org/10.1038/s41586-022-05172-4
46)
Trinh, T. H., Wu, Y., Le, Q. V., He, H., & Luong, T. (2024). Solving olympiad geometry without human demonstrations. Nature, 625(7995), 476–482. https://doi.org/10.1038/s41586-023-06747-5