Differences

This shows you the differences between two versions of the page.

--- uncategorized:capabilities_of_sota_ai [2023/04/27 21:35]
harlanstewart
+++ uncategorized:capabilities_of_sota_ai [2024/01/24 22:58] (current)
harlanstewart
@@ Line 1: / Line 1: @@
-====== Capabilities of state-of-the-art AI, 2023 ======
+====== Capabilities of state-of-the-art AI, 2024 ======
-This is a list of some noteworthy capabilities of current state-of-the-art AI in various categories. Last major update 2/27/2023, last updated 4/27/2023
+This is a list of some noteworthy capabilities of current state-of-the-art AI in various categories. Last updated 1/24/2024
 ==== Games ====
@@ Line 16: / Line 16: @@
   * In 2019, AlphaStar reached Grandmaster level in StarCraft II, playing under the same constraints as a human player (viewing the world through a camera, with a restricted click rate).((Alphastar: Grandmaster level in starcraft II using multi-agent reinforcement learning. DeepMind. (2019, October 30). Retrieved November 22, 2022, from https://www.deepmind.com/blog/alphastar-grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning ))
   * DreamerV3 is a general algorithm from 2023 that can learn to play a variety of games without human data, and is able to collect diamonds in Minecraft.(( Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2023). Mastering Diverse Domains through World Models. arXiv. https://doi.org/10.48550/arXiv.2301.04104))
+  * In 2022, DeepNash won 84% of Stratego games against the top expert human players on Gravon games.((Mastering Stratego, the Classic Game of Imperfect Information. DeepMind blog. (2022, December 1). Retrieved December 2, 2022, from https://www.deepmind.com/blog/mastering-stratego-the-classic-game-of-imperfect-information))
   * CICERO, from 2022, can play Diplomacy, a game that involves communicating and coordinating with other players. CICERO ranked in the top 10% of players who had played more than one game on webDiplomacy.net.(( Cicero. Meta AI. (n.d.). Retrieved November 23, 2022, from https://ai.facebook.com/research/cicero))
-  * In 2022, DeepNash won 84% of Stratego games against the top expert human players on Gravon games.((Mastering Stratego, the Classic Game of Imperfect Information. DeepMind blog. (2022, December 1). Retrieved December 2, 2022, from https://www.deepmind.com/blog/mastering-stratego-the-classic-game-of-imperfect-information))
+<HTML>
+<iframe width="560" height="315" src="https://www.youtube.com/embed/kexYmcu1Zro?si=vpmNylHXPmpU2FaU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+</HTML>
+//Examples and discussion of Diplomacy gameplay with Cicero//
 ====Language====
+  * GPT-4, a large language model from 2023, can write poetry, answer questions, reason about the world, have conversations, act out characters, and more.
-  * ChatGPT is a chatbot from 2022, trained on GPT-3.5. It can write poetry, act out characters, answer questions, and more.((Introducing ChatGPT. (n.d.). OpenAI. Retrieved March 8, 2023, from https://openai.com/blog/chatgpt))
+[{{:uncategorized:gpt-4_output.png?600| Sample output from GPT-4}}]
-[{{:uncategorized:ahoyboi_and_mc_feather.jpg?nolink&600|Sample output from ChatGPT. A demo is available at chat.openai.com}}]
-  * LaMDA is a chatbot from 2022 that was evaluated by human crowdworkers to score 92.9% for “sensibleness” (compared to 100% for human crowdworker-generated dialogue), 79% for “specificity” (compared to 80% for human crowdworker-generated dialogue), and 25.7% for “interestingness” (compared to 19% for human crowdworker-generated dialogue, although the authors note that the human crowdworkers may not have been trying to write interesting dialogue).((Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H. S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., . . . Le, Q. (2022). LaMDA: Language Models for Dialog Applications. arXiv. https://doi.org/10.48550/arXiv.2201.08239
-))
-[{{:uncategorized:lamda_output.png?nolink&400|Sample output from LaMDA. A demo is available at aitestkitchen.withgoogle.com but only for chatting with a tennis ball character about dogs}}]
+  * Large language models such as GPT-4 can also write code. GPT-4 correctly solved programming problems in the HumanEval dataset 67% of the time.
+  * GPT-4 achieved human-level performance on various professional and academic exams, including SATs, AP exams, and the Uniform Bar Exam.
+  * GPT-4 correctly answered 92% of the questions in GSM8K, a dataset of elementary-school-level math word problems.
+  * Unlike many earlier large language models, GPT-4 can accept both text and images as input.((GPT-4. (2023, March 14). OpenAI. Retrieved May 26, 2023, from https://openai.com/research/gpt-4))
   * PaLM, a language model from 2022, surpassed average human performance on BIG-bench, “a collaborative benchmark aimed at producing challenging tasks for large language models.”((Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., . . . Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways. arXiv. https://doi.org/10.48550/arXiv.2204.02311
 ))
-  * PaLM correctly answered 58% of the questions in GSM8K, a dataset of elementary-school-level math word problems.<sup>15</sup>
-  * PaLM-Coder can write computer code. It successfully repaired 82.1% of the broken code in the DeepFix dataset, successfully translated 55.1% of C++ programs in the Transcoder dataset to Python, and wrote code that correctly solved programming problems in the HumanEval dataset 36% of the time (when given 100 tries for each problem, the model solved them 88.4% of the time).<sup>15</sup>
   * As of 2020, Google Translate supported over 100 languages. When translating from other languages into English, its translations received BLEU scores ranging from around 0.15 to 0.53, depending on the language. A BLEU score measures how similar a translation is to a reference translation created by a human translator. It ranges from 0 to 1, where a score of 1 indicates output identical to the human reference translation.((Caswell, I., & Liang, B. (2020, June 8). Recent advances in Google Translate. Google AI Blog. Retrieved November 22, 2022, from https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html))
   * Elicit is an AI research assistant from 2022 that, given a research question, can find relevant papers and summarize the findings of the top four papers.(( Elicit FAQ. Elicit.org. (2022, April). Retrieved November 22, 2022, from https://elicit.org/faq
@@ Line 46: / Line 47: @@
 ====Images====
-  * Image classification systems can recognize objects in the ImageNet database as well as humans.<sup>19</sup>((“…The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been running annually for five years (since 2010) and has become the standard benchmark for large-scale object recognition.”
+  * GPT-4 can recognize and interpret the content of images. ((GPT-4. (2023, March 14). OpenAI. Retrieved May 26, 2023, from https://openai.com/research/gpt-4))
-Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” ArXiv:1409.0575 [Cs], January 29, 2015. http://arxiv.org/abs/1409.0575.))
-  * DenseCap, from 2015, can identify and describe multiple objects within an image.((Johnson, J., & Karpathy, A. (2015). DenseCap: Fully Convolutional Localization Networks for Dense Captioning. arXiv. https://doi.org/10.48550/arXiv.1511.07571 ))
+[{{:uncategorized:screen_shot_2024-01-03_at_5.03.38_pm.png?600|GPT-4 output correctly describing an uploaded image}}]
-[{{:uncategorized:densecap_output.png?nolink&400|Example of DenseCap analyzing and captioning an image}}]
   * Sensetime is a facial recognition system from 2014 that surpassed average human performance in accurately labeling faces in a large dataset of images.((Lu, C., & Tang, X. (2014). Surpassing Human-Level Face Verification Performance on LFW with GaussianFace. arXiv. https://doi.org/10.48550/arXiv.1404.3840
@@ Line 59: / Line 57: @@
   * CLIP, from 2021, can match an image to the best-fitting text description from a set of candidates, which lets it classify images into categories it was not explicitly trained on.((Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv. https://doi.org/10.48550/arXiv.2103.00020
 ))
-  * DALL-E 2, from 2022, can create images from text descriptions.((OpenAI. (2022, April 14). Dall·E 2. OpenAI. Retrieved November 22, 2022, from https://openai.com/dall-e-2))
+  * DALL-E 3, from 2023, can generate novel images in many styles, given a text description from the user. It can occasionally produce coherent text within images. ((Betker, James, et al. (2023, October). Improving Image Generation with Better Captions. OpenAI. Retrieved October 31, 2023, from https://cdn.openai.com/papers/dall-e-3.pdf))
-[{{:uncategorized:dall_e_2023-03-08_15.08.37_-_van_gogh_painting_of_a_researcher_getting_distracted_from_his_work_by_extremely_interesting_artwork_on_his_laptop_screen_highly_detailed.png?nolink&400|Cherrypicked DALL-E 2 output with the prompt “Van Gogh painting of a researcher getting distracted from his work by extremely interesting artwork on his laptop screen, highly detailed.” DALL-E 2 is available to use at openai.com/dall-e-2}}]
+[{{:uncategorized:dall_e_2023-10-31_14.56.44_-_a_photo_capturing_a_colorful_painted_mural_on_the_side_of_a_brick_building._the_mural_depicts_a_middle-aged_asian_man_with_short_black_hair_wearing_g.png?600|DALL-E 3 output with the prompt "A photo of a painted mural on the side of a building depicting an AI strategy researcher getting distracted from his work by extremely interesting artwork on his laptop screen. A thought bubble above his head says 'Wow, generative AI is quickly improving!'" For a subscription fee, DALL-E 3 is available to use at chat.openai.com}}]
-  * Muse can also generate images from text descriptions, more efficiently than other models such as DALL-E 2.(( Muse: Text-To-Image Generation via Masked Generative Transformers, https://muse-model.github.io/ . Accessed 9 January 2023. ))
+  * Muse can also generate images from text descriptions, more efficiently than some other models such as DALL-E 2.(( Muse: Text-To-Image Generation via Masked Generative Transformers, https://muse-model.github.io/ . Accessed 9 January 2023. ))
   * DeepFaceLab, from 2020, can swap a face in a video with another person’s face (“deepfakes”).(( Perov, I., Gao, D., Chervoniy, N., Liu, K., Marangonda, S., Umé, C., Dpfks, M., Facenheim, C. S., RP, L., Jiang, J., Zhang, S., Wu, P., Zhou, B., & Zhang, W. (2020). DeepFaceLab: Integrated, flexible and extensible face-swapping framework. arXiv. https://doi.org/10.48550/arXiv.2005.05535))
   * Generative adversarial networks such as StyleGAN2, from 2019, can be trained to create realistic images of something within a certain category, such as human faces.((Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2019). Analyzing and Improving the Image Quality of StyleGAN. ArXiv. https://doi.org/10.48550/arXiv.1912.04958))
@@ Line 69: / Line 67: @@
 [{{:uncategorized:thispersondoesnotexist.png?nolink&400|This person does not exist. A demo of StyleGAN2 trained on human faces is available at this-person-does-not-exist.com}}]
-  * PaLI, released in 2022, can answer questions about images, caption images, detect objects in images, and classify images.((Chen, X., & Wang, X. (2022, September 15). PaLI: Scaling Language-Image Learning in 100+ Languages – Google AI Blog. Google AI Blog. Retrieved April 27, 2023, from https://ai.googleblog.com/2022/09/pali-scaling-language-image-learning-in.html))
+  * An AI system from 2023 can convincingly copy someone's handwriting after seeing only a few paragraphs of example text.(((2023, December 25). Transformers of the handwritten word. https://mbzuai.ac.ae/news/transformers-of-the-handwritten-word/))
-  * Make-a-Video, released in 2022, can generate video from a text prompt. ((Singer, U., Polyak, A., Hayes, T., Yin, X., An, J., Zhang, S., Hu, Q., Yang, H., Ashual, O., Gafni, O., Parikh, D., Gupta, S., & Taigman, Y. (2022). Make-A-Video: Text-to-Video Generation without Text-Video Data. ArXiv. /abs/2209.14792))
+  * AI systems such as VideoPoet, from 2023, can generate short videos given a text description.((Kondratyuk, D., Yu, L., Gu, X., Lezama, J., Huang, J., Hornung, R., Adam, H., Akbari, H., Alon, Y., Birodkar, V., Cheng, Y., Chiu, M., Dillon, J., Essa, I., Gupta, A., Hahn, M., Hauth, A., Hendon, D., Martinez, A., . . .  Jiang, L. (2023). VideoPoet: A Large Language Model for Zero-Shot Video Generation. ArXiv. /abs/2312.14125))
+<HTML>
+<iframe width="560" height="315" src="https://www.youtube.com/embed/70wZKfx6Ylk?si=FX-xbbwnA1wDa9A6" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+</HTML>
+//A movie composed of several individual video clips produced by VideoPoet//
 ====Audio====
@@ Line 78: / Line 81: @@
   * Automatic speech recognition systems can transcribe recordings of human speech. Whisper, from 2022, is able to transcribe recordings with an accuracy close to that of professional human transcribers.((Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. Retrieved November 22, 2022, from https://cdn.openai.com/papers/whisper.pdf
 ))
-  * Jukebox, from 2020, can generate samples of music with a provided genre, artist, and lyrics as input.((Dhariwal, P., Jun, H., Payne, C., Kim, J. W., Radford, A., & Sutskever, I. (2020). Jukebox: A Generative Model for Music. arXiv. https://doi.org/10.48550/arXiv.2005.00341))
   * AudioLM, from 2022, creates predicted “continuations” of an audio input.((AudioLM. Retrieved February 27, 2023, from https://google-research.github.io/seanet/audiolm/examples/))
-  * MusicLM, from 2022, creates samples of music based on a text caption.((MusicLM: Generating Music From Text. Retrieved February 27, 2023, from https://google-research.github.io/seanet/musiclm/examples/ ))
   * Models such as Deep Voice 3, from 2018, can imitate a human voice based on a few samples of recorded speech.((Arik, S. O., Chen, J., Peng, K., Ping, W., & Zhou, Y. (2018). Neural Voice Cloning with a Few Samples. arXiv. https://doi.org/10.48550/arXiv.1802.06006))
   * Recent models such as Koe can take a recorded voice sample and change it into another voice.((Koe: Recast. Koe AI. (n.d.). Retrieved November 22, 2022, from https://koe.ai/recast))
+  * Suno.ai, from 2023, can create songs with lyrics and instrumentation based on a text description of the song's style and subject. ((Suno.ai. Retrieved January 3, 2024, from https://www.suno.ai/))
+{{ :uncategorized:artificial_love.mp4 |}}
+//Output from Suno.ai, given the prompt "A soulful R&B song that is self-referentially about how the song is an example of AI-generated audio output on a wiki page about the capabilities of state-of-the-art AI systems"//
 ====Robotics====
   * Although they are prone to occasional mistakes, self-driving cars are able to drive with human supervision.((Metz, C., Laffin, B., & Thi, H. D. (2022, November 15). What riding in a self-driving Tesla tells us about the future of autonomy. The New York Times. Retrieved November 22, 2022, from https://www.nytimes.com/interactive/2022/11/14/technology/tesla-self-driving-flaws.html ))
-  * In 2021, an AI-piloted drone won a race against drones piloted by human experts.((Hambling, D. (2021, July 23). An AI-controlled drone racer has beaten human pilots for the first time. Forbes. Retrieved November 22, 2022, from https://www.forbes.com/sites/davidhambling/2021/07/23/swiss-ai-drone-racer-is-faster-than-human-pilots/?sh=48366e011ea0))
+  * In 2022, an AI-piloted drone won multiple races against three world-champion human drone pilots. ((Edwards, Benj. (2023, August 31). High-speed AI drone beats world-champion racers for the first time. Ars Technica. Retrieved October 31, 2023, from https://arstechnica.com/information-technology/2023/08/high-speed-ai-drone-beats-world-champion-racers-for-the-first-time/))
-  * Atlas, a humanoid robot, can walk, run, and perform parkour moves such as backflips.((Atlas™. Boston Dynamics. (n.d.). Retrieved November 22, 2022, from https://www.bostondynamics.com/atlas))
   * A robot made by OpenAI in 2019 can solve a Rubik’s Cube with one human-like hand.((Akkaya, I., Andrychowicz, M., Chociej, M., Litwin, M., McGrew, B., Petron, A., Paino, A., Plappert, M., Powell, G., Ribas, R., Schneider, J., Tezak, N., Tworek, J., Welinder, P., Weng, L., Yuan, Q., Zaremba, W., & Zhang, L. (2019). Solving Rubik's Cube with a Robot Hand. arXiv. https://doi.org/10.48550/arXiv.1910.07113))
   * In 2022, a robot successfully performed laparoscopic surgery on four pigs, without human assistance.((Gregory, A. (2022, January 26). Robot successfully performs keyhole surgery on pigs without human help. The Guardian. Retrieved November 22, 2022, from https://www.theguardian.com/technology/2022/jan/26/robot-successfully-performs-keyhole-surgery-on-pigs-without-human-help))
+  * Atlas, a humanoid robot, can walk, run, and perform parkour moves such as backflips.((Atlas™. Boston Dynamics. (n.d.). Retrieved November 22, 2022, from https://www.bostondynamics.com/atlas))
+<HTML>
+<iframe width="560" height="315" src="https://www.youtube.com/embed/tF4DML7FIWk?si=AjElyGK1pbjBfoGT" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+</HTML>
+//A demo of the robot Atlas performing parkour.//
 ====Biology====
@@ Line 97: / Line 107: @@
   * In 2022, a model was able to predict the effect of a molecule on levels of an enzyme in humans and to find molecules that inhibit a particular enzyme.((Urbina, F., Lentzos, F., Invernizzi, C., & Ekins, S. (2022). Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence, 4(3), 189–191. https://doi.org/10.1038/s42256-022-00465-9))
   * MinD-Vis, from 2022, can decode a subject’s brain activity to reconstruct an image that has some of the details and features of the image the subject is looking at.((Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding submitted to Anonymous Conference. MinD-Vis. (n.d.). Retrieved November 22, 2022, from https://mind-vis.github.io))
+====Mathematics====
+  * In 2022, AlphaTensor discovered efficient new algorithms for matrix multiplication, including an algorithm that broke a 50-year record in efficiency for 4x4 matrices in a finite field.((Fawzi, A., Balog, M., Huang, A., Hubert, T., Barekatain, M., Novikov, A., J., F., Schrittwieser, J., Swirszcz, G., Silver, D., Hassabis, D., & Kohli, P. (2022). Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 610(7930), 47-53. https://doi.org/10.1038/s41586-022-05172-4))
+  * In 2024, AlphaGeometry solved 25 out of 30 Olympiad-level geometry problems, approaching the level of an Olympiad gold medalist.((Trinh, T. H., Wu, Y., Le, Q. V., He, H., & Luong, T. (2024). Solving olympiad geometry without human demonstrations. Nature, 625(7995), 476-482. https://doi.org/10.1038/s41586-023-06747-5))
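
For readers unfamiliar with what "efficient" means for matrix multiplication algorithms such as those in the AlphaTensor item above: efficiency is usually counted in scalar multiplications. The sketch below is not AlphaTensor's algorithm. It shows Strassen's classic 1969 scheme, which multiplies two 2x2 matrices using 7 scalar multiplications instead of the 8 used by the textbook method. The function names are illustrative and do not come from the cited paper.

```python
def naive_2x2(a, b):
    """Textbook 2x2 matrix product: 8 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(a, b):
    """Strassen's scheme: 7 scalar multiplications (m1 to m7)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Each output entry is a sum of the seven products above.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(naive_2x2(a, b))
    print(strassen_2x2(a, b))
```

Both functions return the same product. Saving one multiplication matters because the scheme can be applied recursively to larger matrices split into blocks, so small savings at the base case compound.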
