Humanity's Last Exam is still accepting questions from late contributors and submissions for the dataset and co-authorship, but new submissions are not eligible for the prize pool.

Humanity's Last Exam

Hugging Face dataset: load_dataset("cais/hle")

Long Phan*1, Alice Gatti*1, Ziwen Han*2, Nathaniel Li*1

Josephina Hu2, Hugh Zhang, Sean Shi2, Michael Choi2, Anish Agrawal2, Arnav Chopra2

Adam Khoja1, Ryan Kim, Richard Ren1, Jason Hausenloy1, Oliver Zhang1, Mantas Mazeika1

Summer Yue**2, Alexandr Wang**2, Dan Hendrycks**1

1Center for AI Safety, 2Scale AI

Authors

Daron Anderson, Tung Nguyen, Mobeen Mahmood, Fiona Feng, Steven Y. Feng, Haoran Zhao, Michael Yu, Varun Gangal, Chelsea Zou, Zihan Wang, Jessica P. Wang, Pawan Kumar, Oleksandr Pokutnyi, Robert Gerbicz, Serguei Popov, John-Clark Levin, Mstyslav Kazakov, Johannes Schmitt, Geoff Galgon, Alvaro Sanchez, Yongki Lee, Will Yeadon, Scott Sauers, Marc Roth, Chidozie Agu, Søren Riis, Fabian Giska, Saiteja Utpala, Zachary Giboney, Gashaw M. Goshu, Joan of Arc Xavier, Sarah-Jane Crowson, Mohinder Maheshbhai Naiya, Noah Burns, Lennart Finke, Zerui Cheng, Hyunwoo Park, Francesco Fournier-Facio, John Wydallis, Mark Nandor, Ankit Singh, Tim Gehrunger, Jiaqi Cai, Ben McCarty, Darling Duclosel, Jungbae Nam, Jennifer Zampese, Ryan G. Hoerr, Aras Bacho, Gautier Abou Loume, Abdallah Galal, Hangrui Cao, Alexis C Garretson, Damien Sileo, Qiuyu Ren, Doru Cojoc, Pavel Arkhipov, Usman Qazi, Lianghui Li, Sumeet Motwani, Christian Schroeder de Witt, Edwin Taylor, Johannes Veith, Eric Singer, Taylor D. Hartman, Paolo Rissone, Jaehyeok Jin, Jack Wei Lun Shi, Chris G. Willcocks, Joshua Robinson, Aleksandar Mikov, Ameya Prabhu, Longke Tang, Xavier Alapont, Justine Leon Uro, Kevin Zhou, Emily de Oliveira Santos, Andrey Pupasov Maksimov, Edward Vendrow, Kengo Zenitani, Julien Guillod, Yuqi Li, Joshua Vendrow, Vladyslav Kuchkin, Ng Ze-An, Pierre Marion, Denis Efremov, Jayson Lynch, Kaiqu Liang, Andrew Gritsevskiy, Dakotah Martinez, Ben Pageler, Nick Crispino, Dimitri Zvonkine, Natanael Wildner Fraga, Saeed Soori, Ori Press, Henry Tang, Julian Salazar, Sean R. Green, Lina Brüssel, Moon Twayana, Aymeric Dieuleveut, T. 
Ryan Rogers, Wenjin Zhang, Bikun Li, Jinzhou Yang, Arun Rao, Gabriel Loiseau, Mikhail Kalinin, Marco Lukas, Ciprian Manolescu, Subrata Mishra, Ariel Ghislain Kemogne Kamdoum, Tobias Kreiman, Tad Hogg, Alvin Jin, Carlo Bosio, Gongbo Sun, Brian P Coppola, Tim Tarver, Haline Heidinger, Rafael Sayous, Stefan Ivanov, Joseph M Cavanagh, Jiawei Shen, Joseph Marvin Imperial, Philippe Schwaller, Shaipranesh Senthilkuma, Andres M Bran, Ali Dehghan, Andres Algaba, Brecht Verbeken, David Noever, Ragavendran P V, Lisa Schut, Ilia Sucholutsky, Evgenii Zheltonozhskii, Derek Lim, Richard Stanley, Shankar Sivarajan, Tong Yang, John Maar, Julian Wykowski, Martí Oller, Jennifer Sandlin, Anmol Sahu, Yuzheng Hu, Sara Fish, Nasser Heydari, Archimedes Apronti, Kaivalya Rawal, Tobias Garcia Vilchis, Yuexuan Zu, Martin Lackner, James Koppel, Jeremy Nguyen, Daniil S. Antonenko, Steffi Chern, Bingchen Zhao, Pierrot Arsene, Alan Goldfarb, Sergey Ivanov, Rafał Poświata, Chenguang Wang, Daofeng Li, Donato Crisostomi, Andrea Achilleos, Benjamin Myklebust, Archan Sen, David Perrella, Nurdin Kaparov, Mark H Inlow, Allen Zang, Elliott Thornley, Daniil Orel, Vladislav Poritski, Shalev Ben-David, Zachary Berger, Parker Whitfill, Michael Foster, Daniel Munro, Linh Ho, Dan Bar Hava, Aleksey Kuchkin, Robert Lauff, David Holmes, Frank Sommerhage, Keith Schneider, Zakayo Kazibwe, Nate Stambaugh, Mukhwinder Singh, Ilias Magoulas, Don Clarke, Dae Hyun Kim, Felipe Meneguitti Dias, Veit Elser, Kanu Priya Agarwal, Victor Efren Guadarrama Vilchis, Immo Klose, Christoph Demian, Ujjwala Anantheswaran, Adam Zweiger, Guglielmo Albani, Jeffery Li, Nicolas Daans, Maksim Radionov, Václav Rozhoň, Ziqiao Ma, Christian Stump, Mohammed Berkani, Jacob Platnick, Volodymyr Nevirkovets, Luke Basler, Marco Piccardo, Ferenc Jeanplong, Niv Cohen, Josef Tkadlec, Paul Rosu, Piotr Padlewski, Stanislaw Barzowski, Kyle Montgomery, Aline Menezes, Arkil Patel, Zixuan Wang, Jamie Tucker-Foltz, Jack Stade, Tom Goertzen, Fereshteh Kazemi, 
Jeremiah Milbauer, John Arnold Ambay, Abhishek Shukla, Yan Carlos Leyva Labrador, Alan Givré, Hew Wolff, Vivien Rossbach, Muhammad Fayez Aziz, Younesse Kaddar, Yanxu Chen, Robin Zhang, Jiayi Pan, Antonio Terpin, Niklas Muennighoff, Hailey Schoelkopf, Eric Zheng, Avishy Carmi, Adam Jones, Jainam Shah, Ethan D. L. Brown, Kelin Zhu, Max Bartolo, Richard Wheeler, Andrew Ho, Shaul Barkan, Jiaqi Wang, Martin Stehberger, Egor Kretov, Kaustubh Sridhar, Zienab EL-Wasif, Anji Zhang, Daniel Pyda, Joanna Tam, David M. Cunningham, Vladimir Goryachev, Demosthenes Patramanis, Michael Krause, Andrew Redenti, Daniel Bugas, David Aldous, Jesyin Lai, Shannon Coleman, Mohsen Bahaloo, Jiangnan Xu, Sangwon Lee, Sandy Zhao, Ning Tang, Michael K. Cohen, Micah Carroll, Orr Paradise, Jan Hendrik Kirchner, Stefan Steinerberger, Maksym Ovchynnikov, Jason O. Matos, Adithya Shenoy, Benedito Alves de Oliveira Junior, Michael Wang, Yuzhou Nie, Paolo Giordano, Philipp Petersen, Anna Sztyber-Betley, Priti Shukla, Jonathan Crozier, Antonella Pinto, Shreyas Verma, Prashant Joshi, Zheng-Xin Yong, Allison Tee, Jérémy Andréoletti, Orion Weller, Raghav Singhal, Gang Zhang, Alexander Ivanov, Seri Khoury, Hamid Mostaghimi, Kunvar Thaman, Qijia Chen, Trần Quốc Khánh, Jacob Loader, Stefano Cavalleri, Hannah Szlyk, Zachary Brown, Jonathan Roberts, William Alley, Kunyang Sun, Ryan Stendall, Max Lamparth, Anka Reuel, Ting Wang, Hanmeng Xu, Sreenivas Goud Raparthi, Pablo Hernández-Cámara, Freddie Martin, Dmitry Malishev, Thomas Preu, Tomek Korbak, Marcus Abramovitch, Dominic Williamson, Ziye Chen, Biró Bálint, M Saiful Bari, Peyman Kassani, Zihao Wang, Behzad Ansarinejad, Laxman Prasad Goswami, Yewen Sun, Hossam Elgnainy, Daniel Tordera, George Balabanian, Earth Anderson, Lynna Kvistad, Alejandro José Moyano, Rajat Maheshwari, Ahmad Sakor, Murat Eron, Isaac C. 
McAlister, Javier Gimenez, Innocent Enyekwe, Andrew Favre D.O., Shailesh Shah, Xiaoxiang Zhou, Firuz Kamalov, Ronald Clark, Sherwin Abdoli, Tim Santens, Khalida Meer, Harrison K Wang, Kalyan Ramakrishnan, Evan Chen, Alessandro Tomasiello, G. Bruno De Luca, Shi-Zhuo Looi, Vinh-Kha Le, Noam Kolt, Niels Mündler, Avi Semler, Emma Rodman, Jacob Drori, Carl J Fossum, Milind Jagota, Ronak Pradeep, Honglu Fan, Tej Shah, Jonathan Eicher, Michael Chen, Kushal Thaman, William Merrill, Carter Harris, Jason Gross, Ilya Gusev, Asankhaya Sharma, Shashank Agnihotri, Pavel Zhelnov, Siranut Usawasutsakorn, Mohammadreza Mofayezi, Sergei Bogdanov, Alexander Piperski, Marc Carauleanu, David K. Zhang, Dylan Ler, Roman Leventov, Ignat Soroko, Thorben Jansen, Pascal Lauer, Joshua Duersch, Vage Taamazyan, Wiktor Morak, Wenjie Ma, William Held, Tran Đuc Huy, Ruicheng Xian, Armel Randy Zebaze, Mohanad Mohamed, Julian Noah Leser, Michelle X Yuan, Laila Yacar, Johannes Lengler, Hossein Shahrtash, Edson Oliveira, Joseph W. 
Jackson, Daniel Espinosa Gonzalez, Andy Zou, Muthu Chidambaram, Timothy Manik, Hector Haffenden, Dashiell Stander, Ali Dasouqi, Alexander Shen, Emilien Duc, Bita Golshani, David Stap, Mikalai Uzhou, Alina Borisovna Zhidkovskaya, Lukas Lewark, Mátyás Vincze, Dustin Wehr, Colin Tang, Zaki Hossain, Shaun Phillips, Jiang Muzhen, Fredrik Ekström, Angela Hammon, Oam Patel, Nicolas Remy, Faraz Farhidi, George Medley, Forough Mohammadzadeh, Madellene Peñaflor, Haile Kassahun, Alena Friedrich, Claire Sparrow, Taom Sakal, Omkar Dhamane, Ali Khajegili Mirabadi, Eric Hallman, Mike Battaglia, Mohammad Maghsoudimehrabani, Hieu Hoang, Alon Amit, Dave Hulbert, Roberto Pereira, Simon Weber, Stephen Mensah, Nathan Andre, Anton Peristyy, Chris Harjadi, Himanshu Gupta, Stephen Malina, Samuel Albanie, Will Cai, Mustafa Mehkary, Frank Reidegeld, Anna-Katharina Dick, Cary Friday, Jasdeep Sidhu, Wanyoung Kim, Mariana Costa, Hubeyb Gurdogan, Brian Weber, Harsh Kumar, Tong Jiang, Arunim Agarwal, Chiara Ceconello, Warren S. Vaz, Chao Zhuang, Haon Park, Andrew R. Tawfeek, Daattavya Aggarwal, Michael Kirchhof, Linjie Dai, Evan Kim, Johan Ferret, Yuzhou Wang, Minghao Yan, Krzysztof Burdzy, Lixin Zhang, Antonio Franca, Diana T. Pham, Kang Yong Loh, Joshua Robinson, Shreen Gul, Gunjan Chhablani, Zhehang Du, Adrian Cosma, Colin White, Robin Riblet, Prajvi Saxena, Jacob Votava, Vladimir Vinnikov, Ethan Delaney, Shiv Halasyamani, Syed M. Shahid, Jean-Christophe Mourrat, Lavr Vetoshkin, Renas Bacho, Vincent Ginis, Aleksandr Maksapetyan, Florencia de la Rosa, Xiuyu Li, Guillaume Malod, Leon Lang, Julien Laurendeau, Fatimah Adesanya, Julien Portier, Lawrence Hollom, Victor Souza, Yuchen Anna Zhou, Yiğit Yalın, Gbenga Daniel Obikoya, Luca Arnaboldi, Rai (Michael Pokorny), Filippo Bigi, Kaniuar Bacho, Pierre Clavier, Gabriel Recchia, Mara Popescu, Nikita Shulga, Ngefor Mildred Tanwie, Thomas C.H. 
Lux, Ben Rank, Colin Ni, Alesia Yakimchyk, Huanxu (Quinn) Liu, Olle Häggström, Emil Verkama, Himanshu Narayan, Hans Gundlach, Leonor Brito-Santana, Brian Amaro, Vivek Vajipey, Rynaa Grover, Yiyang Fan, Gabriel Poesia Reis e Silva, Linwei Xin, Yosi Kratish, Jakub Łucki, Wen-Ding Li, Justin Xu, Kevin Joseph Scaria, Freddie Vargus, Farzad Habibi, Long (Tony) Lian, Emanuele Rodolà, Jules Robins, Vincent Cheng, Declan Grabb, Ida Bosio, Tony Fruhauff, Ido Akov, Eve J. Y. Lo, Hao Qi, Xi Jiang, Ben Segev, Jingxuan Fan, Sarah Martinson, Erik Y. Wang, Kaylie Hausknecht, Michael P. Brenner, Mao Mao, Yibo Jiang, Xinyu Zhang, David Avagian, Eshawn Jessica Scipio, Muhammad Rehan Siddiqi, Alon Ragoler, Justin Tan, Deepakkumar Patil, Rebeka Plecnik, Aaron Kirtland, Roselynn Grace Montecillo, Stephane Durand, Omer Faruk Bodur, Zahra Adoul, Mohamed Zekry, Guillaume Douville, Ali Karakoc, Tania C. B. Santos, Samir Shamseldeen, Loukmane Karim, Anna Liakhovitskaia, Nate Resman, Nicholas Farina, Juan Carlos Gonzalez, Gabe Maayan, Sarah Hoback, Rodrigo De Oliveira Pena, Glen Sherman, Hodjat Mariji, Rasoul Pouriamanesh, Wentao Wu, Gözdenur Demir, Sandra Mendoza, Ismail Alarab, Joshua Cole, Danyelle Ferreira, Bryan Johnson, Hsiaoyun Milliron, Mohammad Safdari, Liangti Dai, Siriphan Arthornthurasuk, Alexey Pronin, Jing Fan, Angel Ramirez-Trinidad, Ashley Cartwright, Daphiny Pottmaier, Omid Taheri, David Outevsky, Stanley Stepanic, Samuel Perry, Luke Askew, Raúl Adrián Huerta Rodríguez, Abdelkader Dendane, Sam Ali, Ricardo Lorena, Krishnamurthy Iyer, Sk Md Salauddin, Murat Islam, Juan Gonzalez, Josh Ducey, Russell Campbell, Maja Somrak, Vasilios Mavroudis, Eric Vergo, Juehang Qin, Benjámin Borbás, Eric Chu, Jack Lindsey, Anil Radhakrishnan, Antoine Jallon, I.M.J. McInnis, Alex Hoover, Sören Möller, Song Bian, John Lai, Tejal Patwardhan

Affiliations

3Independent Researcher, 4Texas A&M University, 5McGill University, 6Queen's University, 7Stanford University, 8University of Washington, 9University of California, San Diego, 10RWTH Aachen University, 11Pondicherry Engineering College, 12Institute of Mathematics of NAS of Ukraine, 13ELTE, 14University of Porto, 15University of Cambridge, 16Kyiv Polytechnic Institute, 17ETH Zürich, 18Nimbus AI, 19Georgia Southern University, 20Durham University, 21University of Minnesota Twin Cities, 22Queen Mary University of London, 23Alberta Health Services, 24Microsoft Research, 25ZG Law, 26Outlier, 27Hereford College of Arts, 28Auckland University of Technology, 29Princeton University, 30Carnegie Mellon University, 31Hemwati Nandan Bahuguna Garhwal University, 32Massachusetts Institute of Technology, 33Accenture Labs, 34Escuela Superior de Medicina- Instituto Politécnico Nacional, 35CICMA, 36University of Canterbury, 37Metropolitan State University of Denver, 38California Institute of Technology, 39Université de Yaoundé I, 40Ecole Nationale Supérieure Polytechnique de Yaoundé, 41Tanta University, 42Tufts University, 43The Jackson Laboratory, 44Inria, 45University of California, Berkeley, 46Columbia University, 47Institute of Science and Technology Austria, 48RUSM, 49University of British Columbia, 50École Polytechnique Fédérale de Lausanne, 51University of Oxford, 52Charité – Universitätsmedizin, 53Humboldt-Universität zu Berlin, 54Happy Technologies LLC, 55Northern Illinois University, 56Sapienza University of Rome, 57National University of Singapore, 58University of Southern California, 59University of Tübingen, 60University of Sao Paulo, 61Universidade Federal de Juiz de Fora, 62Sorbonne Université, 63École Normale Supérieure, 64C. N. 
Yang institute for Theoretical Physics, 65University of Luxembourg, 66University of Malaya, 67Rockwell Automation, 68Contramont Research, 69Washington University, 70CNRS, 71Université Paris-Saclay, 72University of Toronto, 73Google DeepMind, 74University of North Texas, 75Institut Polytechnique de Paris, 76TRR Designs, 77University of Chicago, 78Maastricht University, 79University of California, Los Angeles, 80Martin-Luther-University Halle-Wittenberg, 81Leibniz University Hannover, 82Indian Institute of Technology Bombay, 83University of Calgary, 84Institute for Molecular Manufacturing, 85University of Wisconsin-Madison, 86University of Michigan, 87Bethune-Cookman University, 88St. Petersburg College, 89La Molina National Agrarian University, 90University of Bath, 91National University Philippines, 92Vrije Universiteit Brussel, 93PeopleTec, Inc., 94New York University, 95Technion – Israel Institute of Technology, 96University of Miami, 97University of Maryland, 98Technische Universität Berlin, 99Arizona State University, 100University of Illinois Urbana-Champaign, 101Harvard University, 102Royal Holloway, University of London, 103Universidad Iberoamericana, 104TU Wien, 105Swinburne University of Technology, 106Yale University, 107University of Edinburgh, 108École Normale Supérieure Paris-Saclay, 109National Information Processing Institute, 110University College London, 111Ecco IT, 112University of Western Australia, 113Snorkel AI, 114Indiana State University, 115Oxford University, 116Mohamed bin Zayed University of Artificial Intelligence, 117University of Waterloo, 118Manhattan School of Music, 119Universiteit Leiden, 120Synbionix, 121Corteva Agriscience, 122Diverging Mathematics, 123Saint Mary's University, 124Emory University, 125Sanford Burnham Preybs, 126Yonsei University, 127Cornell University, 128University of Leeds, 129Politecnico di Milano, 130KU Leuven, 131Brandenburg University of Technology, 132INSAIT, 133Ruhr University Bochum, 134University Mohammed 
I, 135Georgia Institute of Technology, 136Northwestern University, 137University of Arizona, 138Universidade de Lisboa,, 139Mānuka Honey and Beekeeping Consultancy Ltd, 140Charles University, 141Duke University, 142Mila, 143University of Copenhagen, 144The University of Sydney, 145University of Technology Sydney, 146Indian Institute of Technology Delhi, 147University of Buenos Aires, 148University of Amsterdam, 149Ben-Gurion University, 150blurrylogic, 151Donald and Barbara Zucker School of Medicine, 152Cohere, 153Ivy Natal, 154Hebrew University, 155Fraunhofer IMTE, 156University of Pennsylvania, 157National Institute of Laser Enhanced Sciences, 158Drexel University, 159Northeastern University, 160EHC Investments LLC, 161University of Windsor, 162St. Jude Children’s Research Hospital, 163GC, 164Rochester Institute of Technology, 165Anthropic, 166CERN, 167University of California, Santa Barbara, 168University of Vienna, 169Warsaw University of Technology, 170EF Polymers Pvt Ltd, 171North Carolina State University, 172Independent researcher, 173Simplr AI, Asurion, 174All India Institute of Medical Sciences, 175Brown University, 176Johns Hopkins University, 177Ruhr-Universität Bochum, 178Standard Intelligence, 179Posts and Telecommunications Institute of Technology, 180Clearhorse Ltd, 181Cranfield University, 182JNTU, 183Image Processing Lab, Universitat de Valencia, 184Universität Zürich, 185UK AI Safety Institute, 186Boston University, 187SDAIA, 188Children’s Hospital of Orange County, 189The Ohio State University, 190Cairo University Specialized Pediatric Hospital, 191Universidad de Valencia, 192University of Arkansas, 193Monash University, 194OncoPrecision, 195Genomia Diagnostics Research Pvt Ltd, 196IEEE Life Member, 197Larkin Community Hospital, 198The University of Texas at Dallas, 199Canadian University Dubai, 200Università di Milano-Bicocca, 201University of Massachusetts Lowell, 202Virginia Tech, 203University of Geneva, 204Rutgers University, 205MolMind, 
206Cal Poly San Luis Obispo, 207Patched Codes, Inc, 208University of Mannheim, 209Chulalongkorn University, 210Ecole polytechnique, 211Stockholm University, 212AE Studio, 213Gaia Lab, 214Leibniz Institute for Science and Mathematics Education, 215Australian National University, 216Saarland University, 217College of Eastern Idaho, 218Intrinsic Innovation LLC, 219HUTECH, 220INRIA, 221King Saud University, 222Universidad de Buenos Aires, 223Pennsylvania College of Technology, 224CERo Therapeutics Holdings, Inc., 225The Univeirsty of Tennessee, 226Gray Swan AI, 227EleutherAI, 228University of Montpellier, 229HomeEquity Bank, 230Materials Platform for Data Science LLC, 231University of Trento, 232Fondazione Bruno Kessler, 233Cambridge University, 234LGM, 235Georgia State University, 236Polytechnic University of the Philippines, 237University of Oregon, 238University of Mumbai, 239University of Guelph, 240Case Wester Reserve University, 241Intuit, 242CTTC / CERCA, 243National University, 244Talishar, 245Dyno Therapeutics, 246The Hospital for Sick Children, 247Lewis Katz School of Medicine, 248Fyaora Labs, 249Intelligent Geometries, 250Indian Institute of Technology (BHU), 251Center for AI Safety, 252AIM Intelligence, 253Seoul National University, 254The University of Texas at Arlington, 255The Hartree Centre, 256Missouri University of Science and Technology, 257POLITEHNICA Bucharest National University of Science and Technology, 258Abacus.AI, 259German Research Center for Artificial Intelligence, 260University of Galway, 261University of Houston, 262Eastern Institute of Technology (EIT), 263ENS Lyon, 264Czech Technical University in Prague, 265CISPA Helmholtz Center for Information Security, 266Universidad de Morón, 267Université Paris Cité and Sorbonne Université, 268Sheffield Hallam University, 269The New School, 270Max Planck Institute for Software Systems, 271OpenAI, 272École Polytechnique, 273Modulo Research, 274Heidelberg University, 275La Trobe University, 
276University of Yaoundé I, 277Lux Labs, 278University of Innsbruck, 279Nabu Technologies Inc, 280Chalmers University of Technology, 281KTH Royal Institute of Technology, 282Unidade Local de Saúde de Lisboa Ocidental, 283Quotient AI, 284University of California, Irvine, 285University of Padua, 286Aalto University, 287Royal Veterinary College, 288The Future Paralegals of America, 289RMIT University, 290Universal Higher Education, 291Eastlake High School, 292CSMSS Chh. Shahu College of Engineering, 293Central Mindanao University, 294University of Montreal, 295University of Bradford, 296Beni Suef University, 297Bogazici University, 298Mansoura University, 299Univerisity of Bristol, 300University of Oklahoma, 301Jala University, 302Florida Atlantic University, 303CONICET, 304Universidad Tecnológica Nacional, 305Bournemouth University, 306University of Warwick, 307University of Alabama Huntsville, 308Van Andel Institute, 309University of Hertfordshire, 310Central College, 311Sheffield Teaching Hospitals NHS Foundation Trust, 312Nottingham Trent University, 313Max Planck Institute for Intelligent Systems, 314Outevsky Bespoke Dance Education, 315University of Virginia, 316Dartmouth College, 317INESC Microsistemas e Nanotecnologias, 318University of Minnesota, 319Aligarh Muslim University, 320John Crane UK Ltd, 321James Madison University, 322University of the Fraser Valley, 323Alan Turing Institute, 324Rice University, 325HUN-REN, 326Forschungszentrum Jülich

Introduction

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, which limits informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam, a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. The dataset consists of 3,000 challenging questions across over a hundred subjects. We publicly release these questions, while maintaining a private test set of held-out questions to assess model overfitting.

Figure: Difficulty comparison across benchmarks

Compared against the saturation of some existing benchmarks, Humanity's Last Exam accuracy remains low across several frontier models, demonstrating its effectiveness for measuring advanced, closed-ended, academic capabilities.

Dataset

Humanity's Last Exam (HLE) is a global collaborative effort, with questions from nearly 1,000 subject-expert contributors affiliated with over 500 institutions across 50 countries. Most contributors are professors, researchers, and graduate degree holders.
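
The released questions are published on Hugging Face under the identifier `cais/hle`. The sketch below shows one way they might be loaded and narrowed to a text-only subset, as used for the non-multi-modal models. The split name `"test"` and the `image` field are assumptions rather than details stated on this page, so check the dataset card before relying on them; access may also require accepting the dataset's terms.

```python
from typing import Any, Iterable


def load_hle(split: str = "test"):
    """Load the public HLE questions from the Hugging Face Hub.

    Assumption: the split is named "test"; confirm on the dataset card.
    """
    # Imported lazily so the filter below works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("cais/hle", split=split)


def text_only(
    rows: Iterable[dict[str, Any]], image_field: str = "image"
) -> list[dict[str, Any]]:
    """Keep rows whose (assumed) image field is empty or missing."""
    return [row for row in rows if not row.get(image_field)]


# Toy rows standing in for real records; the field names are hypothetical.
sample = [
    {"id": "q1", "question": "A text-only question", "image": ""},
    {"id": "q2", "question": "A question with a figure", "image": "placeholder-image"},
]
text_rows = text_only(sample)
```

With network access and the `datasets` package installed, `text_only(load_hle())` would apply the same filter to the real records, provided the assumed field name matches the actual schema.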

Classics

Question:

[Question image]

Here is a representation of a Roman inscription, originally found on a tombstone. Provide a translation for the Palmyrene script.
A transliteration of the text is provided: RGYNᵓ BT ḤRY BR ᶜTᵓ ḤBL

Henry T

Merton College, Oxford

Ecology

Question:

Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.

Edward V

Massachusetts Institute of Technology

Samples of the diverse and challenging questions submitted to Humanity's Last Exam.

Quantitative Results

Accuracy. All frontier models achieve low accuracy on Humanity's Last Exam, highlighting significant room for improvement in narrowing the gap between current LLMs and expert-level academic capabilities on closed-ended questions.

Calibration Error. Given their low accuracy on Humanity's Last Exam, models should be well calibrated, recognizing their uncertainty rather than confidently providing incorrect answers, which is indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.

Model                  Accuracy (%) ↑   Calibration Error (%) ↓
GPT-4o                 3.3              92.5
Grok-2                 3.8              93.2
Claude 3.5 Sonnet      4.3              88.9
Gemini Thinking        7.7              91.2
o1                     9.1              93.4
DeepSeek-R1*           9.4              81.8
o3-mini (medium)*      10.5             92.0
o3-mini (high)*        13.0             93.2

*Model is not multi-modal, evaluated on text-only subset.
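
To make the two reported columns concrete, here is a minimal sketch of how accuracy and a calibration error could be computed from per-question correctness and stated confidence. The page does not say which calibration metric is used; the binned root-mean-square calibration error below is one common choice and is an assumption here, as are the function names and the choice of ten equal-width bins.

```python
import math
from typing import Sequence


def accuracy(correct: Sequence[int]) -> float:
    """Fraction of questions answered correctly (entries are 0 or 1)."""
    return sum(correct) / len(correct)


def rms_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Binned RMS gap between stated confidence and observed accuracy.

    Confidences are fractions in [0, 1]; divide percentages by 100 first.
    Assumption: equal-width bins, weighted by the number of answers in each.
    """
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, is_correct))

    total = len(confidences)
    weighted_squared_gap = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        bin_acc = sum(is_correct for _, is_correct in members) / len(members)
        weighted_squared_gap += (len(members) / total) * (mean_conf - bin_acc) ** 2
    return math.sqrt(weighted_squared_gap)


# Toy example: a model that is 90% confident on every answer but right once in four.
toy_confidences = [0.9, 0.9, 0.9, 0.9]
toy_correct = [0, 0, 0, 1]
toy_accuracy = accuracy(toy_correct)
toy_error = rms_calibration_error(toy_confidences, toy_correct)
```

In the toy example, all four answers fall in one bin with mean confidence 0.9 and accuracy 0.25, so the error is the gap between them. A model with low accuracy and high stated confidence produces a large error in the same way, which is the pattern the table shows.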

Discussion

Future Model Performance

While current LLMs achieve very low accuracy on Humanity's Last Exam, recent history shows that benchmarks are quickly saturated, with models progressing dramatically from near-zero to near-perfect performance in a short timeframe. Given the rapid pace of AI development, it is plausible that models could exceed 50% accuracy on HLE by the end of 2025. High accuracy on HLE would demonstrate expert-level performance on closed-ended, verifiable questions and cutting-edge scientific knowledge, but it would not alone suggest autonomous research capabilities or "artificial general intelligence." HLE tests structured academic problems rather than open-ended research or creative problem-solving abilities, making it a focused measure of technical knowledge and reasoning. HLE may be the last academic exam we need to give to models, but it is far from the last benchmark for AI.

Impact

By providing a clear measure of AI progress, Humanity's Last Exam creates a common reference point for scientists and policymakers to assess AI capabilities. This enables more informed discussions about development trajectories, potential risks, and necessary governance measures.

Citation

For any inquiries or feedback, please contact us at agibenchmark@safe.ai
Submit feedback to questions in the dataset via this form