  1. May 20, 2024: Testing two families of large language models (LLMs) (GPT and LLaMA2) on a battery of measurements spanning different theory of mind abilities, Strachan et al. find that the performance of LLMs ...
  2. researchsquare.com

    May 20, 2024: Performance across Theory of Mind tests. Both GPT models performed well across most tests (see Fig. 1 A; 1 B), and showed impressive abilities to reason about social intentions, beliefs, and non-literal utterances. For each test, we conducted a series of two-way Bonferroni-corrected Wilcoxon tests comparing each LLM against human scores. (A minimal sketch of this kind of comparison appears after these results.)
  3. pubmed.ncbi.nlm.nih.gov

    Across the battery of theory of mind tests, we found that GPT-4 models performed at, or even sometimes above, human levels at identifying indirect requests, false beliefs and misdirection, but struggled with detecting faux pas. Faux pas, however, was the only test where LLaMA2 outperformed humans.
  4. Testing theory of mind in large language models and humans, James W. A. Strachan et al. [The snippet contains a flattened results table of per-test values for Human, GPT-4, GPT-3.5, and LLaMA2-70B on the false belief, irony, and faux pas tests; the column alignment is not recoverable from the extraction.]
  5. academia.edu

    This approach enabled us to reveal the existence of specific deviations from human-like behaviour that would have remained hidden using a single theory of mind test, or a single run of each test. Both GPT models exhibited impressive performance in tasks involving beliefs, intentions and non-literal utterances, with GPT-4 exceeding human levels ...
  6. blogs.upm.es

    May 29, 2024: In a recent groundbreaking study published in the renowned journal Nature, a team of researchers from the ASTOUND project consortium explored the theory of mind capabilities of humans and large language models (LLMs) such as GPT-4 and LLaMA2. This study, central to the ASTOUND project (GA 101071191), dives into how well these AI models can track and interpret human mental states, an ability ...
  7. Mar 18, 2024: Specifically, across a battery of Theory of Mind tests, we found that GPT models performed at human levels when recognising indirect requests, false beliefs, and higher-order mental states like misdirection, but were specifically impaired at recognising faux pas. Follow-up studies revealed that this was due to GPT's conservatism in drawing ...
  8. semanticscholar.org

    May 20, 2024: It is demonstrated that large language models exhibit behaviour that is consistent with the outputs of mentalistic inference in humans; the work also highlights the importance of systematic testing to ensure a non-superficial comparison between human and artificial intelligences. At the core of what defines us as humans is the concept of theory of mind: the ability to track other people's mental ...
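Result 2 describes the statistical procedure used in the study: two-sided, Bonferroni-corrected Wilcoxon tests comparing each LLM's per-test scores against human scores. The sketch below illustrates one way such a comparison could look in Python. The scores are hypothetical placeholders, and the use of scipy's mannwhitneyu (the rank-sum form of the Wilcoxon test, suited to independent samples) is an assumption, since the snippet does not specify which Wilcoxon variant the authors used.

```python
# Hedged sketch: two-sided, Bonferroni-corrected Wilcoxon rank-sum tests
# comparing each LLM's scores on a theory-of-mind test against human scores.
# All data below are hypothetical placeholders, not values from the study.
import numpy as np
from scipy.stats import mannwhitneyu  # rank-sum Wilcoxon for independent samples

rng = np.random.default_rng(0)

# Hypothetical per-participant accuracy scores on one theory-of-mind test.
human_scores = rng.uniform(0.6, 1.0, size=50)
llm_scores = {
    "GPT-4": rng.uniform(0.7, 1.0, size=15),
    "GPT-3.5": rng.uniform(0.5, 0.9, size=15),
    "LLaMA2-70B": rng.uniform(0.4, 0.8, size=15),
}

n_tests = len(llm_scores)  # number of comparisons, used for the Bonferroni factor
for model, scores in llm_scores.items():
    stat, p = mannwhitneyu(scores, human_scores, alternative="two-sided")
    p_corrected = min(p * n_tests, 1.0)  # Bonferroni: scale p by the number of tests
    print(f"{model}: U={stat:.1f}, corrected p={p_corrected:.3f}")
```

In practice each test in the battery would get its own set of comparisons, with the correction factor set to the total number of tests performed.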