reasoning evaluation 6
- A Systematic Evaluation of Large Language Models on Out-of-Distribution Logical Reasoning Tasks
- Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond
- Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
- Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
- Abductive and inductive reasoning