dataset 29
- PokerGPT
- Assessing Logical Puzzle Solving in Large Language Models
- LLM-Based Agent Society Investigation
- Eliminating Reasoning via Inferring with Planning
- GLoRE
- Towards LogiGLUE
- Avalon's Game of Thoughts
- BRAINTEASER
- Exploring Large Language Models for Communication Games
- Are ChatGPT and GPT-4 Good Poker Players?
- LatEval
- Go Beyond The Obvious
- Solving and Generating NPR Sunday Puzzles with Large Language Models
- ChessGPT
- BoardgameQA
- Large Language Models
- True Detective
- CC-Riddle
- Down and Across
- A Puzzle-Based Dataset for Natural Language Inference
- BiRdQA
- Puzzle Solving without Search or Human Knowledge
- Programming Puzzles
- Decrypting Cryptic Crosswords
- Cryptonite
- Did Aristotle Use a Laptop?
- RiddleSense
- PIQA
- CommonsenseQA