Mothman at SemEval-2024 Task 9: An Iterative System for Chain-of-Thought Prompt Optimization

Alvin Po-Chun Chen; Ray Groshan; Sean von Bayern

NAACL 2024

Abstract

Extensive research exists on the performance of large language models on logic-based tasks, whereas relatively little has been done on their ability to generate creative solutions on lateral thinking tasks. The BrainTeaser shared task tests lateral thinking and uses adversarial datasets to prevent memorization, resulting in poor performance for out-of-the-box models. We propose a system for iterative, chain-of-thought prompt engineering which optimizes prompts using human evaluation. Using this shared task, we demonstrate our system's ability to significantly improve model performance by optimizing prompts and to evaluate the input dataset.
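The abstract does not spell out the optimization procedure, so the following is only a minimal sketch of what an iterative, human-in-the-loop prompt refinement loop could look like. Every name here (`optimize_prompt`, `accuracy`, `toy_model`, `toy_revise`) is hypothetical and not taken from the paper; the model and the human reviser are toy stand-ins so that the example runs on its own.

```python
from typing import Callable, List, Tuple

Item = Tuple[str, str]              # (question, gold answer)
Model = Callable[[str, str], str]   # (prompt, question) -> predicted answer
Reviser = Callable[[str, List[str]], str]  # (prompt, failed questions) -> new prompt


def accuracy(prompt: str, items: List[Item], model: Model) -> float:
    """Fraction of items the model answers correctly under this prompt."""
    correct = sum(1 for question, gold in items if model(prompt, question) == gold)
    return correct / len(items)


def optimize_prompt(
    initial_prompt: str,
    items: List[Item],
    model: Model,
    human_revise: Reviser,
    rounds: int = 3,
) -> Tuple[str, float]:
    """Assumed loop: score the prompt, show failures to a human, keep improvements."""
    best_prompt = initial_prompt
    best_score = accuracy(best_prompt, items, model)
    for _ in range(rounds):
        # Collect the questions the current best prompt still gets wrong.
        failures = [q for q, gold in items if model(best_prompt, q) != gold]
        if not failures:
            break
        # A human inspects the failures and proposes a revised prompt.
        candidate = human_revise(best_prompt, failures)
        score = accuracy(candidate, items, model)
        # Keep the revision only if it strictly improves accuracy.
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score


# Toy stand-ins (hypothetical): the "model" only answers correctly when the
# prompt asks for step-by-step reasoning; the "human" appends that request.
ANSWERS = {"riddle 1": "a", "riddle 2": "b"}


def toy_model(prompt: str, question: str) -> str:
    return ANSWERS[question] if "step by step" in prompt else "unknown"


def toy_revise(prompt: str, failures: List[str]) -> str:
    return prompt + " Think step by step."


ITEMS: List[Item] = list(ANSWERS.items())
```

Calling `optimize_prompt("Answer the riddle.", ITEMS, toy_model, toy_revise)` runs the loop on the toy data. In a real setting, `model` would wrap a language model call and `human_revise` would be an annotator reading the model's reasoning chains rather than a fixed string edit.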

Topics

Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Learning Paradigms > Meta-Learning
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

prompt optimization, chain-of-thought prompting, lateral thinking, human evaluation
