AlphaBorno at BLP-2025 Task 2: Code Generation with Structured Prompts and Execution Feedback
Abstract
This paper explores prompting strategies for BLP-2025 Shared Task 2, using a pipeline that first translates Bangla problem descriptions into English with GPT-4o, then applies techniques such as zero-shot, few-shot, and chain-of-thought prompting, synthetic test case integration, and a self-repair loop. We evaluated four LLMs (GPT-4o, Grok-3, Claude 3.7 Sonnet, and Qwen2.5-Coder 14B). Our findings reveal that while traditional methods like few-shot and chain-of-thought prompting provided inconsistent gains, the integration of explicit unit tests delivered a substantial performance boost across all models. The most effective strategy combined zero-shot prompting with these synthetic tests and a self-repair loop, leading GPT-4o to achieve a top Pass@1 score of 72.2%. These results demonstrate the value of explicit constraints and iterative feedback in code generation, offering a solid framework for improving a model's code generation capabilities.
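The combination of zero-shot prompting, explicit unit tests, and a self-repair loop can be illustrated with a minimal sketch. This is not the authors' implementation. The helper names (`run_tests`, `self_repair`, `stub_model`), the prompt wording, and the `max_rounds=3` limit are all assumptions made for illustration, and the model call is replaced by a stub so the example runs offline.

```python
import traceback
from typing import Callable, List, Optional, Tuple


def run_tests(code: str, tests: List[str]) -> Optional[str]:
    """Run candidate code, then each assert-style test.

    Returns None if everything passes, otherwise a short error trace.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)          # define the candidate solution
        for test in tests:
            exec(test, namespace)      # each test is a plain `assert ...` line
    except Exception:
        return traceback.format_exc(limit=1)
    return None


def self_repair(
    generate: Callable[[str], str],    # hypothetical wrapper around an LLM call
    problem: str,
    tests: List[str],
    max_rounds: int = 3,               # assumed limit, not taken from the paper
) -> Tuple[str, bool]:
    """Zero-shot prompt with explicit tests, then retry using execution feedback."""
    prompt = problem + "\n\nYour code must pass these tests:\n" + "\n".join(tests)
    code = generate(prompt)
    for _ in range(max_rounds):
        error = run_tests(code, tests)
        if error is None:
            return code, True
        # Feed the failing code and the error trace back to the model.
        repair_prompt = (
            f"{prompt}\n\nPrevious attempt:\n{code}\n\n"
            f"Execution feedback:\n{error}\nFix the code."
        )
        code = generate(repair_prompt)
    return code, run_tests(code, tests) is None


def stub_model(prompt: str) -> str:
    """Stand-in for a real model: wrong at first, correct once it sees feedback."""
    if "Execution feedback" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


code, passed = self_repair(stub_model, "Write add(a, b).", ["assert add(2, 3) == 5"])
```

In this sketch the tests serve twice: they appear in the prompt as explicit constraints, and their failures supply the execution feedback for each repair round. Running model-generated code with `exec` is only reasonable in a sandboxed environment.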