CRAB (Code Review Automation Benchmark) is a research-driven platform designed to evaluate deep
learning models for code review tasks. Developed as part of a master's thesis at the Università
della Svizzera italiana, CRAB provides a high-quality, curated benchmark dataset of Java code
review triplets: submitted code, reviewer comment, and revised code. Each instance is manually
validated to ensure that the reviewer comment directly addresses an issue in the submitted code
and that the revised code accurately implements the feedback.
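
As a concrete illustration, a single triplet could be modeled as below; the `ReviewTriplet` class, its field names, and the toy Java snippet are hypothetical and do not reflect CRAB's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ReviewTriplet:
    """One benchmark instance (hypothetical shape, for illustration only)."""
    instance_id: str       # hypothetical identifier, not from CRAB's schema
    submitted_code: str    # Java code as originally submitted for review
    reviewer_comment: str  # the reviewer's feedback on that code
    revised_code: str      # Java code after the feedback was implemented

# A toy instance with an invented Java snippet:
example = ReviewTriplet(
    instance_id="example-001",
    submitted_code="int size() { return items.size(); }",
    reviewer_comment="Guard against items being null before calling size().",
    revised_code="int size() { return items == null ? 0 : items.size(); }",
)
```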
The platform supports two core tasks: generating human-like review comments and refining code
based on those comments. Because the same feedback can be phrased in many ways and a single
issue often admits more than one correct fix, CRAB also accounts for paraphrased comments and
alternative valid code revisions, yielding a more realistic and robust evaluation. It addresses
the shortcomings of existing datasets by eliminating noisy instances and verifying the
functional correctness of revised code through testing.
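
In terms of inputs and outputs, the two tasks can be sketched as follows. This is only a sketch: the function names and the `model.review`/`model.revise` calls are hypothetical placeholders for any system under evaluation, and the reference-set check is just one simple way to honor alternative valid revisions (CRAB itself additionally verifies revisions through tests):

```python
def generate_comment(model, submitted_code: str) -> str:
    """Task 1: given submitted Java code, produce a review comment."""
    return model.review(submitted_code)  # hypothetical model API

def refine_code(model, submitted_code: str, comment: str) -> str:
    """Task 2: given the submitted code and a reviewer comment,
    produce revised Java code that implements the feedback."""
    return model.revise(submitted_code, comment)  # hypothetical model API

def matches_any_reference(prediction: str, valid_revisions: list[str]) -> bool:
    """Illustrative check: accept a prediction if it matches any of the
    alternative valid revisions, rather than a single gold answer."""
    return prediction in valid_revisions
```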
Researchers can upload model predictions to receive standardized evaluations, making CRAB an
essential tool for advancing automated code review technologies.
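
For instance, a submission might be assembled as a JSON file keyed by instance ID; this layout, the file name, and the IDs are assumptions for illustration, not CRAB's documented upload schema:

```python
import json

# Hypothetical submission layout, keyed by instance ID; CRAB's actual
# upload schema may differ.
predictions = {
    "example-001": {
        "comment": "Guard against items being null before calling size().",
        "revised_code": "int size() { return items == null ? 0 : items.size(); }",
    }
}

with open("predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2)
```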