About this project
This project introduces CRAB (Code Review Automated Benchmark), a high-quality benchmark designed to evaluate deep learning-based code review automation tools. It focuses on two key tasks:
- Comment Generation: Generating natural language review comments that identify issues and suggest improvements for a given piece of code.
- Code Refinement: Producing revised code that correctly implements the suggestions from a review comment.
The dataset consists of carefully curated triplets <submitted_code, reviewer_comment, revised_code>, ensuring each comment is actionable and each revision implements the suggested change. This eliminates noise common in previous datasets and supports reliable, meaningful evaluation.
To support model benchmarking, we also provide a web-based evaluation platform (the website on which you are reading this description) that allows researchers to download the dataset, submit their predictions, and assess model performance across both tasks.
This website lets you evaluate code review models against the CRAB benchmark. You can download input files for either the comment generation or code refinement task, upload your model’s predictions, and view the results once processing is complete. Each section includes a help icon that provides more detailed instructions and file format guidelines.
Downloading the Dataset
When you download a dataset, you'll receive a ZIP archive containing a JSON file. The structure of this file depends on the selected task.
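As a minimal sketch (the archive name below is illustrative; use whatever file you actually downloaded), you could extract and load the JSON in Python like this:

import json
import zipfile

# Hypothetical archive name; substitute the file you downloaded.
ARCHIVE = "crab_comment_generation.zip"

with zipfile.ZipFile(ARCHIVE) as zf:
    # Assume the archive contains a single top-level JSON file.
    json_name = next(n for n in zf.namelist() if n.endswith(".json"))
    with zf.open(json_name) as f:
        dataset = json.load(f)

print(f"Loaded {len(dataset)} entries")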
Comment Generation
The JSON maps each ID to an object with:
- files: a map of filenames to their content at the start of the pull request.
- diffs: a map of filenames to the diff that was applied to each file before the comment was made.
{
  "1234": {
    "files": {
      "src/Main.java": "public class Main { ... }"
    },
    "diffs": {
      "src/Main.java": "@@ -1,3 +1,6 @@ ..."
    }
  }
}
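As a rough sketch of how these inputs might be consumed, the loop below walks each entry of the dataset dict loaded above and assembles a simple prompt from the file contents and diffs. generate_comment is a placeholder for your own model's inference call, not part of the benchmark:

def generate_comment(prompt: str) -> str:
    # Placeholder: replace with your model's inference call.
    return "TODO: generated review comment"

predictions = {}
for entry_id, entry in dataset.items():
    parts = []
    for path, content in entry["files"].items():
        diff = entry["diffs"].get(path, "")
        parts.append(f"File: {path}\n{content}\n\nDiff applied:\n{diff}")
    predictions[entry_id] = generate_comment("\n\n".join(parts))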
Code Refinement
The JSON structure is similar to comment generation, with one additional field:
- files: the initial version of each file in the PR.
- diffs: the diff applied before the comment was made.
- comments: a list of comments, each with a body, the file it refers to, and the exact location of the comment.
{
  "5678": {
    "files": { ... },
    "diffs": { ... },
    "comments": [
      {
        "body": "Consider simplifying this logic.",
        "file": "src/Util.java",
        "location": {
          "start_line": 42,
          "end_line": 45
        }
      }
    ]
  }
}
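A minimal sketch of iterating these entries, assuming the same dataset dict loaded as shown earlier; it simply prints which lines each comment targets:

for entry_id, entry in dataset.items():
    for comment in entry["comments"]:
        loc = comment["location"]
        print(f"{entry_id}: {comment['file']} "
              f"lines {loc['start_line']}-{loc['end_line']}: {comment['body']}")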
With Context (Optional)
You can choose to download the dataset with full repository context — the state of the entire codebase at the time the PR was created. This may help your model better understand the broader project structure and dependencies outside of the changed files.
Uploading Results
After downloading a dataset and generating your predictions for either task, you can upload your results here to start the evaluation process.
Your uploaded JSON file must follow one of the schemas described below, depending on the selected task. Once uploaded, the system will begin evaluating your submission. A progress bar will appear to show how far along the evaluation is.
For Code Refinement, an id will also be displayed. This id allows you to safely close the browser tab and later check the evaluation progress by pasting it into the "Get status of ongoing process" section. More information is available in that section.
Comment Generation
Submit a JSON object where each key is a string ID and each value is the generated comment.
{
  "1234": "This method lacks null checks.",
  "5678": "Consider renaming this variable for clarity."
}
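For instance, if predictions is a dict mapping each ID to its generated comment (as in the loop sketched earlier), you could serialize it in this format like so (the output file name is illustrative):

import json

with open("comment_generation_predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)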
Code Refinement
Submit a JSON object where each key is a string ID, and the value is another object that maps file paths (relative to the top-level directory) to the new file content.
{
  "1234": {
    "src/Main.java": "public class Main { /* updated code */ }"
  },
  "5678": {
    "utils/Helper.java": "public class Helper { /* improved logic */ }"
  }
}
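As a sketch, assuming dataset holds the code refinement input loaded as shown earlier, refine_code below stands in for your model and must return a map of file paths to revised file content; here it merely echoes the original files:

import json

def refine_code(entry: dict) -> dict:
    # Placeholder: return the revised files produced by your model;
    # this stub just echoes the original file contents.
    return dict(entry["files"])

refined = {entry_id: refine_code(entry) for entry_id, entry in dataset.items()}

with open("code_refinement_predictions.json", "w", encoding="utf-8") as f:
    json.dump(refined, f, ensure_ascii=False, indent=2)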
Make sure your file strictly follows the expected format to avoid upload errors.