About this project
This project introduces CRAB (Code Review Automated Benchmark), a high-quality benchmark designed to evaluate deep learning-based code review automation tools. It focuses on two key tasks:
- Comment Generation: Generating natural language review comments that identify issues and suggest improvements for a given piece of code.
- Code Refinement: Producing revised code that correctly implements the suggestions from a review comment.
The dataset consists of carefully curated triplets <submitted_code, reviewer_comment, revised_code>, ensuring that each comment is actionable and each revision implements the suggested change. This eliminates noise common in previous datasets and supports reliable, meaningful evaluation.
To support model benchmarking, we also provide a web-based evaluation platform (the website on which you are reading this description) that allows researchers to download the dataset, submit their predictions, and assess model performance across both tasks.
You can explore the source code for each component in the project's repositories.
This website lets you evaluate code review models against the CRAB benchmark. You can download input files for either the comment generation or code refinement task, upload your model’s predictions, and view the results once processing is complete. Each section includes a help icon that provides more detailed instructions and file format guidelines.
Download a Dataset
When you download a dataset, you'll receive a ZIP archive containing a JSON file. The structure of this file depends on the selected task.
Comment Generation
The JSON maps each ID to an object with:
- files: a map of filenames to their content at the start of the pull request.
- diffs: a map of filenames to the diff that was applied to each file before the comment was made.
{
  "1234": {
    "files": {
      "src/Main.java": "public class Main { ... }"
    },
    "diffs": {
      "src/Main.java": "@@ -1,3 +1,6 @@ ..."
    }
  }
}
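For reference, here is a minimal Python sketch of reading this file and iterating over its instances. The filename input.json is a placeholder for the JSON file extracted from the downloaded ZIP archive.

import json

# Load the comment generation input (the filename is a placeholder).
with open("input.json", "r", encoding="utf-8") as f:
    instances = json.load(f)

for instance_id, data in instances.items():
    for path, content in data["files"].items():
        diff = data["diffs"].get(path, "")
        # Feed the initial file content and its diff to your model here.
        print(instance_id, path, len(content), len(diff))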
Code Refinement
The JSON structure is similar to comment generation, with one additional field:
- files: the initial version of each file in the PR.
- diffs: the diff applied before the comment was made.
- comments: a list of comments, each with a body, the file it refers to, and the exact location of the comment.
{
  "5678": {
    "files": { ... },
    "diffs": { ... },
    "comments": [
      {
        "body": "Consider simplifying this logic.",
        "file": "src/Util.java",
        "location": {
          "start_line": 42,
          "end_line": 45
        }
      }
    ]
  }
}
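The comments list can be traversed in the same way. The sketch below, again with a placeholder filename, shows how to reach each comment together with the file it refers to.

import json

# Load the code refinement input (the filename is a placeholder).
with open("refinement_input.json", "r", encoding="utf-8") as f:
    instances = json.load(f)

for instance_id, data in instances.items():
    for comment in data["comments"]:
        path = comment["file"]
        start = comment["location"]["start_line"]
        end = comment["location"]["end_line"]
        original = data["files"][path]
        # Pass the original file, its diff, and the comment to your model here.
        print(instance_id, path, start, end, comment["body"])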
With Context (Optional)
You can choose to download the dataset with full repository context — the state of the entire codebase at the time the PR was created. This may help your model better understand the broader project structure and dependencies outside of the changed files.
Upload Your Predictions
After downloading a dataset and generating your predictions for either task, you can upload your results here to start the evaluation process.
Your uploaded JSON file must follow one of the schemas described below, depending on the selected task. Once uploaded, the system will begin evaluating your submission. A progress bar will appear to show how far along the evaluation is.
For Code Refinement, a process ID will also be displayed. This ID allows you to safely close the browser tab and later check the evaluation progress by pasting it into the "Getting Information About Ongoing Process" section. More information is available in that section.
Comment Generation
Submit a JSON object where each key is a string ID and each value is an object representing the proposed comment for that instance. Each comment must specify the file path, the starting line, the ending line (or null), and the comment text itself.
All fields are required and must follow the expected types exactly. The line_to field can be null if the comment applies to a single line.
{
  "1234": {
    "path": "src/Main.java",
    "line_from": 10,
    "line_to": 12,
    "body": "Consider extracting this block into a separate method."
  },
  "5678": {
    "path": "src/Util.java",
    "line_from": 42,
    "line_to": null,
    "body": "You might want to add a null check here."
  }
}
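As an illustration, the following Python sketch assembles a submission in this format. The function predict_comment is a placeholder for your model's inference step, and input.json stands for the downloaded task file.

import json

def predict_comment(data):
    # Dummy stand-in for your model: comment on the first changed file.
    path = next(iter(data["diffs"]))
    return path, 1, None, "Consider adding a test for this change."

with open("input.json", "r", encoding="utf-8") as f:
    instances = json.load(f)

predictions = {}
for instance_id, data in instances.items():
    path, line_from, line_to, body = predict_comment(data)
    predictions[instance_id] = {
        "path": path,
        "line_from": line_from,
        "line_to": line_to,  # None becomes null, i.e. a single-line comment
        "body": body,
    }

with open("comment_predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2)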
Code Refinement
Submit a JSON object where each key is a string ID, and the value is another object that maps file paths (relative to the top-level directory) to the new file content.
{
  "1234": {
    "src/Main.java": "public class Main { /* updated code */ }"
  },
  "5678": {
    "utils/Helper.java": "public class Helper { /* improved logic */ }"
  }
}
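A corresponding sketch for code refinement is shown below. Here refine_files is a placeholder for your model; it must return a dictionary mapping relative file paths to the revised file content, and in this stand-in it simply returns the original files unchanged.

import json

def refine_files(data):
    # Dummy stand-in for your model: return the original files unchanged.
    return dict(data["files"])

with open("refinement_input.json", "r", encoding="utf-8") as f:
    instances = json.load(f)

submission = {instance_id: refine_files(data) for instance_id, data in instances.items()}

with open("refinement_predictions.json", "w", encoding="utf-8") as f:
    json.dump(submission, f, indent=2)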
Make sure your file strictly follows the expected format to avoid upload errors.
Getting Information About Ongoing Process
After you upload your predictions for either the Comment Generation or Code Refinement task, the evaluation begins in the background. Depending on the size of your submission and how busy the system is, this process may take some time. This is especially true for the code refinement task, which involves compiling and testing the submitted code.
To help you track progress, the system assigns a unique process ID to every submission. This ID will be shown to you right after the upload is complete. Be sure to save or copy this ID, as it is the only way to check on the status of your evaluation if you close or refresh the page.
To view the current status of a submission, enter your process ID in the field labeled "Process id" in this section and click the "Request status" button. The system will respond by showing whether your evaluation is still running, has finished successfully, or encountered an error.
When the process is complete, the results will automatically appear in a summary table in place of the progress bar. You will also see an option to download the results as a JSON file. The JSON includes more detailed information than what is shown in the table, such as exact evaluation scores, file-level data, and other metadata that may be useful for in-depth analysis.
If the process is still running, you can come back at any time and use the same ID to check the status again. The evaluation runs entirely in the background on our servers, so you do not need to keep the page open. You can safely close the browser, shut down your computer, or return on a different device. As long as you have your process ID, you will be able to retrieve your results later.
Important: If you lose or forget your process ID, you will not be able to retrieve your results. In that case, you would need to reupload your predictions and start a new evaluation. For this reason, we recommend saving the process ID as soon as it is displayed.