crab-webapp/README.md
2025-06-12 16:12:58 +02:00

# CRAB Webapp (Code Review Automation Benchmark)
A research-driven platform for evaluating deep learning models on automated code review tasks. CRAB provides two core services:
- **Dataset download**: Obtain high-quality, curated Java code review datasets for **comment generation** and **code refinement** tasks.
- **Result evaluation**: Upload model-generated predictions to receive standardized evaluation metrics via a REST+WebSocket API.
## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation & Setup](#installation--setup)
- [Environment Variables](#environment-variables)
- [Running the Application](#running-the-application)
- [Using the Webapp](#using-the-webapp)
  - [Download a Dataset](#download-a-dataset)
  - [Upload Predictions](#upload-predictions)
  - [Track Submission Status](#track-submission-status)
- [API Endpoints](#api-endpoints)
- [Project Structure](#project-structure)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgements](#acknowledgements)
## Features
- **Static Frontend**: Vanilla HTML/CSS/JS interface—no build toolchain required.
- **Dataset Delivery**: ZIP archives of JSON files, with optional full repo context.
- **Submission Queue**: Server-managed job queue with configurable parallelism (via `MAX_WORKERS`).
- **Realtime Feedback**: Progress updates over WebSockets (using Flask-SocketIO).
- **Robust Data Processing**: Utilities for parsing, validating, and evaluating submissions in `src/utils`.
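
To illustrate what "configurable parallelism" means for the submission queue, here is a minimal sketch using only the standard library. It is not the project's `queue_manager.py`: the `evaluate` function, the default of 2 workers, and the use of `ThreadPoolExecutor` are all assumptions made for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed default for illustration; the project's real default is defined
# in src/utils/env_defaults.py and may differ.
MAX_WORKERS = int(os.environ.get("MAX_WORKERS", "2"))


def evaluate(submission_id):
    """Hypothetical stand-in for one evaluation job."""
    return f"submission {submission_id}: complete"


def run_queue(submission_ids, max_workers=MAX_WORKERS):
    """Run jobs with at most `max_workers` executing at once.

    Jobs beyond the limit wait in the executor's internal queue.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(evaluate, submission_ids))


if __name__ == "__main__":
    for line in run_queue([1, 2, 3, 4]):
        print(line)
```

Raising `MAX_WORKERS` lets more submissions be evaluated concurrently at the cost of more load on the host.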
## Prerequisites
- **Python 3.8+**
- *(Optional)* Docker daemon if you wish to containerize the service
## Installation & Setup
1. **Clone** the repository:
```bash
git clone https://github.com/yourusername/crab-webapp.git
cd crab-webapp
```
2. **Install** Python dependencies:
```bash
pip install -r requirements.txt
```
### Environment Variables
Defaults are set in `src/utils/env_defaults.py` (port 45003, `data/` path, etc.). To override:
```bash
cp .env.example .env
# Edit .env to adjust:
# PORT=..., MAX_WORKERS=..., DATA_PATH=..., RESULTS_DIR=...
```
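
The override behaviour can be pictured as "use the environment value if present, otherwise fall back to the default". The sketch below shows that pattern with the standard library only. It does not reproduce `env_defaults.py`: only the port (45003) and the `data/` path come from this README, and the `MAX_WORKERS` and `RESULTS_DIR` values are placeholders.

```python
import os

# PORT and DATA_PATH defaults are taken from this README.
# The other two values are placeholders, not the project's real defaults.
DEFAULTS = {
    "PORT": "45003",
    "DATA_PATH": "data/",
    "MAX_WORKERS": "1",         # placeholder
    "RESULTS_DIR": "results/",  # placeholder
}


def load_settings(environ=None):
    """Return settings, letting any variable set in `environ` override its default."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    print(load_settings())
```

For example, `load_settings({"PORT": "8080"})` keeps every default except the port.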
## Running the Application
From the project root:
```bash
python -m src.server
```
- The Flask app serves static files from `public/` at `/` and mounts API routes under `/datasets` and `/answers` via Blueprints.
- By default, open your browser to **[http://localhost:45003/](http://localhost:45003/)**.
## Using the Webapp
### Download a Dataset
1. Select **Comment Generation** or **Code Refinement**.
1. (Optional) Check **Include context** to get full repo snapshots.
1. Click **Download** to receive a ZIP with JSON (see schemas in `public/index.html`).
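
The same download can be scripted against the `/datasets/download/<dataset>` route listed under API Endpoints. The helper below is a hypothetical sketch that only builds the URL; `dataset_url` and `BASE_URL` are names invented for this example, and the base URL assumes the default port.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://localhost:45003/"  # assumes the default port
DATASETS = ("comment_generation", "code_refinement")


def dataset_url(dataset, with_context=False, base_url=BASE_URL):
    """Build the download URL for a dataset, optionally with repo context."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    url = urljoin(base_url, f"datasets/download/{dataset}")
    if with_context:
        # Corresponds to ticking "Include context" in the UI.
        url += "?" + urlencode({"withContext": "true"})
    return url


if __name__ == "__main__":
    print(dataset_url("comment_generation"))
    print(dataset_url("code_refinement", with_context=True))
```

With a running server, the resulting URL can be passed to `urllib.request.urlretrieve(url, "dataset.zip")` to save the archive.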
### Upload Predictions
1. Choose task type (`comment` or `refinement`).
1. Select your JSON file (matching the dataset schema).
1. Click **Upload JSON**.
1. The server responds with a **process ID**.
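
For scripted uploads, the steps above map onto the `/answers/submit/<task>` routes. The sketch below only constructs the request and does not send it. This README does not state whether the server expects a raw JSON body or a multipart file upload, so the raw JSON body used here is an assumption, and `build_submit_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:45003"  # assumes the default port
TASKS = ("comment", "refinement")


def build_submit_request(task, predictions, base_url=BASE_URL):
    """Build (but do not send) a POST request for a predictions payload.

    Assumption: the body is sent as raw JSON. Check src/routes/answers.py
    for the format the server actually accepts.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    body = json.dumps(predictions).encode("utf-8")
    return Request(
        f"{base_url}/answers/submit/{task}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder payload; real submissions must match the dataset schema.
    req = build_submit_request("comment", {})
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response is expected to carry the process ID, whose exact field name is not documented here.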
### Track Submission Status
- Progress bar displays real-time percentage via WebSocket events.
- You can also poll **GET** `/answers/status/<id>` (requires `X-Socket-Id` header) to retrieve:
  - `status`: `created`, `waiting`, `processing`, or `complete`
  - on completion: `{ type, results }` JSON payload.
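
A polling client for the status route could look like the sketch below. It is an illustration under assumptions: the helper names are invented, the response is assumed to be a JSON object with a `status` key, and a payload that contains `results` is also treated as finished because the exact completion shape is not spelled out here.

```python
import json
import time
from urllib.request import Request, urlopen


def make_fetcher(process_id, socket_id, base_url="http://localhost:45003"):
    """Return a function that performs one GET on /answers/status/<id>."""
    def fetch():
        req = Request(
            f"{base_url}/answers/status/{process_id}",
            headers={"X-Socket-Id": socket_id},
        )
        with urlopen(req) as resp:
            return json.load(resp)
    return fetch


def wait_for_results(fetch_status, interval=1.0, max_polls=60, sleep=time.sleep):
    """Call `fetch_status` until the job is complete, then return the payload."""
    for _ in range(max_polls):
        payload = fetch_status()
        # Assumed completion test: explicit status or presence of results.
        if payload.get("status") == "complete" or "results" in payload:
            return payload
        sleep(interval)
    raise TimeoutError(f"job not complete after {max_polls} polls")


if __name__ == "__main__":
    # Simulated server replies, so the example runs without a server.
    replies = iter([
        {"status": "waiting"},
        {"status": "processing"},
        {"status": "complete", "type": "comment", "results": {}},
    ])
    print(wait_for_results(lambda: next(replies), interval=0))
```

Against a live server, `wait_for_results(make_fetcher(process_id, socket_id))` would poll once per second for up to a minute.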
## API Endpoints
| Method | Route | Description |
| ------ | ------------------------------ | ------------------------------------------------------------------------------------------------------------------------- |
| GET | `/datasets/download/<dataset>` | Download ZIP of `comment_generation` or `code_refinement` (use `?withContext=true` for full repo). |
| POST | `/answers/submit/comment` | Submit comment-generation JSON. |
| POST | `/answers/submit/refinement` | Submit code-refinement JSON. |
| GET | `/answers/status/<id>` | Poll status or results (include `X-Socket-Id`). |
## Project Structure
```
├── data/                     # Dataset files: dataset.json, archives, etc.
├── public/                   # Static frontend
│   ├── css/style.css         # Styles
│   ├── img/crab.png          # Icon
│   ├── index.html            # UI with modals, schema docs
│   └── js/                   # Frontend scripts
│       ├── index.js          # UI logic, fetch & WebSocket handlers
│       ├── modal.js          # Modal dialogs
│       └── sorttable.js      # Table sorting
├── src/                      # Backend source
│   ├── server.py             # App entry: Flask + SocketIO
│   ├── routes/               # Blueprints
│   │   ├── index.py          # Root & health-check
│   │   ├── datasets.py       # File downloads
│   │   └── answers.py        # Submission & status endpoints
│   └── utils/                # Core logic & helpers
│       ├── env_defaults.py   # Default ENV vars
│       ├── dataset.py        # Load/validate dataset JSON
│       ├── process_data.py   # Evaluation functions
│       ├── observer.py       # WebSocket observer & queue cleanup
│       ├── queue_manager.py  # Concurrency control
│       └── build_handlers.py # Build/test wrappers
├── requirements.txt          # Python libs: Flask, SocketIO, dotenv, etc.
├── TODO.md                   # Next steps and backlog
└── .env.example              # Template for environment variables
```
## Contributing
Issues and PRs welcome! Please follow existing style, add tests for new features, and update documentation accordingly.
## License
This project is licensed under [Your License Here].
## Acknowledgements
- Developed as part of a Master's thesis at USI.
- Inspired by Dean Edwards' sortable tables (sorttable.js) and Flask-SocketIO examples.