
# CRAB Webapp (Code Review Automation Benchmark)
A research-driven platform for evaluating deep learning models on automated code review tasks. CRAB provides two core services:
- **Dataset download**: Obtain high-quality, curated Java code review datasets for **comment generation** and **code refinement** tasks.
- **Result evaluation**: Upload model-generated predictions to receive standardized evaluation metrics via a REST+WebSocket API.
## Features
- **Static Frontend**: Vanilla HTML/CSS/JS interface; no build toolchain required.
- **Dataset Delivery**: ZIP archives of JSON files, with optional full repo context.
- **Submission Queue**: Server-managed job queue with configurable parallelism (via `MAX_WORKERS`).
- **Realtime Feedback**: Progress updates over WebSockets (using Flask-SocketIO).
- **Robust Data Processing**: Utilities for parsing, validating, and evaluating submissions in `src/utils`.
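
The idea behind the submission queue can be illustrated with a minimal sketch. This is not the code in `src/utils/queue_manager.py`; it only shows how a `MAX_WORKERS`-style limit bounds parallelism, using the standard library. The `evaluate` function, the `run_all` helper, and the fallback value of 2 are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# MAX_WORKERS is the variable named in this README.
# The fallback of 2 is an assumption, not the project's documented default.
DEFAULT_MAX_WORKERS = int(os.environ.get("MAX_WORKERS", "2"))


def evaluate(submission_id: str) -> str:
    """Hypothetical stand-in for a real evaluation job."""
    return f"{submission_id}: done"


def run_all(submission_ids, max_workers=DEFAULT_MAX_WORKERS):
    """Run every job, with at most `max_workers` executing at the same time.

    Jobs beyond the limit wait in the executor's internal queue.
    Results are returned in submission order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, submission_ids))


if __name__ == "__main__":
    print(run_all(["a", "b", "c"]))
```

Raising the limit trades memory and CPU for throughput; the real server may use a different mechanism entirely.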
## Prerequisites
- **Python 3.8+**
- *(Optional)* A running Docker daemon, if you want to run the code refinement evaluation
## Installation & Setup
1. **Clone** the repository:
```bash
git clone https://github.com/karma-riuk/crab-webapp.git
cd crab-webapp
```
1. *(Optional)* **Create** a Python virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate
```
1. **Install** Python dependencies:
```bash
pip install -r requirements.txt
```
### Environment Variables
Defaults are set in `src/utils/env_defaults.py` (port 45003, `data/` path, etc.) and described in
the comments of `.env.example`. To override:
```bash
cp .env.example .env
# Edit .env to adjust:
# PORT=..., MAX_WORKERS=..., DATA_PATH=..., RESULTS_DIR=...
```
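
As a rough illustration of how such overrides are typically resolved, the sketch below reads a variable from the environment and falls back to a default. It is not the content of `src/utils/env_defaults.py`; the `DEFAULTS` dictionary and the `load_config` helper are hypothetical, and only the port `45003` and the `data/` path are taken from this README.

```python
import os

# Defaults mentioned in this README; the authoritative values
# live in src/utils/env_defaults.py.
DEFAULTS = {"PORT": "45003", "DATA_PATH": "data/"}


def load_config(environ=None):
    """Return settings, preferring environment values over defaults.

    `environ` is any mapping of variable names to strings; it defaults
    to the process environment.
    """
    env = os.environ if environ is None else environ
    return {
        "port": int(env.get("PORT", DEFAULTS["PORT"])),
        "data_path": env.get("DATA_PATH", DEFAULTS["DATA_PATH"]),
    }


if __name__ == "__main__":
    print(load_config())
```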
## Running the Application
From the project root:
```bash
python src/server.py
```
- The Flask app serves static files from `public/` at `/` and mounts API routes under `/datasets` and `/answers` via Blueprints.
- Once the server is running, open **[http://localhost:45003/](http://localhost:45003/)** in your browser (`45003` is the default port).
- To try it out without running it locally, visit **[http://gym.si.usi.ch:45003](http://gym.si.usi.ch:45003)** (you must be connected to the USI network to access it).
## API Endpoints
| Method | Route | Description |
| ------ | ------------------------------ | ------------------------------------------------------------------------------------------------------------------------- |
| GET | `/datasets/download/<dataset>` | Download ZIP of `comment_generation` or `code_refinement` (use `?withContext=true` for full repo).|
| POST | `/answers/submit/comment` | Submit comment-generation JSON. |
| POST | `/answers/submit/refinement` | Submit code-refinement JSON. |
| GET | `/answers/status/<id>` | Poll status or results (may include `X-Socket-Id` for notifications). |
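
A small client-side sketch of the download and status routes above, using only the Python standard library. The helper names (`dataset_url`, `status_url`, `download_dataset`) are hypothetical, and `BASE_URL` assumes a local server on the default port. The submission routes are left out because the expected request format is documented in the web UI rather than here.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlretrieve

# Assumes a local server on the default port from this README.
BASE_URL = "http://localhost:45003"


def dataset_url(dataset, with_context=False, base_url=BASE_URL):
    """Build the download URL for `comment_generation` or `code_refinement`."""
    url = f"{base_url}/datasets/download/{quote(dataset)}"
    if with_context:
        # Query flag from the table above: include the full repo context.
        url += "?" + urlencode({"withContext": "true"})
    return url


def status_url(submission_id, base_url=BASE_URL):
    """Build the URL used to poll the status or results of a submission."""
    return f"{base_url}/answers/status/{quote(str(submission_id))}"


def download_dataset(dataset, dest, with_context=False):
    """Save the ZIP archive to `dest`; requires a running server."""
    path, _headers = urlretrieve(dataset_url(dataset, with_context), dest)
    return path


if __name__ == "__main__":
    print(dataset_url("comment_generation", with_context=True))
```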
## Project Structure
```
├── data/                     # Dataset files: dataset.json, archives, etc.
├── public/                   # Static frontend
│   ├── css/style.css         # Styles
│   ├── img/crab.png          # Icon
│   ├── index.html            # UI with modals, schema docs
│   └── js/                   # Frontend scripts
│       ├── index.js          # UI logic, fetch & WebSocket handlers
│       ├── modal.js          # Modal dialogs
│       └── sorttable.js      # Table sorting
├── src/                      # Backend source
│   ├── server.py             # App entry: Flask + SocketIO
│   ├── routes/               # Blueprints
│   │   ├── index.py          # Root & health-check
│   │   ├── datasets.py       # File downloads
│   │   └── answers.py        # Submission & status endpoints
│   └── utils/                # Core logic & helpers
│       ├── env_defaults.py   # Default ENV vars
│       ├── dataset.py        # Load/validate dataset JSON
│       ├── process_data.py   # Evaluation functions
│       ├── observer.py       # WebSocket observer & queue cleanup
│       ├── queue_manager.py  # Concurrency control
│       └── build_handlers.py # Build/test wrappers
├── requirements.txt          # Python libs: Flask, SocketIO, dotenv, etc.
├── TODO.md                   # Next steps and backlog
└── .env.example              # Template for environment variables
```
## Contributing
1. **Issue Tracker**: Please file issues for bugs or feature requests.
1. **Pull Requests**: Fork, create a topic branch, and submit a PR. Please include tests or validations where applicable.
## Acknowledgements
- Developed as part of a Master's thesis at Università della Svizzera Italiana.