From dfd1a0ae423acf0606f523f8aa9929f7a6e1fd80 Mon Sep 17 00:00:00 2001
From: Karma Riuk
Date: Thu, 12 Jun 2025 16:12:58 +0200
Subject: [PATCH] draft readme

---
 README.md | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a6227bc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,161 @@
+# CRAB Webapp (Code Review Automation Benchmark)
+
+A research-driven platform for evaluating deep learning models on automated code review tasks. CRAB provides two core services:
+
+- **Dataset download**: Obtain high-quality, curated Java code review datasets for **comment
+  generation** and **code refinement** tasks.
+- **Result evaluation**: Upload model-generated predictions to receive standardized evaluation
+  metrics via a REST+WebSocket API.
+
+## Table of Contents
+
+- [Features](#features)
+
+- [Prerequisites](#prerequisites)
+
+- [Installation & Setup](#installation--setup)
+
+  - [Environment Variables](#environment-variables)
+
+- [Running the Application](#running-the-application)
+
+- [Using the Webapp](#using-the-webapp)
+
+  - [Download a Dataset](#download-a-dataset)
+  - [Upload Predictions](#upload-predictions)
+  - [Track Submission Status](#track-submission-status)
+
+- [API Endpoints](#api-endpoints)
+
+- [Project Structure](#project-structure)
+
+- [Contributing](#contributing)
+
+- [License](#license)
+
+- [Acknowledgements](#acknowledgements)
+
+## Features
+
+- **Static Frontend**: Vanilla HTML/CSS/JS interface, no build toolchain required.
+- **Dataset Delivery**: ZIP archives of JSON files, with optional full repo context.
+- **Submission Queue**: Server-managed job queue with configurable parallelism (via `MAX_WORKERS`).
+- **Real-time Feedback**: Progress updates over WebSockets (using Flask-SocketIO).
+- **Robust Data Processing**: Utilities for parsing, validating, and evaluating submissions in `src/utils`.
+
+## Prerequisites
+
+- **Python 3.8+**
+- *(Optional)* Docker daemon if you wish to containerize the service
+
+## Installation & Setup
+
+1. **Clone** the repository:
+
+   ```bash
+   git clone https://github.com/yourusername/crab-webapp.git
+   cd crab-webapp
+   ```
+
+1. **Install** Python dependencies:
+
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+### Environment Variables
+
+Defaults are set in `src/utils/env_defaults.py` (port 45003, `data/` path, etc.). To override:
+
+```bash
+cp .env.example .env
+# Edit .env to adjust:
+# PORT=..., MAX_WORKERS=..., DATA_PATH=..., RESULTS_DIR=...
+```
+
+## Running the Application
+
+From the project root:
+
+```bash
+python -m src.server
+```
+
+- The Flask app serves static files from `public/` at `/` and mounts API routes under `/datasets` and `/answers` via Blueprints.
+- By default, open your browser to **[http://localhost:45003/](http://localhost:45003/)**.
+
+## Using the Webapp
+
+### Download a Dataset
+
+1. Select **Comment Generation** or **Code Refinement**.
+1. (Optional) Check **Include context** to get full repo snapshots.
+1. Click **Download** to receive a ZIP with JSON (see schemas in `public/index.html`).
+
+### Upload Predictions
+
+1. Choose task type (`comment` or `refinement`).
+1. Select your JSON file (matching the dataset schema).
+1. Click **Upload JSON**.
+1. The server responds with a **process ID**.
+
+### Track Submission Status
+
+- Progress bar displays real-time percentage via WebSocket events.
+
+- You can also poll **GET** `/answers/status/` (requires `X-Socket-Id` header) to retrieve:
+
+  - `status`: `created`, `waiting`, `processing`, or `complete`
+  - on completion: `{ type, results }` JSON payload.
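
The polling flow described under "Track Submission Status" could be scripted roughly as follows. This is a minimal sketch and not part of the patch: the helper names `fetch_status` and `wait_for_results`, the polling interval, and the poll limit are hypothetical, and it assumes the endpoint returns a JSON object carrying either a `status` field or, once finished, a `results` field. The caller supplies the full status URL, including whatever identifier the server expects after `/answers/status/`.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional


def fetch_status(status_url: str, socket_id: str) -> Dict:
    """GET the status URL with the X-Socket-Id header and parse the JSON body."""
    request = urllib.request.Request(status_url, headers={"X-Socket-Id": socket_id})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def wait_for_results(
    fetch: Callable[[], Dict],
    interval: float = 2.0,   # seconds between polls (assumed value)
    max_polls: int = 150,    # give up after this many attempts (assumed value)
) -> Optional[Dict]:
    """Call fetch() until the job looks finished; return that payload, or None on timeout."""
    for _ in range(max_polls):
        payload = fetch()
        # Assumption: a finished job reports status "complete" and/or includes "results".
        if payload.get("status") == "complete" or "results" in payload:
            return payload
        time.sleep(interval)
    return None
```

A caller would wrap the HTTP helper in a closure, for example `wait_for_results(lambda: fetch_status(status_url, socket_id))`. Keeping the loop separate from the HTTP call lets the polling logic be exercised with canned responses instead of a running server.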
+
+## API Endpoints
+
+| Method | Route                        | Description |
+| ------ | ---------------------------- | ----------- |
+| GET    | `/datasets/download/`        | Download ZIP of `comment_generation` or `code_refinement` (use `?withContext=true` for full repo). |
+| POST   | `/answers/submit/comment`    | Submit comment-generation JSON. |
+| POST   | `/answers/submit/refinement` | Submit code-refinement JSON. |
+| GET    | `/answers/status/`           | Poll status or results (include `X-Socket-Id`). |
+
+## Project Structure
+
+```
+├── data/                     # Dataset files: dataset.json, archives, etc.
+├── public/                   # Static frontend
+│   ├── css/style.css         # Styles
+│   ├── img/crab.png          # Icon
+│   ├── index.html            # UI with modals, schema docs
+│   └── js/                   # Frontend scripts
+│       ├── index.js          # UI logic, fetch & WebSocket handlers
+│       ├── modal.js          # Modal dialogs
+│       └── sorttable.js      # Table sorting
+├── src/                      # Backend source
+│   ├── server.py             # App entry: Flask + SocketIO
+│   ├── routes/               # Blueprints
+│   │   ├── index.py          # Root & health-check
+│   │   ├── datasets.py       # File downloads
+│   │   └── answers.py        # Submission & status endpoints
+│   └── utils/                # Core logic & helpers
+│       ├── env_defaults.py   # Default ENV vars
+│       ├── dataset.py        # Load/validate dataset JSON
+│       ├── process_data.py   # Evaluation functions
+│       ├── observer.py       # WebSocket observer & queue cleanup
+│       ├── queue_manager.py  # Concurrency control
+│       └── build_handlers.py # Build/test wrappers
+├── requirements.txt          # Python libs: Flask, SocketIO, dotenv, etc.
+├── TODO.md                   # Next steps and backlog
+└── .env.example              # Template for environment variables
+```
+
+## Contributing
+
+Issues and PRs welcome! Please follow existing style, add tests for new features, and update documentation accordingly.
+
+## License
+
+This project is licensed under [Your License Here].
+
+## Acknowledgements
+
+- Developed as part of a Master's thesis at USI.
+- Inspired by Dean Edwards' sortable tables (sorttable.js) and Flask-SocketIO examples.