Commit Graph

  • 8360b8bbe3 added small sh script that i forgot to add when i created it... main Karma Riuk 2025-06-28 11:09:47 +02:00
  • c6650b2b87 made code more consistent Karma Riuk 2025-06-23 11:19:36 +02:00
  • beaef7310f updated readme Karma Riuk 2025-06-12 17:31:13 +02:00
  • 9879c9e528 updated readme Karma Riuk 2025-06-12 17:30:58 +02:00
  • 42f02595f0 updated readme Karma Riuk 2025-06-12 17:29:49 +02:00
  • c52261560f updated requirements Karma Riuk 2025-06-12 17:23:37 +02:00
  • 7bb9eda8a4 removed useless files Karma Riuk 2025-06-12 17:23:11 +02:00
  • 3e98eba9d2 updated readme Karma Riuk 2025-06-12 17:18:45 +02:00
  • 31402752c8 updated readme Karma Riuk 2025-06-12 17:16:06 +02:00
  • f898028fcc removed types of parameters Karma Riuk 2025-06-12 17:14:33 +02:00
  • 1a82d72a43 updated readme Karma Riuk 2025-06-12 17:11:05 +02:00
  • 5819458c17 updated readme Karma Riuk 2025-06-12 17:09:41 +02:00
  • 72fadaabe8 updated readme Karma Riuk 2025-06-12 17:05:06 +02:00
  • 81b2c1f782 updated readme Karma Riuk 2025-06-12 16:58:07 +02:00
  • b0349dc44b updated readme Karma Riuk 2025-06-12 16:55:05 +02:00
  • be2f26a7f6 drafted readme Karma Riuk 2025-06-12 16:53:37 +02:00
  • f1204c4c35 fixed extraction of correct predictions Karma Riuk 2025-06-12 13:25:41 +02:00
  • 21de0ffd7a updated logic for the extraction of correct predictions Karma Riuk 2025-06-10 23:41:45 +02:00
  • f36fcc6e05 updated the way the comment generation and code refinement inputs are exported (automatized the putting of archives for context) Karma Riuk 2025-06-10 23:40:44 +02:00
  • f5bdfd1a1b the input to code refinement now ignores paraphrases Karma Riuk 2025-06-10 20:45:51 +02:00
  • 429fe9b060 implemented new way to extract stats from dataset Karma Riuk 2025-06-10 20:42:58 +02:00
  • dd52e43000 added way to put paraphrases from external csv Karma Riuk 2025-06-10 20:42:38 +02:00
  • 1754f93018 quality of life for manual selection Karma Riuk 2025-06-05 10:49:05 +02:00
  • 4c5e486ad6 added small .unique() on repo names to avoid processing a repo twice Karma Riuk 2025-06-05 10:45:01 +02:00
  • bf1591c61d fixed bug in manual selection Karma Riuk 2025-06-04 12:07:41 +02:00
  • 9a24a734e7 added the printing of the relevant hunk when asking for comment relevance Karma Riuk 2025-06-04 10:37:48 +02:00
  • 6110640a6f fixed condition to check whether a comment was within the diffs Karma Riuk 2025-06-03 13:39:02 +02:00
  • 792195e33c now ensuring the comment is within the diff_before changes Karma Riuk 2025-06-03 11:51:38 +02:00
  • 926d3a3681 now using original start line as default and start line as backup instead of the other way around Karma Riuk 2025-06-03 11:51:07 +02:00
  • 154837827d added link to paraphrases extraction Karma Riuk 2025-06-03 10:10:57 +02:00
  • 45a8122408 using enum choice action instead of the previous thing we were using Karma Riuk 2025-06-03 10:10:36 +02:00
  • 66d046cbaa made filename a positional argument Karma Riuk 2025-06-03 10:10:19 +02:00
  • c05c9cb366 fixed manual selection Karma Riuk 2025-06-02 10:47:04 +02:00
  • 87b49b377d the removal of the is_code_related field in selection broke backwards compatibility. Fixed it Karma Riuk 2025-06-02 10:46:06 +02:00
  • 4648ba2560 fixed type annotation Karma Riuk 2025-06-02 09:50:09 +02:00
  • 09df9a1ae8 removed already done TODO Karma Riuk 2025-06-02 09:49:59 +02:00
  • 5b8357567b removed code relatedness from manual selection since now it's already done by pull_requests Karma Riuk 2025-06-02 09:48:27 +02:00
  • b311c49f9a simplifying logging of common error we can't do much about Karma Riuk 2025-05-28 10:17:59 +02:00
  • 77ed66ded8 added small logging statement Karma Riuk 2025-05-28 10:17:48 +02:00
  • e097885e36 only writing the dataset to disk when there are new entries Karma Riuk 2025-05-28 10:17:19 +02:00
  • 0b182837c1 added new option for dataset Karma Riuk 2025-05-27 10:50:10 +02:00
  • 900003bac7 added a way to extract the information to then generate paraphrases Karma Riuk 2025-05-27 10:48:17 +02:00
  • 63b69e40b8 tried to make requests cache better Karma Riuk 2025-05-26 11:36:31 +02:00
  • a4ce620aa0 printing stacktrace when error is made Karma Riuk 2025-05-26 11:36:04 +02:00
  • 7e00656ab1 fixed condition Karma Riuk 2025-05-26 11:35:54 +02:00
  • e619d2f339 added another point of failure Karma Riuk 2025-05-21 10:59:07 +02:00
  • 5734ca5c8d if we are multithreading, give some time between the requests Karma Riuk 2025-05-21 09:33:58 +02:00
  • b598e97bc6 moved update of pbar Karma Riuk 2025-05-21 09:33:44 +02:00
  • 9ced42b6c4 removed unused variable Karma Riuk 2025-05-21 09:33:31 +02:00
  • f1e8b896bb fixed slight bug Karma Riuk 2025-05-21 09:18:51 +02:00
  • a8ccf081a2 formatted file Karma Riuk 2025-05-21 09:18:33 +02:00
  • d48c5d04b8 removed stat that has become useless Karma Riuk 2025-05-20 16:50:21 +02:00
  • 33cea7bbb4 added cute little units for the progress bars Karma Riuk 2025-05-20 16:50:04 +02:00
  • ea7b510926 added another way pr can be invalid, if they have no lines for their comment (github api be weird) Karma Riuk 2025-05-20 16:40:50 +02:00
  • 09ee7995ff made more general function to move logging to file Karma Riuk 2025-05-20 16:39:52 +02:00
  • c577b3a6e5 saving all the results after any exception Karma Riuk 2025-05-20 09:59:51 +02:00
  • e6c5c8df82 sorting the values descending, to have the top most of the given column first Karma Riuk 2025-05-20 09:59:09 +02:00
  • 975b25f2f6 removed print statements Karma Riuk 2025-05-20 09:58:55 +02:00
  • a3a89bb346 moved some logic around Karma Riuk 2025-05-20 09:58:49 +02:00
  • b0443cc87f added exclusion list Karma Riuk 2025-05-20 09:57:53 +02:00
  • 04a37030f4 populating cache only if there is any cache Karma Riuk 2025-05-20 09:57:04 +02:00
  • 15ffe67b0e added fields to the metadata to make manual filtering easier Karma Riuk 2025-05-20 09:51:50 +02:00
  • 3ffbb229b8 fixed typo Karma Riuk 2025-05-20 09:50:00 +02:00
  • 6e0aca2ad5 fixed typo Karma Riuk 2025-05-20 09:48:31 +02:00
  • 4d9c47f33a added all the progress bars for each worker Karma Riuk 2025-05-17 10:45:57 +02:00
  • 98db478b7b first draft of parallelization (NOT TESTED YET) Karma Riuk 2025-05-17 09:42:02 +02:00
  • 25072ac8b3 removed unused import Karma Riuk 2025-05-17 09:37:33 +02:00
  • b90111c652 now i'm not crashing when no GITHUB_API_TOKEN is given, rather just printing a warning Karma Riuk 2025-05-17 09:33:18 +02:00
  • 970ee1c363 added the possibility of sorting the incoming csv by a certain column, now taking any csv instead of the result of clone_repos.py Karma Riuk 2025-05-17 09:32:40 +02:00
  • 3ea3e980bd added metavar names for arguments Karma Riuk 2025-05-17 09:16:52 +02:00
  • 5cf5e5a8ee added enumchoiceaction for easier enum in argparse handling Karma Riuk 2025-05-16 19:41:15 +02:00
  • 14e64984c5 instead of adding the cache as we go through the repos, just add it before any processing, so we are sure to keep all the previously saved data Karma Riuk 2025-05-16 19:39:59 +02:00
  • b84ea797ff moved enums to top of file Karma Riuk 2025-05-16 19:11:26 +02:00
  • 25161c4d46 refactored manual selection into smaller bits, easier to consume Karma Riuk 2025-05-16 09:58:14 +02:00
  • 63c6785b4d when asking for entries that implement the suggested change, we now only keep the ones that are code related Karma Riuk 2025-05-14 20:53:02 +02:00
  • c55d21d042 removed unused import Karma Riuk 2025-05-14 20:52:52 +02:00
  • 17823e39fe added placeholder for paraphrases Karma Riuk 2025-05-14 20:52:37 +02:00
  • cabf8fd823 added the build of the reference map to the dataset Karma Riuk 2025-05-14 20:52:17 +02:00
  • 5328fe59e1 moved dependency to somewhere optional Karma Riuk 2025-05-14 15:40:52 +02:00
  • 5403ff5d4d added code to extract the expected output given a dataset (to test the website) Karma Riuk 2025-05-14 15:40:10 +02:00
  • acce738872 made the requests never expire Karma Riuk 2025-05-14 09:39:55 +02:00
  • a13a29c6de removed buggy continue Karma Riuk 2025-05-14 09:39:33 +02:00
  • 726a0d92e1 using output (forgot to commit it before) Karma Riuk 2025-05-14 09:36:52 +02:00
  • 0b02518374 found out that some couldn't checkout due to conflicts, but if you force it, it works Karma Riuk 2025-05-14 09:36:12 +02:00
  • ccd962c205 now using the metadata to get archive name Karma Riuk 2025-05-14 09:34:59 +02:00
  • 1f91acf6c1 moved click to optional requirements Karma Riuk 2025-05-14 09:19:08 +02:00
  • c731fd3393 commented out print Karma Riuk 2025-05-14 09:18:36 +02:00
  • 65806ccbe3 now the metadata knows its archive name Karma Riuk 2025-05-14 09:18:11 +02:00
  • ae516b6c34 removed good and put is_code_related in selection instead Karma Riuk 2025-05-14 09:17:16 +02:00
  • ea3a2b72e5 added output to manual_selection in order not to overwrite every time Karma Riuk 2025-05-14 09:16:08 +02:00
  • 2f2cbae756 fixed typing of parameters for server version of python Karma Riuk 2025-05-12 11:57:30 +02:00
  • a701dc236c when selecting, we can now choose which diff hunk is relevant and modify it if necessary Karma Riuk 2025-05-07 10:39:21 +02:00
  • 959184b2a8 we can now clean the dataset from useless entries Karma Riuk 2025-05-07 10:38:41 +02:00
  • 36b7dc5c02 added uuid when creating the dataset Karma Riuk 2025-05-07 10:38:20 +02:00
  • af89051779 prompts can now have a default value when enter is pressed Karma Riuk 2025-05-07 10:35:27 +02:00
  • 97646cb8c3 fixed log statements Karma Riuk 2025-05-07 10:32:02 +02:00
  • 40fa958cf8 added uuid as id Karma Riuk 2025-05-07 10:18:38 +02:00
  • b3877733cb we can now generate the datasets to be served to users Karma Riuk 2025-04-29 14:41:11 +02:00
  • bde9d45c10 implemented the manual selection script Karma Riuk 2025-04-29 14:40:58 +02:00
  • 03b89872dd added selection field to dataset for manual_selection Karma Riuk 2025-04-28 09:51:38 +02:00