326 Commits

Author SHA1 Message Date
a8ccf081a2 formatted file 2025-05-21 09:18:33 +02:00
d48c5d04b8 removed stat that has become useless 2025-05-20 16:50:21 +02:00
33cea7bbb4 added cute little units for the progress bars 2025-05-20 16:50:04 +02:00
ea7b510926 added another way pr can be invalid, if they have no lines for their comment (github api be weird) 2025-05-20 16:44:02 +02:00
09ee7995ff made more general function to move logging to file 2025-05-20 16:40:26 +02:00
c577b3a6e5 saving all the results after any exception 2025-05-20 09:59:51 +02:00
e6c5c8df82 sorting the values descending, to have the top most of the given column first 2025-05-20 09:59:09 +02:00
975b25f2f6 removed print statements 2025-05-20 09:58:55 +02:00
a3a89bb346 moved some logic around 2025-05-20 09:58:49 +02:00
b0443cc87f added exclusion list 2025-05-20 09:57:53 +02:00
04a37030f4 populating cache only if there is any cache 2025-05-20 09:57:29 +02:00
15ffe67b0e added fields to the metadata to make manual filtering easier 2025-05-20 09:56:54 +02:00
3ffbb229b8 fixed typo 2025-05-20 09:50:00 +02:00
6e0aca2ad5 fixed typo 2025-05-20 09:48:31 +02:00
4d9c47f33a added all the progress bars for each worker 2025-05-17 10:45:57 +02:00
98db478b7b first draft of parallelization (NOT TESTED YET) 2025-05-17 09:42:02 +02:00
25072ac8b3 removed unused import 2025-05-17 09:37:33 +02:00
b90111c652 now i'm not crashing when no GITHUB_API_TOKEN is given, rather just printing a warning 2025-05-17 09:37:31 +02:00
970ee1c363 added the possibility of sorting the incoming csv by a certain column, now taking any csv instead of the result of clone_repos.py 2025-05-17 09:32:40 +02:00
3ea3e980bd added metavar names for arguments 2025-05-17 09:16:52 +02:00
5cf5e5a8ee added enumchoiceaction for easier enum handling in argparse 2025-05-16 19:41:15 +02:00
14e64984c5 instead of adding the cache as we go through the repos, just add it before any processing, so we are sure to keep all the previously saved data 2025-05-16 19:39:59 +02:00
b84ea797ff moved enums to top of file 2025-05-16 19:11:26 +02:00
25161c4d46 refactored manual selection into smaller bits, easier to consume 2025-05-16 09:58:14 +02:00
63c6785b4d when asking for entries that implement the suggested change, we now only keep the ones that are code related 2025-05-14 20:53:02 +02:00
c55d21d042 removed unused import 2025-05-14 20:52:52 +02:00
17823e39fe added placeholder for paraphrases 2025-05-14 20:52:37 +02:00
cabf8fd823 added the build of the reference map to the dataset 2025-05-14 20:52:17 +02:00
5328fe59e1 moved dependency to somewhere optional 2025-05-14 15:40:52 +02:00
5403ff5d4d added code to extract the expected output given a dataset (to test the website) 2025-05-14 15:40:10 +02:00
acce738872 made the requests never expire 2025-05-14 09:39:55 +02:00
a13a29c6de removed buggy continue 2025-05-14 09:39:33 +02:00
726a0d92e1 using output (forgot to commit it before) 2025-05-14 09:36:52 +02:00
0b02518374 found out that some couldn't checkout due to conflicts, but if you force it, it works 2025-05-14 09:36:12 +02:00
ccd962c205 now using the metadata to get archive name 2025-05-14 09:36:06 +02:00
1f91acf6c1 moved click to optional requirements 2025-05-14 09:19:08 +02:00
c731fd3393 commented out print 2025-05-14 09:18:36 +02:00
65806ccbe3 now the metadata knows its archive name 2025-05-14 09:18:11 +02:00
ae516b6c34 removed good and put is_code_related in selection instead 2025-05-14 09:18:06 +02:00
ea3a2b72e5 added output to manual_selection in order not to overwrite every time 2025-05-14 09:16:31 +02:00
2f2cbae756 fixed typing of parameters for server version of python 2025-05-12 11:57:30 +02:00
a701dc236c when selecting, we can now choose which diff hunk is relevant and modify it if necessary 2025-05-07 10:39:21 +02:00
959184b2a8 we can now clean the dataset from useless entries 2025-05-07 10:38:41 +02:00
36b7dc5c02 added uuid when creating the dataset 2025-05-07 10:38:20 +02:00
af89051779 prompts can now have a default value when enter is pressed 2025-05-07 10:35:27 +02:00
97646cb8c3 fixed log statements 2025-05-07 10:32:02 +02:00
40fa958cf8 added uuid as id 2025-05-07 10:18:38 +02:00
b3877733cb we can now generate the datasets to be served to users 2025-04-29 15:01:46 +02:00
bde9d45c10 implemented the manual selection script 2025-04-29 14:40:58 +02:00
03b89872dd added selection field to dataset for manual_selection 2025-04-28 09:51:38 +02:00