429fe9b060 | implemented new way to extract stats from dataset | 2025-06-10 20:42:58 +02:00
dd52e43000 | added a way to load paraphrases from an external csv | 2025-06-10 20:42:55 +02:00
1754f93018 | quality-of-life improvements for manual selection | 2025-06-05 10:49:05 +02:00
4c5e486ad6 | added a small .unique() on repo names to avoid processing a repo twice | 2025-06-05 10:46:21 +02:00
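
A minimal sketch of what the .unique() call in 4c5e486ad6 achieves, written with the standard library instead of pandas. The row format and the `repo` key are assumptions; like pandas `Series.unique()`, it keeps the first occurrence of each name in order of appearance.

```python
def unique_repos(rows: list[dict]) -> list[str]:
    """Return repo names in order of first appearance, without duplicates.

    Hypothetical stand-in for pandas' Series.unique() on a "repo" column.
    """
    # dict.fromkeys preserves insertion order (Python 3.7+) and drops repeats
    return list(dict.fromkeys(row["repo"] for row in rows))


rows = [{"repo": "a/x"}, {"repo": "b/y"}, {"repo": "a/x"}]
for repo in unique_repos(rows):
    print("processing", repo)  # "a/x" is processed only once
```
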
bf1591c61d | fixed bug in manual selection | 2025-06-04 12:07:41 +02:00
9a24a734e7 | added the printing of the relevant hunk when asking for comment relevance | 2025-06-04 10:37:48 +02:00
6110640a6f | fixed condition to check whether a comment was within the diffs | 2025-06-03 13:39:02 +02:00
792195e33c | now ensuring the comment is within the diff_before changes | 2025-06-03 11:51:38 +02:00
926d3a3681 | now using original start line as default and start line as backup instead of the other way around | 2025-06-03 11:51:07 +02:00
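
The line selection in 926d3a3681 and the within-diff checks in 6110640a6f and 792195e33c can be sketched as below. `original_start_line` and `start_line` are fields of GitHub's pull request review comment objects; the helper names, the dict input and the (start, length) hunk representation are assumptions, not the project's actual code.

```python
from typing import Optional


def comment_start_line(comment: dict) -> Optional[int]:
    """Return the first line a review comment refers to, or None if it has none.

    Prefers "original_start_line" and falls back to "start_line" (926d3a3681).
    GitHub typically leaves both null for single-line comments, so real code may
    also need "original_line"/"line"; this sketch leaves that out.
    """
    for key in ("original_start_line", "start_line"):
        value = comment.get(key)
        if value is not None:
            return value
    return None  # no line info: such a comment could be treated as invalid


def is_within_hunk(line: int, hunk_start: int, hunk_length: int) -> bool:
    """True if `line` lies in hunk_start .. hunk_start + hunk_length - 1."""
    return hunk_start <= line < hunk_start + hunk_length


# Hypothetical hunk header "@@ -10,5 +10,6 @@": the old side covers lines 10-14
comment = {"original_start_line": None, "start_line": 12}
start = comment_start_line(comment)
print(start, is_within_hunk(start, 10, 5))  # 12 True
```
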
154837827d | added link to paraphrases extraction | 2025-06-03 10:10:57 +02:00
45a8122408 | using an enum choice action instead of the previous approach | 2025-06-03 10:10:36 +02:00
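
One way an "enum choice action" like the one in 45a8122408 can work is a custom `argparse.Action` that offers an Enum's values as choices and stores the matching member. This is a hedged sketch: `Mode`, its values and `EnumChoiceAction` are hypothetical names, since the log does not show the real implementation.

```python
import argparse
from enum import Enum


class Mode(Enum):
    # Hypothetical enum; the project's real options are not shown in the log
    MANUAL = "manual"
    AUTOMATIC = "automatic"


class EnumChoiceAction(argparse.Action):
    """Restrict an option to an Enum's values and store the Enum member."""

    def __init__(self, option_strings, dest, enum_type, **kwargs):
        self.enum_type = enum_type
        # The enum values become the allowed choices (also listed in --help)
        kwargs.setdefault("choices", [member.value for member in enum_type])
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # argparse has already validated `values` against `choices`
        setattr(namespace, self.dest, self.enum_type(values))


parser = argparse.ArgumentParser()
parser.add_argument("--mode", action=EnumChoiceAction, enum_type=Mode,
                    default=Mode.MANUAL)
args = parser.parse_args(["--mode", "automatic"])
print(args.mode)  # Mode.AUTOMATIC
```

The extra `enum_type` keyword works because `add_argument` forwards unrecognized keyword arguments to the action class constructor.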
66d046cbaa | made filename a positional argument | 2025-06-03 10:10:19 +02:00
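
A minimal argparse sketch of a positional filename (66d046cbaa) with a metavar (3ea3e980bd). The description, the `CSV_FILE` metavar and the help text are hypothetical. A positional argument is required and given without a flag, while `metavar` only changes the name shown in usage and help output; the parsed value is still stored as `args.filename`.

```python
import argparse

parser = argparse.ArgumentParser(description="hypothetical CLI sketch")
# Positional: required, passed without a flag, e.g. `script.py repos.csv`
parser.add_argument("filename", metavar="CSV_FILE", help="input csv file")
args = parser.parse_args(["repos.csv"])
print(args.filename)  # repos.csv
```
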
c05c9cb366 | fixed manual selection | 2025-06-02 15:33:23 +02:00
87b49b377d | the removal of the is_code_related field in selection broke backwards compatibility. Fixed it | 2025-06-02 10:46:47 +02:00
4648ba2560 | fixed type annotation | 2025-06-02 09:50:09 +02:00
09df9a1ae8 | removed already done TODO | 2025-06-02 09:49:59 +02:00
5b8357567b | removed code relatedness from manual selection since now it's already done by pull_requests | 2025-06-02 09:48:27 +02:00
b311c49f9a | simplifying logging of a common error we can't do much about | 2025-05-28 10:17:59 +02:00
77ed66ded8 | added small logging statement | 2025-05-28 10:17:48 +02:00
e097885e36 | only writing the dataset to disk when there are new entries | 2025-05-28 10:17:19 +02:00
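
A minimal sketch of the skip-if-unchanged write from e097885e36. The JSON format, the `save_dataset` name and the `new_entries` counter are assumptions; the point is only that the write is skipped when a run added nothing.

```python
import json
import tempfile
from pathlib import Path


def save_dataset(path: Path, dataset: list[dict], new_entries: int) -> bool:
    """Write the dataset as JSON only if this run added entries.

    Hypothetical helper; returns True when the file was written.
    """
    if new_entries == 0:
        return False  # nothing changed: skip the (possibly large) write
    path.write_text(json.dumps(dataset, indent=2))
    return True


target = Path(tempfile.mkdtemp()) / "dataset.json"
skipped = save_dataset(target, [], new_entries=0)  # file is not created
written = save_dataset(target, [{"id": 1}], new_entries=1)
```
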
0b182837c1 | added new option for dataset | 2025-05-27 10:50:10 +02:00
900003bac7 | added a way to extract the information needed to generate paraphrases | 2025-05-27 10:48:17 +02:00
63b69e40b8 | tried to improve the requests cache | 2025-05-26 11:36:31 +02:00
a4ce620aa0 | printing the stack trace when an error occurs | 2025-05-26 11:36:04 +02:00
7e00656ab1 | fixed condition | 2025-05-26 11:35:54 +02:00
e619d2f339 | added another point of failure | 2025-05-21 10:59:07 +02:00
5734ca5c8d | if we are multithreading, give some time between the requests | 2025-05-21 09:40:30 +02:00
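
A minimal sketch of the pause between requests from 5734ca5c8d, assuming worker threads that share one throttle. The `RequestThrottle` class, the 0.05 s interval and the switch that enables it only when more than one worker runs are assumptions. Holding the lock while sleeping deliberately serializes request start times, so consecutive requests begin at least `min_interval` apart.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RequestThrottle:
    """Hypothetical helper enforcing a minimum delay between requests across threads."""

    def __init__(self, min_interval: float, enabled: bool = True):
        self.min_interval = min_interval
        self.enabled = enabled
        self._lock = threading.Lock()
        self._last = float("-inf")  # no request made yet

    def wait(self) -> None:
        if not self.enabled:
            return  # single worker: no pause needed
        with self._lock:
            # Sleep until min_interval has passed since the previous request
            remaining = self._last + self.min_interval - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
            self._last = time.monotonic()


num_workers = 4
throttle = RequestThrottle(min_interval=0.05, enabled=num_workers > 1)


def fake_request(i: int) -> int:
    throttle.wait()  # a real HTTP request would be sent after this pause
    return i


start = time.monotonic()
with ThreadPoolExecutor(max_workers=num_workers) as pool:
    results = list(pool.map(fake_request, range(5)))
elapsed = time.monotonic() - start
```
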
b598e97bc6 | moved update of pbar | 2025-05-21 09:33:44 +02:00
9ced42b6c4 | removed unused variable | 2025-05-21 09:33:31 +02:00
f1e8b896bb | fixed slight bug | 2025-05-21 09:18:51 +02:00
a8ccf081a2 | formatted file | 2025-05-21 09:18:33 +02:00
d48c5d04b8 | removed stat that has become useless | 2025-05-20 16:50:21 +02:00
33cea7bbb4 | added cute little units for the progress bars | 2025-05-20 16:50:04 +02:00
ea7b510926 | added another way a PR can be invalid: if its comment has no lines (github api be weird) | 2025-05-20 16:44:02 +02:00
09ee7995ff | made a more general function to move logging to a file | 2025-05-20 16:40:26 +02:00
c577b3a6e5 | saving all the results after any exception | 2025-05-20 09:59:51 +02:00
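
A minimal sketch of keeping partial results whatever goes wrong, as in c577b3a6e5, combined with the stack-trace printing from a4ce620aa0. The `process_all` helper, the `process` callback and the JSON output file are hypothetical; a `finally` block is one common way to get this behaviour, not necessarily the one the project uses.

```python
import json
import tempfile
import traceback
from pathlib import Path


def process_all(items, process, out_path: Path) -> list:
    """Process items and always save the results collected so far."""
    results = []
    try:
        for item in items:
            results.append(process(item))
    except Exception:
        traceback.print_exc()  # print the stack trace, then re-raise
        raise
    finally:
        # Runs on success and on any exception, so partial results are kept
        out_path.write_text(json.dumps(results))
    return results


out = Path(tempfile.mkdtemp()) / "results.json"


def flaky(x: int) -> int:
    if x == 3:
        raise ValueError("simulated failure")
    return x * 10


try:
    process_all(range(5), flaky, out)
except ValueError:
    pass

print(json.loads(out.read_text()))  # [0, 10, 20]
```
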
e6c5c8df82 | sorting the values in descending order, so the top values of the given column come first | 2025-05-20 09:59:09 +02:00
975b25f2f6 | removed print statements | 2025-05-20 09:58:55 +02:00
a3a89bb346 | moved some logic around | 2025-05-20 09:58:49 +02:00
b0443cc87f | added exclusion list | 2025-05-20 09:57:53 +02:00
04a37030f4 | populating cache only if there is any cache | 2025-05-20 09:57:29 +02:00
15ffe67b0e | added fields to the metadata to make manual filtering easier | 2025-05-20 09:56:54 +02:00
3ffbb229b8 | fixed typo | 2025-05-20 09:50:00 +02:00
6e0aca2ad5 | fixed typo | 2025-05-20 09:48:31 +02:00
4d9c47f33a | added all the progress bars for each worker | 2025-05-17 10:45:57 +02:00
98db478b7b | first draft of parallelization (NOT TESTED YET) | 2025-05-17 09:42:02 +02:00
25072ac8b3 | removed unused import | 2025-05-17 09:37:33 +02:00
b90111c652 | no longer crashing when no GITHUB_API_TOKEN is given, just printing a warning instead | 2025-05-17 09:37:31 +02:00
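
A minimal sketch of the warn-instead-of-crash behaviour from b90111c652. Only the `GITHUB_API_TOKEN` variable name comes from the log; the `github_headers` helper and the use of the `logging` module for the warning are assumptions. GitHub accepts a personal access token in a `Bearer` Authorization header, and unauthenticated requests still work at a much lower rate limit.

```python
import logging
import os


def github_headers() -> dict:
    """Build GitHub API request headers, warning if no token is configured."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_API_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    else:
        # Keep going unauthenticated instead of raising
        logging.warning("GITHUB_API_TOKEN not set; using unauthenticated requests")
    return headers


os.environ.pop("GITHUB_API_TOKEN", None)
anonymous = github_headers()  # logs a warning, no Authorization header
os.environ["GITHUB_API_TOKEN"] = "example-token"  # dummy value for the demo
authenticated = github_headers()
```
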
970ee1c363 | added the possibility of sorting the incoming csv by a given column; now accepting any csv instead of only the result of clone_repos.py | 2025-05-17 09:32:40 +02:00
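
A minimal sketch of reading an arbitrary csv and sorting it by a chosen column, highest first, combining 970ee1c363 with the descending order from e6c5c8df82. The `read_sorted` name, the `stars` column and the numeric conversion with `float()` are assumptions; the real script may do this with pandas instead.

```python
import csv
import io


def read_sorted(csv_file, column: str) -> list[dict]:
    """Read any csv and return its rows sorted by `column`, highest first."""
    rows = list(csv.DictReader(csv_file))
    # Descending order puts the top values of the column first
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)


data = io.StringIO("repo,stars\na/x,10\nb/y,250\nc/z,42\n")
top = read_sorted(data, "stars")
print([row["repo"] for row in top])  # ['b/y', 'c/z', 'a/x']
```
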
3ea3e980bd | added metavar names for arguments | 2025-05-17 09:16:52 +02:00