125 Commits

Author SHA1 Message Date
4c5e486ad6 added small .unique() on repo names to avoid processing a repo twice 2025-06-05 10:46:21 +02:00
6110640a6f fixed condition to check whether a comment was within the diffs 2025-06-03 13:39:02 +02:00
792195e33c now ensuring the comment is within the diff_before changes 2025-06-03 11:51:38 +02:00
926d3a3681 now using original start line as default and start line as backup instead of the other way around 2025-06-03 11:51:07 +02:00
b311c49f9a simplifying logging of common error we can't do much about 2025-05-28 10:17:59 +02:00
77ed66ded8 added small logging statement 2025-05-28 10:17:48 +02:00
e097885e36 only writing the dataset to disk when there are new entries 2025-05-28 10:17:19 +02:00
63b69e40b8 tried to make requests cache better 2025-05-26 11:36:31 +02:00
a4ce620aa0 printing stacktrace when error is made 2025-05-26 11:36:04 +02:00
7e00656ab1 fixed condition 2025-05-26 11:35:54 +02:00
e619d2f339 added another point of failure 2025-05-21 10:59:07 +02:00
5734ca5c8d if we are multithreading, give some time between the requests 2025-05-21 09:40:30 +02:00
b598e97bc6 moved update of pbar 2025-05-21 09:33:44 +02:00
9ced42b6c4 removed unused variable 2025-05-21 09:33:31 +02:00
d48c5d04b8 removed stat that has become useless 2025-05-20 16:50:21 +02:00
33cea7bbb4 added cute little units for the progress bars 2025-05-20 16:50:04 +02:00
ea7b510926 added another way pr can be invalid, if they have no lines for their comment (github api be weird) 2025-05-20 16:44:02 +02:00
09ee7995ff made more general function to move logging to file 2025-05-20 16:40:26 +02:00
c577b3a6e5 saving all the results after any exception 2025-05-20 09:59:51 +02:00
e6c5c8df82 sorting the values descending, to have the top most of the given column first 2025-05-20 09:59:09 +02:00
975b25f2f6 removed print statements 2025-05-20 09:58:55 +02:00
a3a89bb346 moved some logic around 2025-05-20 09:58:49 +02:00
b0443cc87f added exclusion list 2025-05-20 09:57:53 +02:00
04a37030f4 populating cache only if there is any cache 2025-05-20 09:57:29 +02:00
15ffe67b0e added fields to the metadata to make manual filtering easier 2025-05-20 09:56:54 +02:00
4d9c47f33a added all the progress bars for each worker 2025-05-17 10:45:57 +02:00
98db478b7b first draft of parallelization (NOT TESTED YET) 2025-05-17 09:42:02 +02:00
25072ac8b3 removed unused import 2025-05-17 09:37:33 +02:00
b90111c652 now i'm not crashing when no GITHUB_API_TOKEN is given, rather just printing a warning 2025-05-17 09:37:31 +02:00
970ee1c363 added the possibility of sorting the incoming csv by a certain column, now taking any csv instead of the result of clone_repos.py 2025-05-17 09:32:40 +02:00
3ea3e980bd added metavar names for arguments 2025-05-17 09:16:52 +02:00
14e64984c5 instead of adding the cache as we go through the repos, just add it before any processing, so we are sure to keep all the previously saved data 2025-05-16 19:39:59 +02:00
acce738872 made the requests never expire 2025-05-14 09:39:55 +02:00
0b02518374 found out that some couldn't checkout due to conflicts, but if you force it, it works 2025-05-14 09:36:12 +02:00
ccd962c205 now using the metadata to get archive name 2025-05-14 09:36:06 +02:00
36b7dc5c02 added uuid when creating the dataset 2025-05-07 10:38:20 +02:00
e9816d4492 removed unused imports 2025-04-05 16:01:28 +02:00
bf8869e66c was accidentally copying over prs that were cached twice 2025-04-01 15:45:23 +02:00
d4dd72469e instead of creating a list of the comments, using the paginated list and totalCount 2025-04-01 14:46:42 +02:00
e2f313a62a made better argparse things 2025-04-01 12:15:24 +02:00
12b98bf1ef removed the throttle of pygithub to make requests faster 2025-04-01 11:45:43 +02:00
6d28d89873 added return guard to remove indent level 2025-04-01 11:01:06 +02:00
bc71a21c30 instead of leaving reason_for_failure empty for valid PRs, I now put that it's valid (even tho it's not a reason for _failure_ technically, gne gne gne...) 2025-04-01 11:00:11 +02:00
a362aba344 added a simple caching of the requests to make it much quicker to fail and restart 2025-04-01 10:14:45 +02:00
a24ffa00fc made help message shorter 2025-04-01 10:14:19 +02:00
b9d1923bd8 since the comment file might not be in the PR files (since it was reverted back to its original state), we manually need to check if it's code related 2025-04-01 09:53:26 +02:00
f7d70eed6c fixed how we get the diffs before (it was wrong), extracted the way to get the last commit before the comments 2025-04-01 09:52:57 +02:00
af4fbaa7f3 added the type of the error in the print, because some errors are not very verbose in what's going wrong 2025-04-01 09:48:24 +02:00
c31686ad63 not compiling, testing, etc. for files that are not code related 2025-04-01 09:20:37 +02:00
0b238db879 fixed the name of the archive 2025-04-01 09:20:26 +02:00