scantopdf

A quick Bash script that wraps Tesseract so that it can work on PDFs of scans.

Installation

To install this script, paste these commands into a terminal:

git clone https://git.karma-riuk.com/karma/scantopdf /tmp/scantopdf
sudo cp /tmp/scantopdf/scantopdf /usr/local/bin/scantopdf
sudo chmod 755 /usr/local/bin/scantopdf

Please verify that /usr/local/bin is listed in the $PATH environment variable. To check, run the following command:

echo "$PATH" | grep "/usr/local/bin"

If there is a line of output, you are all good! If there isn't, run the following commands:

export PATH="/usr/local/bin:$PATH"
echo "export PATH=\"/usr/local/bin:\$PATH\"" | sudo tee -a /etc/profile

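The first command updates the current terminal session and the second makes the change persist across logins, so everything should be okay from then on.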
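
The grep check above also matches substrings, so a directory such as /usr/local/bin-old would produce output too, and running the tee command more than once appends a duplicate line to /etc/profile each time. Here is a minimal sketch of a stricter check and a repeat-safe append, assuming a POSIX shell. The helper names path_contains and append_once are hypothetical and are not part of scantopdf.

```shell
# Hypothetical helpers, not part of scantopdf.

# Succeeds if directory $1 is an exact entry of the colon-separated
# list $2 (defaults to the current $PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Appends line $1 to file $2 unless the file already holds that exact line.
append_once() {
    # -q: quiet, -x: whole-line match, -F: fixed string, not a regex
    if ! grep -qxF -- "$1" "$2" 2>/dev/null; then
        printf '%s\n' "$1" >> "$2"
    fi
}

if path_contains /usr/local/bin; then
    echo "/usr/local/bin is already in PATH"
else
    echo "/usr/local/bin is missing from PATH"
fi
```

To persist the setting with this sketch, append_once could be called with the export line and a profile file you are allowed to write to. Writing to /etc/profile itself requires a root shell.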

Ensure successful installation

To make sure the installation was done correctly, run the following command:

which scantopdf

It should print the following output:

/usr/local/bin/scantopdf

If it doesn't, try restarting the installation from scratch.
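
On systems where which is not installed, the same check can be sketched with the POSIX built-in command -v. The function name installed_at is hypothetical and is not part of scantopdf.

```shell
# Hypothetical helper, not part of scantopdf.
# Succeeds if command $1 resolves to exactly the path $2.
installed_at() {
    # command -v prints how the shell would resolve the name,
    # and fails if the name cannot be found at all.
    resolved=$(command -v "$1") || return 1
    [ "$resolved" = "$2" ]
}

if installed_at scantopdf /usr/local/bin/scantopdf; then
    echo "scantopdf is installed correctly"
else
    echo "scantopdf was not found at /usr/local/bin/scantopdf"
fi
```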

Usage

To use this script, follow these steps:

  1. Open a terminal
  2. Go to the folder containing the file you want to "convert" with the following command:
    cd <path>
    
    where <path> is the path of the folder that contains the file to convert.
  3. Run scantopdf with the file you want to convert as its argument. If the file name contains spaces, surround it with double quotes ("):
    scantopdf "I Promessi Sposi.pdf"
    
    Optionally, you can pass the -v flag (before the file name) to make the script verbose: instead of working silently, it prints each step as it performs it.
    scantopdf -v "La Divina Commedia.pdf"
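
When a folder holds several scans, the steps above can be repeated in a loop. The following is a minimal sketch, assuming scantopdf is invoked once per file exactly as shown above. The function name convert_all and the folder ~/scans are hypothetical.

```shell
# Hypothetical helper, not part of scantopdf.
# Runs scantopdf -v on every .pdf file in folder $1
# (defaults to the current folder).
convert_all() {
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "${1:-.}" || exit 1
        for pdf in *.pdf; do
            # If no file matched, the pattern is left unexpanded; skip it.
            [ -e "$pdf" ] || continue
            # Quoting "$pdf" keeps file names with spaces intact.
            scantopdf -v "$pdf"
        done
    )
}

# Example call (the folder name is an assumption):
# convert_all ~/scans
```

The list of files is built once, before the first conversion starts, so any file created during the run is not picked up by the same loop.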