Commit 3aa18f44 authored by Christian Rohlfing

- added parallel for loop in watermark.sh

- added quality parameter in watermark.sh
parent 4a1b374e
......@@ -15,6 +15,7 @@ RUN apt-get update \
imagemagick-6.q16 \
gsfonts \
img2pdf \
parallel \
qpdf \
pwgen \
zip \
......
......@@ -9,7 +9,11 @@ Contents
* `preparemoodle.sh` prepares for uploading PDFs to moodle via assign module as feedback file for each student
* `batch.sh` calls the three files above successively
Exemplary output of `batch.sh` can be downloaded [here](https://git.rwth-aachen.de/IENT/exam-scan/-/jobs/artifacts/master/download?job=test). Please note that we also provide a Dockerfile and a pre-built Docker image, see below.
Example outputs can be downloaded:
* [moodle_feedbacks.zip](https://git.rwth-aachen.de/IENT/exam-scan/-/jobs/artifacts/master/raw/out/moodle_feedbacks.zip?job=test): the zip archive to be uploaded to Moodle, containing the watermarked and encrypted PDFs for each student.
* [passwords.csv](https://git.rwth-aachen.de/IENT/exam-scan/-/jobs/artifacts/master/raw/out/passwords.csv?job=test): CSV file containing the password for each PDF.
Please note that we also provide a Dockerfile and a pre-built Docker image, see below.
## Quick start
......@@ -36,7 +40,7 @@ Exemplary output of `batch.sh` can be downloaded [here](https://git.rwth-aachen.
We tested everything only under Ubuntu 18.04 (native or Windows Subsystem for Linux). We also provide a Dockerfile (see Section below).
```
sudo apt-get install poppler-utils imagemagick-6.q16 img2pdf qpdf pwgen zip
sudo apt-get install poppler-utils imagemagick-6.q16 img2pdf parallel qpdf pwgen zip
```
Now everything should be set up.
......@@ -47,7 +51,7 @@ We assume that the folder `./pdfs` holds the scans of the exams.
The filename of each PDF should start with the matriculation number of the student, e.g. `./pdfs/123456_Lastname.pdf`.
```
./watermark.sh --in ./pdfs --out ./pdfs_watermarked
./watermark.sh --in ./pdfs --out ./pdfs_watermarked --cores 2
```
Folder `pdfs_watermarked` contains watermarked PDFs, with each page watermarked with the matriculation number of the student.
......@@ -78,7 +82,7 @@ Exemplary `moodle_feedbacks.zip` can be downloaded [here](https://git.rwth-aache
Or do everything in one step
```
./batch.sh --in ./pdfs --csv ./Bewertungen.csv --out ./out --password ganzgeheim
./batch.sh --in ./pdfs --csv ./Bewertungen.csv --out ./out --password ganzgeheim --cores 2
```
with folder `out` containing `passwords.csv` and `moodle_feedbacks.zip`.
......@@ -94,6 +98,7 @@ Tested everything under Ubuntu 18.04 (both native and Windows Subsystem for Linu
* poppler-utils (pdfimages)
* imagemagick-6.q16 (convert)
* img2pdf
* parallel
`encrypt.sh`:
......@@ -107,7 +112,7 @@ Tested everything under Ubuntu 18.04 (both native and Windows Subsystem for Linu
Under Ubuntu 18.04:
```
sudo apt-get install poppler-utils imagemagick-6.q16 img2pdf qpdf pwgen zip
sudo apt-get install poppler-utils imagemagick-6.q16 img2pdf parallel qpdf pwgen zip
```
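Since the commit adds `parallel` to the required packages, a quick check before the first run can catch a missing tool early. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and the tool names are the ones listed above for `watermark.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print every name from
# the argument list that `command -v` cannot resolve.
missing_tools () {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Tools named in the requirements list for watermark.sh
missing=$(missing_tools pdfimages convert img2pdf parallel)
if [[ -n $missing ]]; then
    # Left unquoted on purpose so the names print on a single line
    echo "Missing tools:" $missing
fi
```

If the list is non-empty, the `apt-get install` line above should provide the missing commands under Ubuntu 18.04.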
## Docker
......
......@@ -19,7 +19,9 @@ Options:
-i, --in input folder with PDFs. Default: ./pdfs
-c, --csv Moodle grading CSV file, needed to construct the folder names for moodle zip
-o, --out output folder containing passwords.csv and moodle_feedbacks.zip. Default: ./out
-p, --password sets global password. Default empty, such that each PDF gets a custom password generated with 'pwgen'
-p, --password sets global password. Default: empty, such that each PDF gets a custom password generated with 'pwgen'
--cores number of cores for watermarking. Default: 1
-q, --quality quality parameter for jpeg. Default: 25
-t, --tmp tmp folder. Default: ./tmp"
# -allow a command to fail with !’s side effect on errexit
......@@ -30,8 +32,8 @@ if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
exit 1
fi
OPTIONS=hi:c:o:p:t:v
LONGOPTS=help,in:,csv:,out:,password:,tmp:,verbose
OPTIONS=hi:c:o:p:q:t:v
LONGOPTS=help,in:,csv:,out:,password:,cores:,quality:,tmp:,verbose
# -regarding ! and PIPESTATUS see above
# -temporarily store output to be able to check for errors
......@@ -46,7 +48,7 @@ fi
# read getopt’s output this way to handle the quoting right:
eval set -- "$PARSED"
infolder=- csv=- outfolder=- tmpfolder=- v=n password=-
infolder=- csv=- outfolder=- cores=- quality=- tmpfolder=- v=n password=-
# now enjoy the options in order and nicely split until we see --
while true; do
case "$1" in
......@@ -70,6 +72,14 @@ while true; do
password="$2"
shift 2
;;
--cores)
cores="$2"
shift 2
;;
-q|--quality)
quality="$2"
shift 2
;;
-t|--tmp)
tmpfolder="$2"
shift 2
......@@ -111,6 +121,16 @@ if [[ $outfolder = "-" ]]; then
outfolder="./out"
fi
if [[ $cores = "-" ]]; then
# number of cores not given, fall back to serial processing
cores=1
fi
if [[ $quality = "-" ]]; then
# jpeg quality not given, use the documented default
quality=25
fi
if [[ $tmpfolder = "-" ]]; then
# tmp folder not given, use the default
tmpfolder="./tmp"
......@@ -143,7 +163,7 @@ mkdir -p "${tmpfolder2}"
# Watermark
./watermark.sh --in "${infolder}" --out "${outfolder_watermarked}" --tmp "${tmpfolder2}"
./watermark.sh --in "${infolder}" --out "${outfolder_watermarked}" --cores "${cores}" --quality "${quality}"
# Encrypt
./encrypt.sh --in "${outfolder_watermarked}" --out "${outfolder_encrypted}" --password "${password}"
......
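`batch.sh` hands `--cores` straight to `watermark.sh`, where it ends up in `parallel -j` and in a numeric `-gt` comparison, so a non-numeric value would only fail late. The following is a minimal sketch of an early check, not part of the commit: the helper `is_int_in_range` is hypothetical, the upper bound of 1024 cores is an arbitrary assumption, and the 1 to 100 range for the JPEG quality is an assumption about what `convert -quality` accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): succeeds only if $1 is a
# decimal integer within the inclusive range [$2, $3].
is_int_in_range () {
    local value=$1 min=$2 max=$3
    # Reject anything that is not purely digits (also rejects the "-" sentinel)
    [[ $value =~ ^[0-9]+$ ]] || return 1
    # 10# forces base 10 so that a leading zero is not read as octal
    (( 10#$value >= min && 10#$value <= max ))
}

# Example values as they might look after option parsing and defaults
cores=2
quality=25

if ! is_int_in_range "$cores" 1 1024; then
    echo "--cores must be a positive integer. Exiting." >&2
    exit 1
fi
if ! is_int_in_range "$quality" 1 100; then
    echo "--quality must be between 1 and 100. Exiting." >&2
    exit 1
fi
```

Such a check would sit after the default handling, once the `-` placeholders have been replaced.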
......@@ -11,14 +11,15 @@ set -o pipefail # don't hide errors within pipes
#{{{ Input parameter handling
# Copied from https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash
usage="$(basename "$0") [-h] [--in infolder] [-out outfolder] -- watermark exam scans with matriculation number in folder 'in' and puts them in folder 'out'.
usage="$(basename "$0") [-h] [--in infolder] [--out outfolder] [--cores numcores] [--quality jpegquality] -- watermark exam scans with matriculation number in folder 'in' and puts them in folder 'out'.
Attention: contents of folder 'out' will be overwritten in the following!
Options:
-h, --help show this help text
-i, --in input folder with PDFs. Default: ./pdfs
-o, --out output folder. Default: ./pdfs_watermarked
-t, --tmp tmp folder. Default: ./tmp"
-c, --cores number of cores for parallel processing. Default: 1
-q, --quality quality parameter for jpeg. Default: 25"
# -allow a command to fail with !’s side effect on errexit
# -use return value from ${PIPESTATUS[0]}, because ! hosed $?
......@@ -28,8 +29,8 @@ if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
exit 1
fi
OPTIONS=hiotv
LONGOPTS=help,in:,out:,tmp:,verbose
OPTIONS=hi:o:c:q:v
LONGOPTS=help,in:,out:,cores:,quality:,verbose
# -regarding ! and PIPESTATUS see above
# -temporarily store output to be able to check for errors
......@@ -44,7 +45,7 @@ fi
# read getopt’s output this way to handle the quoting right:
eval set -- "$PARSED"
infolder=- outfolder=- tmpfolder=- v=n
infolder=- outfolder=- cores=- quality=- v=n
# now enjoy the options in order and nicely split until we see --
while true; do
case "$1" in
......@@ -60,8 +61,12 @@ while true; do
outfolder="$2"
shift 2
;;
-t|--tmp)
tmpfolder="$2"
-c|--cores)
cores="$2"
shift 2
;;
-q|--quality)
quality="$2"
shift 2
;;
-v|--verbose)
......@@ -96,16 +101,22 @@ if [[ $outfolder = "-" ]]; then
outfolder="./pdfs_watermarked"
fi
if [[ $tmpfolder = "-" ]]; then
if [[ $cores = "-" ]]; then
# number of cores not given, fall back to serial processing
tmpfolder="./tmp"
cores=1
fi
if [[ $quality = "-" ]]; then
# jpeg quality not given, use the documented default
quality=25
fi
# Check folders
for f in \
"$infolder" \
"$outfolder" \
"$tmpfolder"
"$outfolder"
do
if ! [ -d "$f" ]; then
echo "Folder $f does not exist. Exiting."
......@@ -128,8 +139,15 @@ ls -a1 "${infolder}"/*.pdf
echo
# Loop over all PDFs
for longinpdf in "${infolder}"/*.pdf
do
#for longinpdf in "${infolder}"/*.pdf
#do
export SHELL=$(type -p bash)
doit () {
longinpdf=$1
outfolder=$2
quality=$3
lauthor=$4
# Get matriculation number from file
inpdf=$(basename "${longinpdf%.*}") # file name without folder and extension
matnum=${inpdf:0:6} # read in first 6 letters
......@@ -142,10 +160,12 @@ do
echo "Handle ${longinpdf}"
# remove temporary files
rm -rf "${tmpfolder}"/*
#rm -rf "${tmpfolder}"/*
tmpfolder=$(mktemp -d)
echo "Working in tmp folder ${tmpfolder}"
# if already watermarked
if [[ $matnum = "waterm" ]]
if [[ $matnum = "waterm" ]];
then
echo
echo "Warning: ${longinpdf} is already watermarked"
......@@ -180,19 +200,30 @@ do
-compose over -composite "${tmpfolder}"/watermark_$page.png
# rotate and compress to jpg
convert -rotate -90 -quality 25 "${tmpfolder}"/watermark_$page.png "${tmpfolder}"/rotate_watermark_$page.jpg
convert -rotate -90 -quality "${quality}" "${tmpfolder}"/watermark_$page.png "${tmpfolder}"/rotate_watermark_$page.jpg
done
# Convert all images into pdf
echo
echo "Convert all images to pdf"
longoutpdf="${outfolder}"/"${inpdf}"_w.pdf
img2pdf "${tmpfolder}"/rotate*.jpg --output "${longoutpdf}" --pagesize A4 --author "${author}" --title "${matnum}"
img2pdf "${tmpfolder}"/rotate*.jpg --output "${longoutpdf}" --pagesize A4 --author "${lauthor}" --title "${matnum}"
# remove temporary files
rm -rf "${tmpfolder}"/*
rm -rf "${tmpfolder}"
} # end of doit
if [[ "${cores}" -gt "1" ]]; then
echo "Parallel execution with ${cores} cores from now on."
export -f doit
parallel -j "${cores}" doit ::: "${infolder}"/*.pdf ::: "${outfolder}" ::: "${quality}" ::: "${author}"
else
for longinpdf in "${infolder}"/*.pdf
do
doit "${longinpdf}" "${outfolder}" "${quality}" "${author}"
done # end of PDF loop
fi
echo
echo "Done."
......
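In this commit each call of `doit` creates its own folder with `mktemp -d` and removes it in the last line of the function. If an intermediate step fails and the script aborts (the comments suggest it runs with errexit), that final `rm -rf` is never reached. The following is a minimal sketch of one way to make the cleanup unconditional, not the author's implementation: the wrapper name `with_tmpdir` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the commit): run a command with a private
# temporary folder appended as its last argument, and remove that folder
# afterwards whether the command succeeded or failed.
with_tmpdir () {
    (
        tmpfolder=$(mktemp -d) || exit 1
        # The EXIT trap fires when this subshell ends, on success or failure
        trap 'rm -rf "${tmpfolder}"' EXIT
        "$@" "${tmpfolder}"
        exit $?
    )
}

# Usage: prints the message followed by the path of the temporary folder,
# which no longer exists once the call returns
with_tmpdir echo "Working in tmp folder"
```

Because the trap lives in a subshell, parallel jobs cannot interfere with each other's folders, and the exit status of the wrapped command is passed through to the caller.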