Commit 11498ee7 authored by Amrita

Platform-Independent exam-scan update 09112020

parent bd089f9c
Pipeline #357902 canceled with stages
# Ubuntu 18.04 (bionic)
# https://hub.docker.com/_/ubuntu/?tab=tags&name=bionic
# OS/ARCH: linux/amd64
ARG ROOT_CONTAINER=ubuntu:bionic-20200403
ARG BASE_CONTAINER=$ROOT_CONTAINER
FROM $BASE_CONTAINER
LABEL maintainer="Christian Rohlfing <rohlfing@ient.rwth-aachen.de>"
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update \
&& apt-get install -yq --no-install-recommends \
wget \
poppler-utils \
imagemagick-6.q16 \
gsfonts \
img2pdf \
parallel \
qpdf \
pwgen \
zip \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
COPY ./*.sh /opt/scripts/
ENTRYPOINT [ "/opt/scripts/batch.sh" ]
......@@ -4,20 +4,17 @@ Preparing exam scans for ship out: Adding watermarks, encryption and preparing u
**Contents**
* `watermark.sh` watermarks each page of PDFs containing exam scans with matriculation number of the respective student
* `encrypt.sh` encrypts PDF with password
* `preparemoodle.sh` prepares for uploading PDFs to moodle via assign module as feedback file for each student
* `batch.sh` calls the three files above successively
* `watermark.py` watermarks each page of PDFs containing exam scans with matriculation number of the respective student
* `encrypt.py` encrypts each PDF either with a common password (passed as an argument) or with a randomly generated password (when no argument is given)
* `preparemoodle.py` prepares for uploading PDFs to moodle via assign module as feedback file for each student
Please note that the three scripts `watermark.sh`, `encrypt.sh`, and `preparemoodle.sh` do not depend on each other.
Please note that the three scripts `watermark.py`, `encrypt.py`, and `preparemoodle.py` do not depend on each other.
If you want to use only a subset (or one) of the scripts, you can find their package dependencies further down in the Installation section.
Exemplary outputs can be downloaded:
* [moodle_feedbacks.zip](https://git.rwth-aachen.de/IENT/exam-scan/-/jobs/artifacts/master/raw/out/moodle_feedbacks.zip?job=test): The zip-Archive to be uploaded to Moodle containing the watermarked and encrypted PDFs for each student.
* [passwords.csv](https://git.rwth-aachen.de/IENT/exam-scan/-/jobs/artifacts/master/raw/out/passwords.csv?job=test): CSV file containing passwords for each PDF.
Please note that we also provide a Dockerfile and a pre-built Docker image, see below.
## Quick start
### Prerequisites
......@@ -38,13 +35,42 @@ Exemplary outputs can be downloaded:
* Moodle will check for consistency and prompt errors.
### Install software dependencies
### Installation
We tested everything only under Ubuntu 18.04 (native or Windows Subsystem for Linux). We also provide a Dockerfile (see Section below).
Everything was tested on Windows 10 (with Python 3.8.6), Ubuntu 20.04.1 LTS (with Python 3.8.5), and macOS 10.14 Mojave (with Python 3.8).
```
sudo apt-get install poppler-utils imagemagick-6.q16 img2pdf parallel qpdf pwgen zip
```
**1. Download and install the following software in the version that matches your processor and operating system:**
- (For Mac users only) : Install Homebrew by typing the following command in Terminal: `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"`
- Install ImageMagick : https://docs.wand-py.org/en/latest/guide/install.html
- Install Ghostscript:
  - For Windows: https://www.ghostscript.com/download/gsdnld.html (accept the license agreement and keep all default settings during installation)
  - For Linux: run the following commands in a terminal:
```
apt-get update
apt-get install ghostscript
```
  - For macOS: run `brew install ghostscript` in a terminal.
**2. Check if you have pip installed with the below command:**
Run `pip -V` or `pip3 -V`. If neither command works, install pip from: https://pip.pypa.io/en/stable/installing/
**3. Install modules with pip:**
`pip install wand pillow PyPDF2 pwgen pikepdf` or `pip3 install wand pillow PyPDF2 pwgen pikepdf`
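Before running the scripts, it can help to confirm that the modules from step 3 are importable. The following is a minimal sketch and not part of the repository: `missing_modules` is a hypothetical helper, and the import names are assumptions (Pillow is assumed to be imported as `PIL`, the other packages under their pip names).

```python
import importlib.util

# Import names assumed for the pip packages listed in step 3
# (Pillow is imported as "PIL"; the others use their package name).
REQUIRED = ["wand", "PIL", "PyPDF2", "pwgen", "pikepdf"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks the modules up and does not import them, so it also works when ImageMagick itself is not installed yet.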
**3.1 _Additional steps for Linux users:_**
Allow ImageMagick to read and write PDFs, which its security policy blocks by default:
- Locate the ImageMagick policy file at `/etc/ImageMagick-6/policy.xml`
- Open the file and find `<policy domain="coder" rights="none" pattern="PDF" />`
- Replace it with `<policy domain="coder" rights="read|write" pattern="PDF" />`
- Save the file
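To verify the policy change, the file can be inspected programmatically. This is a rough sketch under the assumption that the policy file has the structure quoted in the steps above; `pdf_rights` and `pdf_blocked` are hypothetical helper names, not part of ImageMagick or of this repository.

```python
import os
import xml.etree.ElementTree as ET

POLICY_PATH = "/etc/ImageMagick-6/policy.xml"  # path taken from the steps above


def pdf_rights(policy_xml):
    """Return the 'rights' values of all coder policies whose pattern mentions PDF."""
    root = ET.fromstring(policy_xml)
    rights = []
    for policy in root.iter("policy"):
        if policy.get("domain") == "coder" and "PDF" in policy.get("pattern", ""):
            rights.append(policy.get("rights"))
    return rights


def pdf_blocked(policy_xml):
    """True if at least one matching policy denies all access (rights="none")."""
    return any(r == "none" for r in pdf_rights(policy_xml))


if __name__ == "__main__":
    if os.path.exists(POLICY_PATH):
        with open(POLICY_PATH, encoding="utf-8") as handle:
            print("PDF blocked:", pdf_blocked(handle.read()))
    else:
        print("No policy file found at", POLICY_PATH)
```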
Now everything should be set up.
......@@ -54,9 +80,16 @@ We assume that the folder `./pdfs` holds the scans of the exams.
The filename of each PDF should start with the matriculation number of the student, e.g. `./pdfs/123456_Lastname.pdf`.
```
./watermark.sh --in ./pdfs --out ./pdfs_watermarked --cores 2
python watermark.py --in ./pdfs --out ./pdfs_watermarked --cores 2
```
or
```
python3 watermark.py --in ./pdfs --out ./pdfs_watermarked --cores 2
```
Folder `pdfs_watermarked` contains watermarked PDFs, with each page watermarked with the matriculation number of the student.
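The watermark text is derived from the filename convention described above. As a minimal sketch (the helper name `matriculation_number` is hypothetical, and the actual logic in `watermark.py` may differ), the matriculation number can be taken as the part of the filename before the first underscore:

```python
import os


def matriculation_number(pdf_path):
    """Return the part of the file name before the first underscore."""
    # Strip the folder and the ".pdf" extension, then split at the first "_"
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return stem.split("_", 1)[0]


if __name__ == "__main__":
    print(matriculation_number("./pdfs/123456_Lastname.pdf"))  # prints 123456
```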
### Encrypt
......@@ -64,9 +97,14 @@ Folder `pdfs_watermarked` contains watermarked PDFs, with each page watermarked
Use either a global password by specifying it with the `--password` option or per-student passwords by omitting `--password`.
```
./encrypt.sh --in ./pdfs_watermarked --out ./pdfs_encrypted --password ganzgeheim
python encrypt.py --in ./pdfs_watermarked --out ./pdfs_encrypted --password ganzgeheim
```
or
```
python3 encrypt.py --in ./pdfs_watermarked --out ./pdfs_encrypted --password ganzgeheim
```
Folder `./pdfs_encrypted` contains all encrypted PDFs as well as `passwords.csv`, mapping the password of each PDF to the matriculation number.
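The two password modes and the resulting `passwords.csv` can be illustrated as follows. This is only a sketch: `encrypt.py` generates passwords with the `pwgen` module, whereas this example uses the standard-library `secrets` module as a stand-in, and `random_password` and `write_passwords` are hypothetical names.

```python
import csv
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_password(length=8):
    """Generate a random alphanumeric password (stand-in for pwgen)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def write_passwords(csv_path, matriculation_numbers, password=None):
    """Write one 'matriculation number, password' row per student.

    If 'password' is given, it is used for every student (global password);
    otherwise each student gets an individual random password.
    """
    rows = [(m, password or random_password()) for m in matriculation_numbers]
    # newline="" avoids blank lines between rows on Windows
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

Calling `write_passwords("passwords.csv", ["123456"], password="ganzgeheim")` corresponds to the `--password` case, while omitting `password` corresponds to per-student passwords.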
### Prepare for Moodle batch upload
......@@ -78,7 +116,13 @@ This step prepares the PDFs for upload to Moodle. First, the grading table `Bewe
This step is needed since Moodle does not only need matriculation number, but also last and first name as well as an internal user id, which is stored in `Bewertungen.csv`.
```
./preparemoodle.sh --in ./pdfs_encrypted --csv ./Bewertungen.csv --out ./moodle_feedbacks.zip
python preparemoodle.py --in ./pdfs_encrypted --csv ./Bewertungen.csv --out ./moodle_feedbacks.zip
```
or
```
python3 preparemoodle.py --in ./pdfs_encrypted --csv ./Bewertungen.csv --out ./moodle_feedbacks.zip
```
Then, you can upload `moodle_feedbacks.zip` in Moodle:
......@@ -87,64 +131,41 @@ Then, you can upload `moodle_feedbacks.zip` in Moodle:
Further remarks:
* Exemplary zip archive `moodle_feedbacks.zip` can be downloaded [here](https://git.rwth-aachen.de/IENT/exam-scan/-/jobs/artifacts/master/download?job=test).
* You can also conduct a dry run (neither folders nor zip file are created) via `./preparemoodle.sh --dry [...]`
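The lookup that this step relies on can be sketched as follows. All specifics here are assumptions for illustration: the column names (`Matrikelnummer`, `Vollständiger Name`, `ID`) and the folder naming scheme are hypothetical and should be checked against your own `Bewertungen.csv` and against what `preparemoodle.py` actually produces.

```python
import csv
import io


def load_participants(csv_text, matnum_col="Matrikelnummer",
                      name_col="Vollständiger Name", id_col="ID"):
    """Map matriculation number -> (full name, Moodle id) from grading CSV text.

    The default column names are assumptions, not taken from the repository.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[matnum_col]: (row[name_col], row[id_col]) for row in reader}


def feedback_folder(name, moodle_id):
    """Build a per-student folder name (assumed naming scheme)."""
    return f"{name}_{moodle_id}_assignsubmission_file_"


if __name__ == "__main__":
    sample = "ID,Vollständiger Name,Matrikelnummer\n42,Erika Mustermann,123456\n"
    participants = load_participants(sample)
    print(feedback_folder(*participants["123456"]))
```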
### Batch job
Alternatively, run all three steps with a single command:
```
./batch.sh --in ./pdfs --csv ./Bewertungen.csv --out ./out --password ganzgeheim --cores 2
python batch.py --in ./pdfs --out ./out --cores 2 --password ganzgeheim --csv ./Bewertungen.csv
```
with folder `out` containing `passwords.csv` and `moodle_feedbacks.zip`.
## Installation
Tested everything under Ubuntu 18.04 (both native and Windows Subsystem for Linux (WSL))
with folder `out` containing `passwords.csv` and `moodle_feedbacks.zip`.
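Conceptually, the batch job chains the three scripts and passes intermediate folders between them. The sketch below shows one way to assemble and run those calls. The command-line flags are the ones used in this repository's `batch.py`, while `build_commands` and `run_pipeline` are hypothetical helper names and the use of `subprocess.run(..., check=True)` is a design choice of this example.

```python
import os
import subprocess
import sys


def build_commands(infolder, outfolder, tmp, cores, csv_file, password=None):
    """Return the three script invocations as argument lists."""
    watermarked = os.path.join(tmp, "pdfs_watermarked")
    encrypted = os.path.join(tmp, "pdfs_encrypted")
    encrypt_cmd = [sys.executable, "encrypt.py", "--in", watermarked,
                   "--out", encrypted, "--passwordout", outfolder]
    if password:
        # Without --password, encrypt.py generates one password per PDF
        encrypt_cmd += ["--password", password]
    return [
        [sys.executable, "watermark.py", "--in", infolder,
         "--out", watermarked, "--cores", str(cores)],
        encrypt_cmd,
        [sys.executable, "preparemoodle.py", "--in", encrypted,
         "--csv", csv_file, "--batch", "1",
         "--tmp", os.path.join(tmp, "tmp"),
         "--out", os.path.join(outfolder, "moodle_feedbacks.zip")],
    ]


def run_pipeline(commands):
    """Run the steps in order and stop at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes it easy to print the commands first, similar to a dry run, before anything is written to disk.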
### Dependencies
`watermark.sh`:
* poppler-utils (pdfimages)
* imagemagick-6.q16 (convert)
* img2pdf
* parallel
`watermark.py`
* ImageMagick 7.0.10-33
* Ghostscript 9.53.3
* Pillow
* Wand
* PyPDF2 1.26.0
`encrypt.sh`:
`encrypt.py`:
* qpdf
* PyPDF2 1.26.0
* pwgen
* pikepdf
`preparemoodle.sh`:
`preparemoodle.py`:
* zip
Under Ubuntu 18.04:
```
sudo apt-get install poppler-utils imagemagick-6.q16 img2pdf parallel qpdf pwgen zip
```
## Docker
Try it out with
```
docker run --name examscan --rm -it -v $(pwd):$(pwd) -w $(pwd) \
registry.git.rwth-aachen.de/ient/exam-scan \
--in in --out out --tmp tmp
```
or build it yourself
```
docker build --tag examscan .
docker run --name examscan --rm -it -v $(pwd):$(pwd) -w $(pwd) \
examscan --in in --out out --tmp tmp
```
## Original Authors
Helmut Flasche, Jens Schneider, Christian Rohlfing, IENT RWTH Aachen<br />
Dietmar Kahlen, ITHE RWTH Aachen
Dietmar Kahlen, ITHE RWTH Aachen<br />
Amrita Deb, IT Center, RWTH Aachen University (watermark.py, encrypt.py)
## Who do I talk to?
......
import argparse
import os
import shutil
import subprocess
import sys
import time


def empty_folder(folder):
    """Create 'folder' if it does not exist and delete everything inside it."""
    os.makedirs(folder, exist_ok=True)
    for entry in os.scandir(folder):
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.unlink(entry.path)


if __name__ == '__main__':
    # Parameter definition (argparse generates the option list shown by --help)
    parser = argparse.ArgumentParser(description='''
        Watermarks and encrypts exams and prepares everything for Moodle upload.
        Attention: contents of folders 'out' and 'tmp' will be deleted in the beginning!
        ''')
    parser.add_argument("-i", "--infolder", default="./pdfs",
                        help="Input folder with PDFs. Default: ./pdfs")
    parser.add_argument("-c", "--csv", default="Bewertungen.csv",
                        help="Moodle grading CSV file, needed to construct the folder names for moodle zip")
    parser.add_argument("-o", "--outfolder", default="./out",
                        help="Output folder containing passwords.csv and moodle_feedbacks.zip. Default: ./out")
    parser.add_argument("-e", "--cores", default="2",
                        help="Number of cores for parallel processing. Default: 2")
    parser.add_argument("-p", "--password", default=None,
                        help="Sets global password. Default: not set, such that each PDF gets a custom password generated with 'pwgen'")
    parser.add_argument("-d", "--dpi", default="250",
                        help="dpi parameter for conversion from pdf to images. Default: 250")
    parser.add_argument("-t", "--tmp", default="./tmp",
                        help="tmp folder. Default: ./tmp")
    args = parser.parse_args()

    infolder = args.infolder
    csv = args.csv
    outfolder = args.outfolder
    cores = args.cores
    dpi = args.dpi  # parsed, but currently not forwarded to watermark.py
    tmp = args.tmp

    starttime = time.time()

    # Empty 'out' and 'tmp' folders
    empty_folder(outfolder)
    empty_folder(tmp)

    # Watermarking process (check_call aborts the batch if a step fails)
    watermark_outfolder = os.path.join(tmp, 'pdfs_watermarked')
    os.makedirs(watermark_outfolder, exist_ok=True)
    subprocess.check_call((sys.executable, 'watermark.py', '--in', infolder,
                           '--out', watermark_outfolder, '--cores', cores))

    # Encryption process
    encrypt_outfolder = os.path.join(tmp, 'pdfs_encrypted')
    os.makedirs(encrypt_outfolder, exist_ok=True)
    encrypt_cmd = [sys.executable, 'encrypt.py', '--in', watermark_outfolder,
                   '--out', encrypt_outfolder, '--passwordout', outfolder]
    if args.password is not None:
        encrypt_cmd += ['--password', args.password]
    subprocess.check_call(encrypt_cmd)

    # ZIP archive preparation process (uses a subfolder of the chosen tmp folder)
    moodle_tmp = os.path.join(tmp, 'tmp')
    os.makedirs(moodle_tmp, exist_ok=True)
    subprocess.check_call((sys.executable, 'preparemoodle.py', '--in', encrypt_outfolder,
                           '--csv', csv, '--batch', '1', '--tmp', moodle_tmp,
                           '--out', os.path.join(outfolder, 'moodle_feedbacks.zip')))

    endtime = time.time()
    print(f'\nTotal time taken: {endtime-starttime:.2f}s\n')
#!/bin/bash
#
# Author: Helmut Flasche <flasche@ient.rwth-aachen.de>, Christian Rohlfing <rohlfing@ient.rwth-aachen.de>
#{{{ Bash settings
set -o errexit # abort on nonzero exitstatus
set -o nounset # abort on unbound variable
set -o pipefail # don't hide errors within pipes
#}}}
#{{{ Input parameter handling
# Copied from https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash
usage="$(basename "$0") [-h] [--in infolder] [--out outfolder] -- watermarks and encrypts exams and prepares everything for moodle upload.
Attention: contents of folder 'out' will be deleted in the beginning!
Options:
-h, --help show this help text
-i, --in input folder with PDFs. Default: ./pdfs
-c, --csv Moodle grading CSV file, needed to construct the folder names for moodle zip
-o, --out output folder containing passwords.csv and moodle_feedbacks.zip. Default: ./out
-p, --password sets global password. Default: empty, such that each PDF gets a custom password generated with 'pwgen'
--cores number of cores for watermarking. Default: 1
-d, --dpi dpi parameter for conversion from pdf to images. Default: 250
-q, --quality quality parameter for jpeg. Default: 25
-t, --tmp tmp folder. Default: ./tmp"
# -allow a command to fail with !’s side effect on errexit
# -use return value from ${PIPESTATUS[0]}, because ! hosed $?
! getopt --test > /dev/null
if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
echo 'I’m sorry, `getopt --test` failed in this environment.'
exit 1
fi
OPTIONS=hi:c:o:p:d:q:t:v
LONGOPTS=help,in:,csv:,out:,password:,cores:,dpi:,quality:,tmp:,verbose
# -regarding ! and PIPESTATUS see above
# -temporarily store output to be able to check for errors
# -activate quoting/enhanced mode (e.g. by writing out “--options”)
# -pass arguments only via -- "$@" to separate them correctly
! PARSED=$(getopt --options=$OPTIONS --longoptions=$LONGOPTS --name "$0" -- "$@")
if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
# e.g. return value is 1
# then getopt has complained about wrong arguments to stdout
exit 2
fi
# read getopt’s output this way to handle the quoting right:
eval set -- "$PARSED"
infolder=- csv=- outfolder=- cores=- dpi=- quality=- tmpfolder=- v=n password=-
# now enjoy the options in order and nicely split until we see --
while true; do
case "$1" in
-h|--help)
echo "$usage"
exit
;;
-i|--in)
infolder="$2"
shift 2
;;
-c|--csv)
csv="$2"
shift 2
;;
-o|--out)
outfolder="$2"
shift 2
;;
-p|--password)
password="$2"
shift 2
;;
--cores)
cores="$2"
shift 2
;;
-d|--dpi)
dpi="$2"
shift 2
;;
-q|--quality)
quality="$2"
shift 2
;;
-t|--tmp)
tmpfolder="$2"
shift 2
;;
-v|--verbose)
v=y
shift
;;
--)
shift
break
;;
*)
echo "Programming error"
exit 3
;;
esac
done
# handle non-option arguments
#if [[ $# -ne 1 ]]; then
# echo "$0: A single input file is required."
# exit 4
#fi
# Default values
if [[ $infolder = "-" ]]; then
# input folder not given, use default
infolder="./pdfs"
fi
if [[ $csv = "-" ]]; then
# grading CSV not given, use default
csv="./Bewertungen.csv"
fi
if [[ $outfolder = "-" ]]; then
# output folder not given, use default
outfolder="./out"
fi
if [[ $cores = "-" ]]; then
# number of cores not given, use a single core
cores=1
fi
if [[ $dpi = "-" ]]; then
# dpi not given, use default
dpi=250
fi
if [[ $quality = "-" ]]; then
# jpeg quality not given, use the default documented in the usage text
quality=25
fi
if [[ $tmpfolder = "-" ]]; then
# tmp folder not given, use default
tmpfolder="./tmp"
fi
# Check folders
for f in \
"$infolder" \
"$outfolder" \
"$tmpfolder"
do
if ! [ -d "$f" ]; then
echo "Folder $f does not exist. Exiting."
exit 1
fi
done
#}}}
# Delete old files
rm -rf "${tmpfolder}"/*
rm -rf "${outfolder}"/*
outfolder_watermarked="${tmpfolder}/out_watermarked"
outfolder_encrypted="${tmpfolder}/out_encrypted"
tmpfolder2="${tmpfolder}/tmp"
mkdir -p "${outfolder_watermarked}"
mkdir -p "${outfolder_encrypted}"
mkdir -p "${tmpfolder2}"
# Watermark
./watermark.sh --in "${infolder}" --out "${outfolder_watermarked}" --cores "${cores}"
# Encrypt
./encrypt.sh --in "${outfolder_watermarked}" --out "${outfolder_encrypted}" --password "${password}"
cp "${outfolder_encrypted}/passwords.csv" "${outfolder}/passwords.csv"
# Prepare moodle
./preparemoodle.sh --in "${outfolder_encrypted}" --out "${outfolder}/moodle_feedbacks.zip" --csv "${csv}" --tmp "${tmpfolder2}"
echo
echo "Done."
echo
exit
##############################################################################
# #
# Script 2 of the Klausureinsicht for all platforms #
# Author: Amrita Deb <Deb@itc.rwth-aachen.de> #
# #
# This script creates encrypted copies of the watermarked PDFs created by #
# watermark.py as well as a CSV file storing the password for each file #
# #
##############################################################################
import argparse
import csv
import os
import shutil
import time

import pikepdf
import pwgen

##############################################################################
# #
# Encryption function #
# #
# Input params: filepath of the watermarked pdf, filepath of the #
# encrypted pdf, password (common or randomly generated) #
# Output: Encrypted file in pdfs_encrypted folder #
# #
##############################################################################
def encrypt(inFile, outFile, password):
    pdf = pikepdf.Pdf.open(inFile)
    pdf.save(outFile, encryption=pikepdf.Encryption(owner=password, user=password, R=4))
    pdf.close()

##############################################################################
# #
# Main function #
# #
# 1) Lists all PDFs to be encrypted from ./pdfs_watermarked folder #
# 2) Encrypt pdf with common or randomly generated 8 character password #
# 3) Prepare a csv file that contains matriculation number and password #
# #
##############################################################################
if __name__ == '__main__':
    # Defining parameters
    parser = argparse.ArgumentParser(description='''
        Encrypts the PDFs in folder 'in' and puts them in folder 'out'.
        Either a common password is used for all PDFs or an 8-character password is generated for each PDF.
        The passwords are stored in passwords.csv.
        Attention: contents of folder 'out' will be deleted in the beginning!
        ''')
    parser.add_argument("-i", "--infolder", default="./pdfs_watermarked",
                        help="Input folder with watermarked PDFs. Default: ./pdfs_watermarked")
    parser.add_argument("-o", "--outfolder", default="./pdfs_encrypted",
                        help="Output folder of the encrypted PDFs. Default: ./pdfs_encrypted")
    parser.add_argument("-p", "--password", default=" ",
                        help="Common password for all encrypted PDFs. Default: a randomly generated 8-character password per PDF")
    parser.add_argument("-w", "--passwordout", default=" ",
                        help="Separate folder for the CSV file containing the passwords. Only required by batch.py. Default: the output folder")
    args = parser.parse_args()
    infolder = args.infolder
    outfolder = args.outfolder
    # passwords.csv goes to the output folder unless --passwordout is given
    passwordfolder = outfolder if args.passwordout == " " else args.passwordout

    # Empty 'out' folder
    for root, dirs, files in os.walk(outfolder):
        for f in files:
            os.unlink(os.path.join(root, f))
        for d in dirs:
            shutil.rmtree(os.path.join(root, d))

    # List all PDFs
    pdf_files = [f for f in os.listdir(infolder) if f.endswith(".pdf")]
    print("Available PDFs to be encrypted:\n")
    for pdffile in pdf_files:
        print(pdffile)
    print('\n')

    pwdfileinputs = []  # one (matriculation number, password) pair per PDF
    starttime = time.time()
    for pdffile in pdf_files:
        print('Encrypting ' + pdffile + ' ...')
        filename = pdffile.split('.', 1)[0]
        matnum = filename.split('_', 1)[0]
        if args.password == " ":
            password = pwgen.pwgen(8)
        else:
            password = args.password
        encrypt(os.path.join(infolder, pdffile),
                os.path.join(outfolder, filename + '_encrypted.pdf'), password)
        pwdfileinputs.append((matnum, password))
        print('Encryption completed for ' + pdffile + '\n')
    endtime = time.time()

    print('Recording passwords ...')
    # newline='' prevents blank lines between rows on Windows
    with open(os.path.join(passwordfolder, 'passwords.csv'), mode='w', newline='') as password_file:
        pwdfile_writer = csv.writer(password_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        pwdfile_writer.writerows(pwdfileinputs)

    print('\nThe encrypted files are available in the folder ' + outfolder + '.\nThe passwords for the encrypted files are stored in passwords.csv in the folder ' + passwordfolder + '.\n')
    print(f'\nTime taken: {endtime-starttime:.2f}s\n')
\ No newline at end of file
#!/bin/bash
#
# Author: Christian Rohlfing <rohlfing@ient.rwth-aachen.de>, Dietmar Kahlen <dk@ithe.rwth-aachen.de>
#{{{ Bash settings
set -o errexit # abort on nonzero exitstatus
set -o nounset # abort on unbound variable
set -o pipefail # don't hide errors within pipes
#}}}
#{{{ Input parameter handling
# Copied from https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash
usage="$(basename "$0") [-h] [--in infolder] [--out outfolder] [-p n] -- encrypts exam scans in folder 'in' and puts them in folder 'out'.
Password for each student generated with 'pwgen' or given one is used for all students. Password(s) are stored in 'out'/passwords.csv
Attention: contents of folder 'out' will be overwritten in the following!
Options:
-h, --help show this help text
-i, --in input folder with PDFs. Default: ./pdfs_watermarked
-o, --out output folder. Default: ./pdfs_encrypted
-p, --password sets global password. Default empty, such that each PDF gets a custom password generated with 'pwgen'"
# -allow a command to fail with !’s side effect on errexit
# -use return value from ${PIPESTATUS[0]}, because ! hosed $?
! getopt --test > /dev/null
if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
echo 'I’m sorry, `getopt --test` failed in this environment.'
exit 1
fi