Commit 0b0f08b5 authored by Frank Lange: "update README.md" (parent 29926774, branch PaperTest, pipeline #1615426 passed)
## Using the Docker images
Reusable Docker images are provided via GitLab's [container registry](https://git.rwth-aachen.de/dalia/backend/fuseki/container_registry).
### Maintained image tags
* `latest`: built from the `main` branch; used in [DALIA's production system](https://search.dalia.education)
* `dev`: built from the `dev` branch; used in DALIA's staging system
### Start a Fuseki instance
Without mounting the database directory as a volume:
```shell
docker run --rm -it -p 3030:3030 --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
```
With the database directory mounted as a volume under `./data`:
```shell
docker run --rm -it -p 3030:3030 -v ${PWD}/data:/database --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
```
Either command creates and starts a Docker container and exposes Fuseki's HTTP interface on the Docker host via port 3030 (http://localhost:3030).
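Once the container is up, a quick liveness check can be made against Fuseki's [administration protocol](https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html). A minimal sketch, assuming the default port mapping shown above and a running container:

```shell
# /$/ping is part of Fuseki's administration protocol; it responds
# with the current server time when the instance is healthy.
curl -s 'http://localhost:3030/$/ping'
```

This only checks that the server answers; it says nothing about the state of individual datasets.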
## Interfaces
### Volume: Fuseki's database directory
In the container [Fuseki's database directory](https://jena.apache.org/documentation/fuseki2/fuseki-layout.html#runtime-area----fuseki_base) including its configuration is located at `/database`. This directory should be mounted as volume to persist data beyond the life cycle of the container.
### Fuseki's HTTP interface
Fuseki exposes its [administration protocol](https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html) and the RDF dataset endpoints via HTTP on port 3030. The RDF dataset endpoints typically support [SPARQL queries](https://www.w3.org/TR/sparql11-query/), [SPARQL/Update](https://www.w3.org/TR/sparql11-update/) (SPARUL) and the [Graph Store HTTP Protocol](https://www.w3.org/TR/sparql11-http-rdf-update/) (GSP).
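As an illustration of the SPARQL query endpoints, a sketch assuming a running container as started above and the _dalia_ dataset described below (the query itself is a generic example, not part of this repository):

```shell
# Count all triples in the dalia dataset via its SPARQL endpoint.
# --data-urlencode sends the query as a form-encoded POST parameter.
curl -s \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }' \
  -H 'Accept: application/sparql-results+json' \
  http://localhost:3030/dalia/query
```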
### Logging
Log messages are written exclusively through the `stdout` and `stderr` output streams, which are passed to [Docker's logging system](https://docs.docker.com/engine/logging/).
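Logs can therefore be inspected with Docker's standard tooling, for example for a container named `fuseki` as in the examples above:

```shell
# Follow the log output of the running Fuseki container.
docker logs -f fuseki
```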
## RDF datasets
### _dalia_ dataset
This dataset contains the DALIA knowledge graph with information on learning resources and communities. It is initialized with RDF data from the `/dalia_data` directory, which is populated by the [`download-dalia-data-job` CI job](https://git.rwth-aachen.de/dalia/backend/fuseki/-/blob/main/.gitlab-ci.yml?ref_type=heads#L5) before the Docker image is built.
Endpoints:
* `/dalia/`: SPARQL, SPARUL and GSP (read+write)
* `/dalia/query`: SPARQL
* `/dalia/get`: GSP (read)
* `/dalia/update`: SPARUL
* `/dalia/data`: GSP (read+write)
Text indexing: See [issue #2](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/2).
### _ontologies_ dataset
This read-only dataset contains third-party ontologies and vocabularies that the items in the _dalia_ dataset refer to. It is initialized with RDF data from the `/ontologies` directory each time the Docker container starts.
Loaded ontologies and vocabularies: See [issue #1](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/1).
Endpoints:
* `/ontologies/`: SPARQL and GSP (read)
* `/ontologies/query`: SPARQL
* `/ontologies/sparql`: SPARQL
* `/ontologies/get`: GSP (read)
Text indexing: See [issue #3](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/3).
## Maintenance tasks
These tasks should be triggered regularly, for instance via cron jobs, by calling the maintenance scripts with `docker exec`.
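For example, the following crontab entries would run a nightly backup and a weekly compaction. The schedule is an assumption; the script paths are the ones used in the examples below:

```crontab
# Nightly backup of the dalia dataset at 02:30
30 2 * * * docker exec fuseki /scripts/backup.py dalia
# Weekly compaction of the dalia dataset on Sundays at 03:30
30 3 * * 0 docker exec fuseki /scripts/compact_dalia.sh
```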
### Dataset backup
Example:
```shell
docker exec fuseki /scripts/backup.py dalia
```
creates a backup of the _dalia_ dataset. The backup file, named `dalia_YYYY-MM-DD_HH-MM-SS.nq.gz`, is written to `/database/backups`. It is a gzipped [N-Quads](https://en.wikipedia.org/wiki/N-Triples#N-Quads) file that can be loaded directly into a Fuseki dataset.
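Because the backup is plain gzipped N-Quads, it can be inspected with standard tools before restoring it. A sketch, using a tiny generated sample file as a stand-in for a real `dalia_YYYY-MM-DD_HH-MM-SS.nq.gz` backup:

```shell
# Create a one-quad sample file standing in for a real backup.
printf '<urn:ex:s> <urn:ex:p> "o" <urn:ex:g> .\n' | gzip > sample.nq.gz

# Stream the N-Quads without unpacking the file on disk.
gunzip -c sample.nq.gz

# Count the quads contained in the backup.
gunzip -c sample.nq.gz | wc -l
```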
### Dataset compaction
Apache Jena's TDB2 datasets grow in size due to update operations (see [TDB FAQs](https://jena.apache.org/documentation/tdb/faqs.html#input-vs-database-size)), which makes compaction necessary.
Example:
```shell
docker exec fuseki /scripts/compact_dalia.sh
```
starts compaction of the _dalia_ dataset.