From 0b0f08b520434b05ffa3ea2dda98d00303b6ba62 Mon Sep 17 00:00:00 2001
From: flange <38500-flange@users.noreply.git.rwth-aachen.de>
Date: Fri, 14 Feb 2025 16:31:35 +0100
Subject: [PATCH] update README.md

---
 README.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 67 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 5d65ec7..b67e1ed 100644
--- a/README.md
+++ b/README.md
@@ -2,21 +2,82 @@
 ## Using the Docker images
 
-We provide reusable Docker images via GitLab's [container registry](https://git.rwth-aachen.de/dalia/backend/fuseki/container_registry).
+Reusable Docker images are provided via GitLab's [container registry](https://git.rwth-aachen.de/dalia/backend/fuseki/container_registry).
 
 ### Maintained image tags
 
 * `latest`: built from the `main` branch; used in [DALIA's production system](https://search.dalia.education)
 * `dev`: built from the `dev` branch; used in DALIA's staging system
 
-### To start a Fuseki instance
+### Start a Fuseki instance
 
-Without mounting the database directory as volume:
 ```shell
-docker run --rm -it -p 3030:3030 --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
+docker run --rm -it -p 3030:3030 -v ${PWD}/data:/database --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
 ```
+creates and starts a Docker container and exposes Fuseki's HTTP interface on the Docker host via port 3030 (http://localhost:3030). Fuseki's database directory is mounted as a volume under `./data`.
+
+## Interfaces
+
+### Volume: Fuseki's database directory
+
+In the container, [Fuseki's database directory](https://jena.apache.org/documentation/fuseki2/fuseki-layout.html#runtime-area----fuseki_base), including its configuration, is located at `/database`. This directory should be mounted as a volume to persist data beyond the life cycle of the container.
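As an alternative to the bind mount shown above, the database directory can also be persisted in a named Docker volume. This is a hedged sketch, not part of the patch; the volume name `fuseki-data` is an assumption:

```shell
# Sketch: persist /database in a named Docker volume instead of a host
# directory ("fuseki-data" is an assumed, freely choosable volume name).
docker volume create fuseki-data
docker run --rm -it -p 3030:3030 \
    -v fuseki-data:/database \
    --name fuseki \
    registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
```

With a named volume, Docker manages the storage location itself, which avoids host-path permission issues at the cost of less direct access to the files.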
+
+### Fuseki's HTTP interface
+
+Fuseki exposes its [administration protocol](https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html) and the RDF dataset endpoints via HTTP on port 3030. The RDF dataset endpoints typically support [SPARQL queries](https://www.w3.org/TR/sparql11-query/), [SPARQL/Update](https://www.w3.org/TR/sparql11-update/) (SPARUL) and the [Graph Store HTTP Protocol](https://www.w3.org/TR/sparql11-http-rdf-update/) (GSP).
+
+### Logging
+
+Log messages are written exclusively to the `stdout` and `stderr` output streams, which are passed to [Docker's logging system](https://docs.docker.com/engine/logging/).
+
+## RDF datasets
+
+### _dalia_ dataset
+
+This dataset contains the DALIA knowledge graph with information on learning resources and communities. It is initialized with RDF data from the `/dalia_data` directory, which is populated by the [`download-dalia-data-job` CI job](https://git.rwth-aachen.de/dalia/backend/fuseki/-/blob/main/.gitlab-ci.yml?ref_type=heads#L5) before the Docker image is built.
+
+Endpoints:
+* `/dalia/`: SPARQL, SPARUL and GSP (read+write)
+* `/dalia/query`: SPARQL
+* `/dalia/sparql`: SPARQL
+* `/dalia/get`: GSP (read)
+* `/dalia/update`: SPARUL
+* `/dalia/data`: GSP (read+write)
+
+Text indexing: See [issue #2](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/2).
+
+### _ontologies_ dataset
 
-With mounting the database directory as volume:
+This read-only dataset contains third-party ontologies and vocabularies that the items in the _dalia_ dataset refer to. It is initialized with RDF data from the `/ontologies` directory each time the Docker container starts.
+
+Loaded ontologies and vocabularies: See [issue #1](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/1).
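Once a container is running, the dataset endpoints listed above can be exercised over HTTP. A hedged sketch against the _dalia_ query endpoint (assumes an instance on localhost as started earlier; the query itself is purely illustrative):

```shell
# Send a simple SPARQL SELECT to the dalia dataset's query endpoint.
# curl -G appends the url-encoded query as a GET parameter.
curl -G 'http://localhost:3030/dalia/query' \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 5' \
     -H 'Accept: application/sparql-results+json'
```

The `Accept` header selects the SPARQL JSON results format; other serializations (e.g. XML, CSV) can be requested the same way.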
+
+Endpoints:
+* `/ontologies/`: SPARQL and GSP (read)
+* `/ontologies/query`: SPARQL
+* `/ontologies/sparql`: SPARQL
+* `/ontologies/get`: GSP (read)
+
+Text indexing: See [issue #3](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/3).
+
+## Maintenance tasks
+
+These tasks should be triggered regularly, for instance via cron jobs. The maintenance scripts should be called via `docker exec`.
+
+### Dataset backup
+
+Example:
 ```shell
-docker run --rm -it -p 3030:3030 -v ${PWD}/data:/database --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
+docker exec fuseki /scripts/backup.py dalia
+```
+creates a backup of the _dalia_ dataset. The backup file, named `dalia_YYYY-MM-DD_HH-MM-SS.nq.gz`, can be found in `/database/backups`. It is a gzipped [N-Quads](https://en.wikipedia.org/wiki/N-Triples#N-Quads) file, which can be loaded directly into a Fuseki dataset.
+
+### Dataset compaction
+
+Apache Jena's TDB2 datasets grow in size due to update operations (see [TDB FAQs](https://jena.apache.org/documentation/tdb/faqs.html#input-vs-database-size)), which makes regular compaction necessary.
+
+Example:
+```shell
+docker exec fuseki /scripts/compact_dalia.sh
 ```
+starts compaction of the _dalia_ dataset.
-- 
GitLab
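The maintenance commands above can be scheduled on the Docker host, for instance with cron as the README suggests. A hypothetical crontab fragment (the schedule times are assumptions; the container name `fuseki` matches the run command earlier):

```shell
# Hypothetical crontab entries on the Docker host (edit with: crontab -e).
# Back up the dalia dataset every night at 02:00:
0 2 * * * docker exec fuseki /scripts/backup.py dalia
# Compact the dalia dataset every Sunday at 03:00:
0 3 * * 0 docker exec fuseki /scripts/compact_dalia.sh
```

Running compaction less often than backups is a deliberate choice here: compaction is comparatively expensive, while backups are cheap and should be frequent.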