Commit 0b0f08b5 authored by Frank Lange: "update README.md" (parent 29926774, branch PaperTest, pipeline #1615426 passed)
## Using the Docker images
Reusable Docker images are provided via GitLab's [container registry](https://git.rwth-aachen.de/dalia/backend/fuseki/container_registry).
### Maintained image tags
* `latest`: built from the `main` branch; used in [DALIA's production system](https://search.dalia.education)
* `dev`: built from the `dev` branch; used in DALIA's staging system
### Start a Fuseki instance
Without mounting the database directory as a volume:
```shell
docker run --rm -it -p 3030:3030 --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
```
With the database directory mounted as a volume under `./data`:
```shell
docker run --rm -it -p 3030:3030 -v ${PWD}/data:/database --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
```
Either command creates and starts a Docker container and exposes Fuseki's HTTP interface on the Docker host via port 3030 (http://localhost:3030).
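Once the container is up, a quick liveness check can be made against Fuseki's [administration protocol](https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html). A minimal sketch, assuming the default port mapping shown above and a running container:

```shell
# /$/ping is part of Fuseki's administration protocol; it responds
# with the current server time when the instance is healthy.
curl -s 'http://localhost:3030/$/ping'
```

This only checks that the server answers; it says nothing about the state of individual datasets.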
## Interfaces
### Volume: Fuseki's database directory
In the container [Fuseki's database directory](https://jena.apache.org/documentation/fuseki2/fuseki-layout.html#runtime-area----fuseki_base) including its configuration is located at `/database`. This directory should be mounted as volume to persist data beyond the life cycle of the container.
### Fuseki's HTTP interface
Fuseki exposes its [administration protocol](https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html) and the RDF dataset endpoints via HTTP on port 3030. The RDF dataset endpoints typically support [SPARQL queries](https://www.w3.org/TR/sparql11-query/), [SPARQL/Update](https://www.w3.org/TR/sparql11-update/) (SPARUL) and the [Graph Store HTTP Protocol](https://www.w3.org/TR/sparql11-http-rdf-update/) (GSP).
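As an illustration of the SPARQL query endpoints, a sketch assuming a running container as started above and the _dalia_ dataset described below (the query itself is a generic example, not part of this repository):

```shell
# Count all triples in the dalia dataset via its SPARQL endpoint.
# --data-urlencode sends the query as a form-encoded POST parameter.
curl -s \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }' \
  -H 'Accept: application/sparql-results+json' \
  http://localhost:3030/dalia/query
```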
### Logging
Log messages are written exclusively through the `stdout` and `stderr` output streams, which are passed to [Docker's logging system](https://docs.docker.com/engine/logging/).
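Logs can therefore be inspected with Docker's standard tooling, for example for a container named `fuseki` as in the examples above:

```shell
# Follow the log output of the running Fuseki container.
docker logs -f fuseki
```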
## RDF datasets
### _dalia_ dataset
This dataset contains the DALIA knowledge graph with information on learning resources and communities. It is initialized with RDF data from the `/dalia_data` directory, which is populated by the [`download-dalia-data-job` CI job](https://git.rwth-aachen.de/dalia/backend/fuseki/-/blob/main/.gitlab-ci.yml?ref_type=heads#L5) before the Docker image is built.
Endpoints:
* `/dalia/`: SPARQL, SPARUL and GSP (read+write)
* `/dalia/query`: SPARQL
* `/dalia/get`: GSP (read)
* `/dalia/update`: SPARUL
* `/dalia/data`: GSP (read+write)
Text indexing: See [issue #2](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/2).
### _ontologies_ dataset
This read-only dataset contains third-party ontologies and vocabularies that the items in the _dalia_ dataset refer to. It is initialized with RDF data from the `/ontologies` directory each time the Docker container starts.
Loaded ontologies and vocabularies: See [issue #1](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/1).
Endpoints:
* `/ontologies/`: SPARQL and GSP (read)
* `/ontologies/query`: SPARQL
* `/ontologies/sparql`: SPARQL
* `/ontologies/get`: GSP (read)
Text indexing: See [issue #3](https://git.rwth-aachen.de/dalia/backend/fuseki/-/issues/3).
## Maintenance tasks
These tasks should be triggered regularly, for instance via cron jobs, by calling the maintenance scripts with `docker exec`.
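For example, the following crontab entries would run a nightly backup and a weekly compaction. The schedule is an assumption; the script paths are the ones used in the examples below:

```crontab
# Nightly backup of the dalia dataset at 02:30
30 2 * * * docker exec fuseki /scripts/backup.py dalia
# Weekly compaction of the dalia dataset on Sundays at 03:30
30 3 * * 0 docker exec fuseki /scripts/compact_dalia.sh
```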
### Dataset backup
Example:
```shell
docker exec fuseki /scripts/backup.py dalia
```
creates a backup of the _dalia_ dataset. The backup file, named `dalia_YYYY-MM-DD_HH-MM-SS.nq.gz`, is written to `/database/backups`. It is a gzipped [N-Quads](https://en.wikipedia.org/wiki/N-Triples#N-Quads) file that can be loaded directly into a Fuseki dataset.
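Because the backup is plain gzipped N-Quads, it can be inspected with standard tools before restoring it. A sketch, using a tiny generated sample file as a stand-in for a real `dalia_YYYY-MM-DD_HH-MM-SS.nq.gz` backup:

```shell
# Create a one-quad sample file standing in for a real backup.
printf '<urn:ex:s> <urn:ex:p> "o" <urn:ex:g> .\n' | gzip > sample.nq.gz

# Stream the N-Quads without unpacking the file on disk.
gunzip -c sample.nq.gz

# Count the quads contained in the backup.
gunzip -c sample.nq.gz | wc -l
```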
### Dataset compaction
Apache Jena's TDB2 datasets grow in size due to update operations (see [TDB FAQs](https://jena.apache.org/documentation/tdb/faqs.html#input-vs-database-size)), which makes compaction necessary.
Example:
```shell
docker exec fuseki /scripts/compact_dalia.sh
```
starts compaction of the _dalia_ dataset.