# Apache Jena Fuseki

DALIA's backend triplestore, powered by Apache Jena Fuseki.

## Using the Docker images
Reusable Docker images are provided via GitLab's container registry.
### Maintained image tags

- `latest`: built from the `main` branch; used in DALIA's production system
- `dev`: built from the `dev` branch; used in DALIA's staging system
### Start a Fuseki instance

```shell
docker run --rm -it -p 3030:3030 -v ${PWD}/data:/database --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
```

creates and starts a Docker container and exposes Fuseki's HTTP interface on the Docker host via port 3030 (http://localhost:3030). Fuseki's database directory is mounted as a volume under `./data`.
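For long-running deployments, the same container can also be described declaratively. A minimal Docker Compose sketch under the same assumptions (service name and restart policy are illustrative):

```yaml
services:
  fuseki:
    image: registry.git.rwth-aachen.de/dalia/backend/fuseki:latest
    ports:
      - "3030:3030"        # expose Fuseki's HTTP interface on the host
    volumes:
      - ./data:/database   # persist the database directory
    restart: unless-stopped
```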
## Interfaces

### Volume: Fuseki's database directory

Inside the container, Fuseki's database directory, including its configuration, is located at `/database`. This directory should be mounted as a volume to persist data beyond the life cycle of the container.
### Fuseki's HTTP interface

Fuseki exposes its administration protocol and the RDF dataset endpoints via HTTP on port 3030. The RDF dataset endpoints typically support SPARQL queries, SPARQL/Update (SPARUL) and the Graph Store HTTP Protocol (GSP).
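As a quick sanity check, the administration protocol can be probed with plain HTTP. A sketch assuming a container started as above (the `|| true` guards only keep the commands from aborting a script when no server is running):

```shell
# Base URL of the Fuseki instance started as shown above
FUSEKI_URL="http://localhost:3030"

# Liveness check via the administration protocol's ping endpoint
curl -s "${FUSEKI_URL}/\$/ping" || true

# List the configured datasets as JSON
curl -s "${FUSEKI_URL}/\$/datasets" || true
```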
### Logging

Log messages are written exclusively to the `stdout` and `stderr` output streams, which are passed to Docker's logging system.
## RDF datasets

### Notes on text indexing

The datasets are configured to support Jena's full-text search via Lucene indices. For the triples of an RDF graph, the string literal objects of certain predicates are indexed and made available for text search, i.e. there is one text index per indexed predicate. Different indices can be combined at query time via a property list.
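As a sketch of how a single-predicate index is used in a query (assuming the jena-text namespace `http://jena.apache.org/text#`, a server running as described above, and an illustrative search term):

```shell
# SPARQL full-text query using the Lucene index for dcterms:title
QUERY='
PREFIX text:    <http://jena.apache.org/text#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?s ?title WHERE {
  ?s text:query (dcterms:title "statistics") .
  ?s dcterms:title ?title .
}'

curl -s --data-urlencode "query=${QUERY}" http://localhost:3030/dalia/query || true
```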
### `dalia` dataset

This dataset contains the DALIA knowledge graph with information on learning resources and communities. It is initialized with RDF data from the `/dalia_data` directory, which is populated by the `download-dalia-data-job` CI job before the Docker image is built.
Endpoints:

- `/dalia/`: SPARQL, SPARUL and GSP (read+write)
- `/dalia/query`: SPARQL
- `/dalia/sparql`: SPARQL
- `/dalia/get`: GSP (read)
- `/dalia/update`: SPARUL
- `/dalia/data`: GSP (read+write)
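For illustration, the read endpoints can be exercised with plain HTTP (assuming the container from above on localhost:3030; the `|| true` guards only prevent failures when no server is running):

```shell
ENDPOINT="http://localhost:3030/dalia"

# SPARQL query via the read-only query endpoint
curl -s --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 5" \
     "${ENDPOINT}/query" || true

# Fetch the default graph via GSP (read)
curl -s -H "Accept: text/turtle" "${ENDPOINT}/get?default" || true
```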
Text indexing:

- indexed predicates and index names:

  | Predicate | Index name | Use |
  |---|---|---|
  | `dcterms:title` | `dctermsTitle` | learning resource title, community title |
  | `dcterms:description` | `dctermsDescription` | learning resource description, community description |
  | `fabio:hasSubtitle` | `fabioHasSubtitle` | learning resource subtitle |
  | `schema:keywords` | `schemaKeywords` | learning resource keywords |
  | `schema:name` | `schemaName` | organization author name |
  | `schema:familyName` | `schemaFamilyName` | person author name |
  | `schema:givenName` | `schemaGivenName` | person author name |
- combined indices (property lists):
  - `dt:learningResourceTexts`: `dcterms:title`, `dcterms:description`, `fabio:hasSubtitle`, `schema:keywords`, `schema:name`, `schema:familyName`, `schema:givenName`
  - `dt:communityTexts`: `dcterms:title`, `dcterms:description`
- namespace `dt` is `http://dalia.education/text#`
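A combined index is addressed by its property-list name instead of a single predicate. A sketch under the same assumptions as above (search term illustrative):

```shell
# Full-text search across all seven indexed predicates of the property list
QUERY='
PREFIX text: <http://jena.apache.org/text#>
PREFIX dt:   <http://dalia.education/text#>

SELECT ?resource WHERE {
  ?resource text:query (dt:learningResourceTexts "python") .
}'

curl -s --data-urlencode "query=${QUERY}" http://localhost:3030/dalia/query || true
```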
### `ontologies` dataset

This read-only dataset contains third-party ontologies and vocabularies that the items in the `dalia` dataset refer to. It is initialized with RDF data from the `/ontologies` directory each time the Docker container starts.
Loaded ontologies and vocabularies:
- MoDALIA Ontology (CC-BY 4.0)
- SPDX License List Data (license?)
- DINI AG KIM Hochschulfächersystematik (license?)
- DINI AG KIM Hochschulcampus Ressourcentypen (CC0 1.0)
- Lexvo language vocabulary (CC-BY-SA 3.0)
Endpoints:

- `/ontologies/`: SPARQL and GSP (read)
- `/ontologies/query`: SPARQL
- `/ontologies/sparql`: SPARQL
- `/ontologies/get`: GSP (read)
Text indexing:

- indexed predicates and index names:

  | Predicate | Index name | Use |
  |---|---|---|
  | `rdfs:label` | `rdfsLabel` | labels |
  | `skos-last-call:prefLabel` [^1] | `skosLastCallPrefLabel` | labels of languages (Lexvo) |
  | `spdx:licenseId` | `spdxLicenseId` | SPDX: license ID |
  | `spdx:name` | `spdxName` | SPDX: license full name |
  | `spdx:licenseText` | `spdxLicenseText` | SPDX: license text |

[^1]: The namespace `skos-last-call` is `http://www.w3.org/2008/05/skos#`. It is different from the standard SKOS namespace.
- combined indices (property lists):
  - `dt:spdxLicensesTexts`: `spdx:licenseId`, `spdx:name`, `spdx:licenseText`
- namespace `dt` is `http://dalia.education/text#`
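For example, a license lookup across the SPDX property list could be sketched like this (same assumptions as the examples above; search term illustrative):

```shell
# Matches license ID, full name and license text in one search
QUERY='
PREFIX text: <http://jena.apache.org/text#>
PREFIX dt:   <http://dalia.education/text#>

SELECT ?license WHERE {
  ?license text:query (dt:spdxLicensesTexts "MIT") .
}'

curl -s --data-urlencode "query=${QUERY}" http://localhost:3030/ontologies/query || true
```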
## Maintenance tasks

These tasks should be triggered regularly, for instance via cron jobs. The maintenance scripts should be invoked via `docker exec`.
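A crontab fragment scheduling both maintenance tasks might look like this (the schedule is illustrative; the scripts are described in the following sections):

```
# Nightly backup of the dalia dataset at 02:30
30 2 * * *   docker exec fuseki /scripts/backup.py dalia

# Weekly compaction of the dalia dataset on Sundays at 03:30
30 3 * * 0   docker exec fuseki /scripts/compact_dalia.sh
```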
### Dataset backup

Example:

```shell
docker exec fuseki /scripts/backup.py dalia
```

creates a backup of the `dalia` dataset. The backup file, named according to the pattern `dalia_YYYY-MM-DD_HH-MM-SS.nq.gz`, can be found in `/database/backups`. It is a gzipped N-Quads file, which can be loaded directly into a Fuseki dataset.
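Since the backup is plain gzipped N-Quads, a restore can be sketched as streaming it back into the dataset over HTTP. The file name below is hypothetical (following the pattern above), and a running server is assumed:

```shell
# Hypothetical backup file; actual names follow dalia_YYYY-MM-DD_HH-MM-SS.nq.gz
BACKUP=./data/backups/dalia_2024-01-01_02-30-00.nq.gz

# Decompress and POST the quads to the dataset endpoint
gunzip -c "$BACKUP" 2>/dev/null | \
  curl -s -X POST -H "Content-Type: application/n-quads" \
       --data-binary @- http://localhost:3030/dalia/ || true
```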
### Dataset compaction

Apache Jena's TDB2 datasets grow in size due to update operations (see the TDB FAQs), which makes regular compaction necessary.

Example:

```shell
docker exec fuseki /scripts/compact_dalia.sh
```

starts compaction of the `dalia` dataset.