fuseki

    Frank Lange authored 1ff36eb9

    DALIA's backend triplestore powered by Apache Jena Fuseki

    Using the Docker images

    Reusable Docker images are provided via GitLab's container registry.

    Maintained image tags

    • latest: built from the main branch; used in DALIA's production system
    • dev: built from the dev branch; used in DALIA's staging system

    Start a Fuseki instance

    docker run --rm -it -p 3030:3030 -v ${PWD}/data:/database --name fuseki registry.git.rwth-aachen.de/dalia/backend/fuseki:latest

    creates and starts a Docker container named fuseki (removed again when it stops, because of --rm) and exposes Fuseki's HTTP interface on the Docker host via port 3030 (http://localhost:3030). Fuseki's database directory is mounted as a volume from ./data in the current working directory.
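    Fuseki needs a moment to start. Scripts that must wait until the instance accepts requests could poll the admin protocol's /$/ping endpoint. The following is a minimal Python sketch, assuming that endpoint is enabled and reachable at http://localhost:3030; the function name wait_for_fuseki is hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_fuseki(base_url: str = "http://localhost:3030",
                    timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll Fuseki's /$/ping endpoint until it answers with HTTP 200 or the timeout expires.

    Hypothetical helper; assumes the admin endpoint /$/ping is enabled.
    Returns True when the server is ready, False on timeout.
    """
    ping_url = base_url.rstrip("/") + "/$/ping"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(ping_url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts all derive from OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

    Call wait_for_fuseki() right after docker run and abort the script if it returns False.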

    Interfaces

    Volume: Fuseki's database directory

    Inside the container, Fuseki's database directory, including its configuration, is located at /database. Mount this directory as a volume to persist data beyond the life cycle of the container.

    Fuseki's HTTP interface

    Fuseki exposes its administration protocol and the RDF dataset endpoints via HTTP on port 3030. Depending on the endpoint, the RDF dataset endpoints support SPARQL queries, SPARQL Update (SPARUL), and the SPARQL Graph Store HTTP Protocol (GSP); the endpoints of each dataset are listed below.
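    The SPARQL endpoints can be used from any HTTP client. A minimal sketch with Python's standard library, assuming a SELECT query and JSON results: the query is sent as a form-encoded POST, which is one of the request forms the SPARQL 1.1 Protocol allows. The names build_select_request and run_select are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_select_request(base_url: str, dataset: str, query: str) -> urllib.request.Request:
    """Build a form-encoded SPARQL protocol POST request for a dataset's /query endpoint."""
    endpoint = f"{base_url.rstrip('/')}/{dataset}/query"
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def run_select(base_url: str, dataset: str, query: str) -> list:
    """Send a SELECT query and return each result row as a dict of variable name -> value string."""
    request = build_select_request(base_url, dataset, query)
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

    For example, run_select("http://localhost:3030", "dalia", "SELECT * WHERE { ?s ?p ?o } LIMIT 5") returns up to five rows from the dalia dataset of a locally running instance.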

    Logging

    Log messages are written exclusively to the stdout and stderr streams, which are passed to Docker's logging system (for example, docker logs fuseki shows them for the container started above).

    RDF datasets

    Notes on text indexing

    The datasets are configured to support Jena's full-text search via Lucene indices. For certain predicates, the string literal objects of triples with that predicate are indexed and made searchable, so each indexed predicate has its own text index. Several indices can also be combined at query time via a property list.

    dalia dataset

    This dataset contains the DALIA knowledge graph with information on learning resources and communities. It is initialized with RDF data from the /dalia_data directory; this data is downloaded by the download-dalia-data-job CI job before the Docker image is built.

    Endpoints:

    • /dalia/: SPARQL, SPARUL and GSP (read+write)
    • /dalia/query: SPARQL
    • /dalia/sparql: SPARQL
    • /dalia/get: GSP (read)
    • /dalia/update: SPARUL
    • /dalia/data: GSP (read+write)
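    Write access via /dalia/update follows the SPARQL 1.1 Protocol: the update string can be POSTed directly with the media type application/sparql-update. A minimal Python sketch; build_update_request is a hypothetical name, and the triple in EXAMPLE_UPDATE uses a made-up example.org IRI, so try it against a local instance rather than the production system.

```python
import urllib.request


def build_update_request(base_url: str, dataset: str, update: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Update request that POSTs the update string directly."""
    endpoint = f"{base_url.rstrip('/')}/{dataset}/update"
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )


# Hypothetical example update: insert one triple into the default graph.
EXAMPLE_UPDATE = """
PREFIX dcterms: <http://purl.org/dc/terms/>
INSERT DATA { <https://example.org/resource/1> dcterms:title "Example title" . }
"""
```

    Sending it with urllib.request.urlopen(build_update_request("http://localhost:3030", "dalia", EXAMPLE_UPDATE)) returns a 2xx status if the update succeeded.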

    Text indexing:

    • indexed predicates and index names:
      Predicate            Index name          Use
      dcterms:title        dctermsTitle        learning resource title, community title
      dcterms:description  dctermsDescription  learning resource description, community description
      fabio:hasSubtitle    fabioHasSubtitle    learning resource subtitle
      schema:keywords      schemaKeywords      learning resource keywords
      schema:name          schemaName          organization author name
      schema:familyName    schemaFamilyName    person author name
      schema:givenName     schemaGivenName     person author name
    • combined indices (property lists):
      • dt:learningResourceTexts: dcterms:title, dcterms:description, fabio:hasSubtitle, schema:keywords, schema:name, schema:familyName, schema:givenName
      • dt:communityTexts: dcterms:title, dcterms:description
      • namespace dt: http://dalia.education/text#
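    A text search over the combined index uses Jena's text:query property function with the property list in place of a single predicate. A minimal Python sketch that builds such a query for /dalia/query; the helper names are hypothetical, and the search term is handed to Lucene, so Lucene query syntax applies. Note that a match on an author name returns the author node as ?subject, not the learning resource itself.

```python
def escape_sparql_string(value: str) -> str:
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def learning_resource_search_query(term: str, limit: int = 20) -> str:
    """Build a SPARQL query searching the combined dt:learningResourceTexts index.

    Returns the matching subjects together with their Lucene relevance score.
    """
    return f"""PREFIX text: <http://jena.apache.org/text#>
PREFIX dt: <http://dalia.education/text#>
SELECT ?subject ?score WHERE {{
  (?subject ?score) text:query (dt:learningResourceTexts "{escape_sparql_string(term)}")
}}
ORDER BY DESC(?score)
LIMIT {int(limit)}
"""
```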

    ontologies dataset

    This read-only dataset contains third-party ontologies and vocabularies that the items in the dalia dataset refer to. It is initialized with RDF data from the /ontologies directory each time the Docker container starts.

    Loaded ontologies and vocabularies:

    Endpoints:

    • /ontologies/: SPARQL and GSP (read)
    • /ontologies/query: SPARQL
    • /ontologies/sparql: SPARQL
    • /ontologies/get: GSP (read)
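    The /ontologies/get endpoint serves whole graphs via the Graph Store HTTP Protocol, which addresses the default graph with ?default and a named graph with ?graph=<IRI>. A minimal Python sketch; build_graph_get_request is a hypothetical name, and whether a given ontology sits in the default graph or in a named graph is not documented here, so the graph IRI in the usage note is a placeholder.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_graph_get_request(base_url: str, dataset: str, graph: Optional[str] = None,
                            accept: str = "text/turtle") -> urllib.request.Request:
    """Build a GSP GET request for the default graph (graph=None) or for a named graph."""
    query = "default" if graph is None else urllib.parse.urlencode({"graph": graph})
    endpoint = f"{base_url.rstrip('/')}/{dataset}/get?{query}"
    # The Accept header selects the RDF serialization of the response.
    return urllib.request.Request(endpoint, method="GET", headers={"Accept": accept})
```

    For example, build_graph_get_request("http://localhost:3030", "ontologies", graph="http://example.org/g") targets a hypothetical named graph; pass the request to urllib.request.urlopen to download it as Turtle.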

    Text indexing:

    • indexed predicates and index names:
      Predicate                     Index name             Use
      rdfs:label                    rdfsLabel              labels
      skos-last-call:prefLabel [1]  skosLastCallPrefLabel  labels of languages (Lexvo)
      spdx:licenseId                spdxLicenseId          SPDX license ID
      spdx:name                     spdxName               SPDX license full name
      spdx:licenseText              spdxLicenseText        SPDX license text

    [1] The namespace skos-last-call is http://www.w3.org/2008/05/skos#, which differs from the standard SKOS namespace http://www.w3.org/2004/02/skos/core#.

    • combined indices (property lists):
      • dt:spdxLicensesTexts: spdx:licenseId, spdx:name, spdx:licenseText
      • namespace dt: http://dalia.education/text#
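    The dt:spdxLicensesTexts property list lets one search term match a license ID, full name, or license text at once. A minimal Python sketch that builds such a query for /ontologies/query; the helper names are hypothetical, and the spdx: prefix is assumed to stand for http://spdx.org/rdf/terms#, which this README does not state.

```python
def escape_sparql_string(value: str) -> str:
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def spdx_license_search_query(term: str, limit: int = 10) -> str:
    """Build a SPARQL query that searches SPDX licenses and returns their IDs where available."""
    return f"""PREFIX text: <http://jena.apache.org/text#>
PREFIX dt: <http://dalia.education/text#>
PREFIX spdx: <http://spdx.org/rdf/terms#>
SELECT ?license ?licenseId ?score WHERE {{
  (?license ?score) text:query (dt:spdxLicensesTexts "{escape_sparql_string(term)}") .
  OPTIONAL {{ ?license spdx:licenseId ?licenseId }}
}}
ORDER BY DESC(?score)
LIMIT {int(limit)}
"""
```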

    Maintenance tasks

    These tasks should be run regularly, for instance via cron jobs on the Docker host. Run the maintenance scripts inside the running container via docker exec.
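    When the tasks are scheduled from Python instead of cron, a thin wrapper around docker exec keeps the command lines in one place and turns a failing script into an exception. A minimal sketch; maintenance_argv and run_maintenance are hypothetical helpers, and the container name fuseki is taken from the docker run example.

```python
import subprocess


def maintenance_argv(container: str, script: str, *args: str) -> list:
    """Build the docker exec command line for a maintenance script inside the container."""
    return ["docker", "exec", container, script, *args]


def run_maintenance(container: str, script: str, *args: str) -> subprocess.CompletedProcess:
    """Run a maintenance script via docker exec.

    Raises subprocess.CalledProcessError if the script exits with a non-zero code;
    stdout and stderr are captured as text for logging.
    """
    return subprocess.run(maintenance_argv(container, script, *args),
                          check=True, capture_output=True, text=True)
```

    For example, run_maintenance("fuseki", "/scripts/backup.py", "dalia") runs the backup task described below.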

    Dataset backup

    Example:

    docker exec fuseki /scripts/backup.py dalia

    creates a backup of the dalia dataset. The backup file, named dalia_YYYY-MM-DD_HH-MM-SS.nq.gz, is written to /database/backups. It is a gzipped N-Quads file that can be loaded directly into a Fuseki dataset.
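    Because a decompressed backup is plain N-Quads, restoring it amounts to an HTTP upload. A minimal Python sketch under these assumptions: the newest file is picked by the timestamp in its name, the decompressed quads are POSTed to the dataset root /dalia/ (listed above as GSP read+write), and such a POST adds to the existing data rather than replacing it. The names latest_backup and build_restore_request are hypothetical.

```python
import gzip
import re
import urllib.request
from datetime import datetime
from pathlib import Path
from typing import Optional

# Matches the backup file name pattern <dataset>_YYYY-MM-DD_HH-MM-SS.nq.gz
BACKUP_PATTERN = re.compile(
    r"^(?P<dataset>.+)_(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.nq\.gz$"
)


def latest_backup(backup_dir: Path, dataset: str) -> Optional[Path]:
    """Return the newest backup file of a dataset, judged by the timestamp in its name."""
    candidates = []
    for path in backup_dir.iterdir():
        match = BACKUP_PATTERN.match(path.name)
        if match and match.group("dataset") == dataset:
            stamp = datetime.strptime(match.group("stamp"), "%Y-%m-%d_%H-%M-%S")
            candidates.append((stamp, path))
    return max(candidates)[1] if candidates else None


def build_restore_request(base_url: str, dataset: str, backup_file: Path) -> urllib.request.Request:
    """Build a POST request that uploads the decompressed N-Quads backup to the dataset root."""
    nquads = gzip.decompress(backup_file.read_bytes())
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{dataset}/",
        data=nquads,
        method="POST",
        headers={"Content-Type": "application/n-quads"},
    )
```

    With the volume mounted as in the docker run example, the backups appear under ./data/backups on the host, so latest_backup(Path("./data/backups"), "dalia") finds the newest dalia backup there.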

    Dataset compaction

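    Apache Jena's TDB2 datasets grow on disk with update operations, because space occupied by superseded data is not reclaimed automatically (see TDB FAQs). Compaction is therefore necessary from time to time to reclaim that space.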

    Example:

    docker exec fuseki /scripts/compact_dalia.sh

    starts compaction of the dalia dataset.
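    To judge whether compaction is worth running, or how much space it reclaimed, one can compare the on-disk size of the dataset directory before and after. A minimal Python sketch; directory_size_bytes is a hypothetical helper, and the layout of the dataset files inside /database is not documented here, so the path in the usage note is an assumption to check against the actual volume.

```python
from pathlib import Path


def directory_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below a directory; symlinked files are skipped."""
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
```

    For example, directory_size_bytes(Path("./data/databases/dalia")) (hypothetical path) can be logged before and after running /scripts/compact_dalia.sh.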