SemanticSimilarity
This repository contains the code of a semantic similarity implementation. It is designed so that multiple similarity implementations can be used and compared against each other.
A server implementation provides the comparison as a service, so that it can be used from other applications.
The analysis data can be found in the Analysis folder.
Installation
Run:
pip install -r requirements.txt
Server
To run SemanticSimilarity as a server, execute server.py with Python:
python server.py
Set the environment variable SEMANTICSIMILARITYPORT to specify the port and SEMANTICSIMILARITYHOST to specify the host.
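The two variables can also be set from a small Python launcher instead of the shell. The following is a minimal sketch, not part of the repository: the helper names `server_env` and `run_server` are hypothetical, and the host and port values used with them are examples only.

```python
import os
import subprocess
import sys


def server_env(host=None, port=None, base=None):
    """Copy an environment and set the two variables named above when given."""
    env = dict(os.environ if base is None else base)
    if host is not None:
        env["SEMANTICSIMILARITYHOST"] = str(host)
    if port is not None:
        env["SEMANTICSIMILARITYPORT"] = str(port)
    return env


def run_server(host=None, port=None):
    """Start server.py with the chosen host and port.

    Assumes the current working directory is the repository root,
    so that server.py can be found.
    """
    return subprocess.run(
        [sys.executable, "server.py"],
        env=server_env(host, port),
        check=True,
    )
```

Calling `run_server("127.0.0.1", 8080)` would start the server with both variables set; leaving an argument out keeps whatever value the surrounding environment already provides.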
Build
This repository comes with build artifacts that contain an executable which starts the server.
Example Usage
Once the server is running, open the application in a browser to explore the Swagger interface.
Example Request
After the server has been set up and is running, a POST request with the following JSON data yields results (comparing two SHACL application profiles):
{
"graphs": [
{
"definition": "@prefix dcmitype: <http://purl.org/dc/dcmitype/> .\n@prefix dcterms: <http://purl.org/dc/terms/> .\n@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n@prefix sh: <http://www.w3.org/ns/shacl#> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n@prefix aps: <https://purl.org/coscine/ap/> .\n\n<https://purl.org/coscine/ap/radar#subject> sh:path dcterms:subject ;\n\tsh:order 3 ;\n\tsh:maxCount 1 ;\n\tsh:class <http://www.dfg.de/dfg_profil/gremien/fachkollegien/faecher/> ;\n\tsh:name \"Fachrichtung\"@de, \"Subject Area\"@en .\n\n<https://purl.org/coscine/ap/radar#created> sh:path dcterms:created ;\n\tsh:order 2 ;\n\tsh:maxCount 1 ;\n\tsh:name \"Production Date\"@en, \"Erstelldatum\"@de ;\n\tsh:minCount 1 ;\n\tsh:datatype xsd:date ;\n\tsh:defaultValue \"{TODAY}\" .\n\n<https://purl.org/coscine/ap/radar#creator> sh:path dcterms:creator ;\n\tsh:order 0 ;\n\tsh:maxCount 1 ;\n\tsh:name \"Ersteller\"@de, \"Creator\"@en ;\n\tsh:minCount 1 ;\n\tsh:datatype xsd:string ;\n\tsh:defaultValue \"{ME}\" ;\n\tsh:minLength 1 .\n\n<https://purl.org/coscine/ap/radar#rights> sh:path dcterms:rights ;\n\tsh:order 5 ;\n\tsh:maxCount 1 ;\n\tsh:name \"Rights\"@en, \"Berechtigung\"@de ;\n\tsh:datatype xsd:string .\n\n<https://purl.org/coscine/ap/radar#rightsHolder> sh:path dcterms:rightsHolder ;\n\tsh:order 6 ;\n\tsh:maxCount 1 ;\n\tsh:name \"Rightsholder\"@en, \"Rechteinhaber\"@de ;\n\tsh:datatype xsd:string .\n\n<https://purl.org/coscine/ap/radar#title> sh:path dcterms:title ;\n\tsh:order 1 ;\n\tsh:maxCount 1 ;\n\tsh:name \"Titel\"@de, \"Title\"@en ;\n\tsh:minCount 1 ;\n\tsh:datatype xsd:string ;\n\tsh:minLength 1 .\n\n<https://purl.org/coscine/ap/radar#type> sh:path dcterms:type ;\n\tsh:order 4 ;\n\tsh:maxCount 1 ;\n\tsh:class <http://purl.org/dc/dcmitype/> ;\n\tsh:name \"Resource\"@en, \"Ressource\"@de .\n\n<https://purl.org/coscine/ap/radar/> dcterms:rights \"Copyright © 2020 IT Center, RWTH Aachen University\" ;\n\tdcterms:title \"radar application 
profile\"@en ;\n\tdcterms:license <http://spdx.org/licenses/MIT> ;\n\tsh:targetClass <https://purl.org/coscine/ap/radar/> ;\n\tsh:closed true ;\n\tsh:property <https://purl.org/coscine/ap/radar#subject>, <https://purl.org/coscine/ap/radar#created>, <https://purl.org/coscine/ap/radar#creator>, <https://purl.org/coscine/ap/radar#rights>, <https://purl.org/coscine/ap/radar#rightsHolder>, <https://purl.org/coscine/ap/radar#title>, <https://purl.org/coscine/ap/radar#type>, [\n\t\tsh:path rdf:type ;\n\t] ;\n\tdcterms:publisher <https://itc.rwth-aachen.de/> ;\n\ta sh:NodeShape .",
"type": "turtle"
},
{
"definition": "@prefix dcmitype: <http://purl.org/dc/dcmitype/> .\n@prefix dcterms: <http://purl.org/dc/terms/> .\n@prefix prov: <http://www.w3.org/ns/prov#> .\n@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n@prefix sh: <http://www.w3.org/ns/shacl#> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n@prefix aps: <https://purl.org/coscine/ap/> .\n\n<https://purl.org/coscine/ap/8e9c8f97-8bf3-46c1-aba6-27777791d01d/> dcterms:license <http://spdx.org/licenses/MIT> ;\n\tdcterms:publisher <https://itc.rwth-aachen.de/> ;\n\tdcterms:rights \"Copyright © 2020 IT Center, RWTH Aachen University\" ;\n\tdcterms:title \"radar application profile\"@en ;\n\ta sh:NodeShape ;\n\tprov:wasDerivedFrom <https://purl.org/coscine/ap/radar/> ;\n\tsh:closed true ;\n\tsh:property <https://purl.org/coscine/ap/radar#created>, <https://purl.org/coscine/ap/radar#rights>, <https://purl.org/coscine/ap/radar#rightsHolder>, <https://purl.org/coscine/ap/radar#subject>, <https://purl.org/coscine/ap/radar#title>, <https://purl.org/coscine/ap/radar#type>, _:b50_c14n0 ;\n\tsh:targetClass <https://purl.org/coscine/ap/radar/> .\n\n<https://purl.org/coscine/ap/radar#created> sh:datatype xsd:date ;\n\tsh:defaultValue \"{TODAY}\" ;\n\tsh:maxCount 1 ;\n\tsh:minCount 1 ;\n\tsh:name \"Erstelldatum\"@de, \"Production Date\"@en ;\n\tsh:order 2 ;\n\tsh:path dcterms:created .\n\n<https://purl.org/coscine/ap/radar#rights> sh:datatype xsd:string ;\n\tsh:maxCount 1 ;\n\tsh:name \"Berechtigung\"@de, \"Rights\"@en ;\n\tsh:order 5 ;\n\tsh:path dcterms:rights .\n\n<https://purl.org/coscine/ap/radar#rightsHolder> sh:datatype xsd:string ;\n\tsh:maxCount 1 ;\n\tsh:name \"Rechteinhaber\"@de, \"Rightsholder\"@en ;\n\tsh:order 6 ;\n\tsh:path dcterms:rightsHolder .\n\n<https://purl.org/coscine/ap/radar#subject> sh:maxCount 1 ;\n\tsh:name \"Fachrichtung\"@de, \"Subject Area\"@en ;\n\tsh:order 3 ;\n\tsh:path dcterms:subject ;\n\tsh:class <http://www.dfg.de/dfg_profil/gremien/fachkollegien/faecher/> 
.\n\n<https://purl.org/coscine/ap/radar#title> sh:datatype xsd:string ;\n\tsh:maxCount 1 ;\n\tsh:minCount 1 ;\n\tsh:name \"Titel\"@de, \"Title\"@en ;\n\tsh:order 1 ;\n\tsh:path dcterms:title ;\n\tsh:minLength 1 .\n\n<https://purl.org/coscine/ap/radar#type> sh:maxCount 1 ;\n\tsh:name \"Resource\"@en, \"Ressource\"@de ;\n\tsh:order 4 ;\n\tsh:path dcterms:type ;\n\tsh:class <http://purl.org/dc/dcmitype/> .\n\n",
"type": "turtle"
}
],
"methods": [
"FilterStructureSimplerJaccardSimilarity"
]
}
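A request body of this shape can be built and sent with the Python standard library. This is only a sketch under stated assumptions: `SERVICE_URL` is a hypothetical placeholder (the actual route is listed in the Swagger interface, and the port depends on your configuration), and the helper names `build_request` and `compare` are illustrative.

```python
import json
import urllib.request

# Hypothetical endpoint: look up the real route in the Swagger interface
# and adjust host and port to your setup.
SERVICE_URL = "http://localhost:36543/"


def build_request(definitions, methods, url=SERVICE_URL):
    """Build a POST request whose body has the shape of the JSON shown above."""
    payload = {
        "graphs": [{"definition": d, "type": "turtle"} for d in definitions],
        "methods": list(methods),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compare(definitions, methods, url=SERVICE_URL):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(definitions, methods, url)) as resp:
        return json.load(resp)
```

With two Turtle strings `first` and `second`, `compare([first, second], ["FilterStructureSimplerJaccardSimilarity"])` would return the decoded response of the service.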
The results are:
[
{
"method": "FilterStructureSimplerJaccardSimilarity",
"returnObject": {
"similarity_matrix": {
"fileDistances": [],
"graphDistances": [
[
1,
0.880920060331825
],
[
0.880920060331825,
1
]
]
},
"time_spent": 0.012035846710205078
}
}
]
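The matrix in this response has ones on the diagonal and is symmetric: each graph compared with itself scores 1, and comparing the first graph with the second gives the same value as the reverse. As a rough illustration of how such a matrix can arise, the sketch below computes a plain Jaccard index over sets of triples. It assumes that the method name refers to the Jaccard index; the triples are invented toy data, and whatever filtering or structural handling FilterStructureSimplerJaccardSimilarity applies in the repository is not reproduced here.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def similarity_matrix(graphs):
    """Pairwise scores, laid out like the graphDistances field above."""
    return [[jaccard(g, h) for h in graphs] for g in graphs]


# Toy triples (illustrative only): the second graph lacks the creator property.
first = {
    ("radar", "property", "created"),
    ("radar", "property", "creator"),
    ("radar", "property", "title"),
}
second = {
    ("radar", "property", "created"),
    ("radar", "property", "title"),
}

matrix = similarity_matrix([first, second])
```

Here the two toy graphs share two of three distinct triples, so the off-diagonal entries are 2/3 while the diagonal stays at 1.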
Docker
Run the following commands to use this repository with Docker (replace {yourPort} with the host port of your choice):
docker build -t semantic-similarity .
docker run --publish {yourPort}:36543 semantic-similarity