Commit adbfea60 authored by sarahbensberg

Command line program with functions to use Elasticsearch as a search engine for RDF-based metadata graphs in the context of CoScInE
parent 714468a3
Branches: Sprint/2022-01, dev, master
1886 additions and 2 deletions
MIT License
Copyright (c) 2020 RWTH Aachen University
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Implementation
This repository contains the code created in the master thesis ["An Efficient Semantic Search Engine for Research Data in an RDF-based Knowledge Graph"](http://dx.doi.org/10.18154/RWTH-2020-09883). It implements a search engine for RDF-based metadata graphs that combines Virtuoso, Elasticsearch and the data mapping developed in the thesis.
Contains the implementation of the SemanticSearch research.
## Code documentation:
The code documentation can be found under *./docs/index.html*.
## Setup/Prerequisites:
- Virtuoso
  - Virtuoso.ini file
    - NumberOfBuffers >= 660000
    - MaxDirtyBuffers >= 495000
  - Define the following graphs as rulesets named "ruleset" for inference rules with `rdfs_rule_set ('<rulesetname>', '<graphname>') ;`
    - http://www.dfg.de/dfg_profil/gremien/fachkollegien/faecher/
    - http://www.w3.org/ns/org#
    - http://xmlns.com/foaf/0.1/
  - Data mentioned in ["Sample Dataset for Search Engine Evaluation for Research Data in an RDF-based Knowledge Graph"](http://dx.doi.org/10.18154/RWTH-2020-09886)
    - All named_graphs, use virtuoso.db/graphs.db
- Elasticsearch
- The NuGet packages specified in the projects' packages.config files
## How to use:
1. (Re)build the solution (install/restore the necessary packages)
2. Start Virtuoso
3. Start Elasticsearch
4. Run *SemanticSearchImplementation.exe* with the necessary arguments
### Commandline arguments
| Option | Description |
| ------------- |-------------|
|-a, --action | Required. Possible actions: search, reindex, index, delete or add|
|-v | (Default: localhost) Server name of Virtuoso|
|--vp | (Default: 1111) Port of Virtuoso|
|--db | (Default: db) DB of Virtuoso|
|--dbUser | (Default: dba) Database user of Virtuoso|
|--dbPassword | (Default: dba) Database password of Virtuoso|
|-e | (Default: localhost) Server name of Elasticsearch|
|--ep | (Default: 9200) Port of Elasticsearch|
|-l | (Default: en) Specify language|
|-d, --doc | ID of metadata graph|
|-q, --query | (Default: *) Elasticsearch query|
|--adv | (Default: false) Set true for advanced Elasticsearch search syntax|
|-u, --user | (Default: ) User to search for; if no user is specified, only public metadata records can be found|
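The options in the table can be modeled as a plain settings object. The following C# sketch is not the program's own argument parser, which is not part of this excerpt. The class `SearchOptions`, its property names and the `Parse` helper are hypothetical. The sketch also assumes that every option, including `--adv`, is followed by a value. It restates the documented defaults and shows how `-e`/`--ep` combine into the same `http://{server}:{port}/` base URL that `ElasticsearchSearchClient` builds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical options holder mirroring the command-line table (not the project's real parser).
public sealed class SearchOptions
{
    public string Action { get; set; }                             // -a, --action (required)
    public string VirtuosoServer { get; set; } = "localhost";      // -v
    public string VirtuosoPort { get; set; } = "1111";             // --vp
    public string Db { get; set; } = "db";                         // --db
    public string DbUser { get; set; } = "dba";                    // --dbUser
    public string DbPassword { get; set; } = "dba";                // --dbPassword
    public string ElasticsearchServer { get; set; } = "localhost"; // -e
    public string ElasticsearchPort { get; set; } = "9200";        // --ep
    public string Language { get; set; } = "en";                   // -l
    public string Doc { get; set; }                                // -d, --doc
    public string Query { get; set; } = "*";                       // -q, --query
    public bool Advanced { get; set; } = false;                    // --adv
    public string User { get; set; } = "";                         // -u, --user

    private static readonly HashSet<string> Actions =
        new HashSet<string> { "search", "reindex", "index", "delete", "add" };

    // Minimal "option value" parser; assumes each option is followed by exactly one value.
    public static SearchOptions Parse(string[] args)
    {
        var o = new SearchOptions();
        for (int i = 0; i < args.Length; i += 2)
        {
            if (i + 1 >= args.Length)
                throw new ArgumentException($"Missing value for {args[i]}");
            string v = args[i + 1];
            switch (args[i])
            {
                case "-a": case "--action": o.Action = v; break;
                case "-v": o.VirtuosoServer = v; break;
                case "--vp": o.VirtuosoPort = v; break;
                case "--db": o.Db = v; break;
                case "--dbUser": o.DbUser = v; break;
                case "--dbPassword": o.DbPassword = v; break;
                case "-e": o.ElasticsearchServer = v; break;
                case "--ep": o.ElasticsearchPort = v; break;
                case "-l": o.Language = v; break;
                case "-d": case "--doc": o.Doc = v; break;
                case "-q": case "--query": o.Query = v; break;
                case "--adv": o.Advanced = bool.Parse(v); break;
                case "-u": case "--user": o.User = v; break;
                default: throw new ArgumentException($"Unknown option {args[i]}");
            }
        }
        // -a/--action is the only required option
        if (o.Action == null || !Actions.Contains(o.Action))
            throw new ArgumentException("-a/--action must be one of: search, reindex, index, delete, add");
        return o;
    }

    // Same URL shape as the baseUrl built in ElasticsearchSearchClient's constructor.
    public string ElasticsearchBaseUrl => $"http://{ElasticsearchServer}:{ElasticsearchPort}/";
}
```

For example, `SearchOptions.Parse(new[] { "-a", "search", "-q", "title:rwth" })` sets only the action and the query and leaves every other option at its documented default.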
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.8" />
</startup>
</configuration>
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
[assembly: AssemblyTitle("SemanticSearchImplementation.Tests")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("")]
[assembly: AssemblyProduct("SemanticSearchImplementation.Tests")]
[assembly: AssemblyCopyright("Copyright © 2020")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
// Setting ComVisible to false makes the types in this assembly not visible
// to COM components. If you need to access a type in this assembly from
// COM, set the ComVisible attribute to true on that type.
[assembly: ComVisible(false)]
// The following GUID is for the ID of the typelib if this project is exposed to COM
[assembly: Guid("febfbe31-04d6-4acb-98a3-fecf3a03c81c")]
// Version information for an assembly consists of the following four values:
//
// Major Version
// Minor Version
// Build Number
// Revision
//
// You can specify all the values or you can default the Build and Revision Numbers
// by using the '*' as shown below:
// [assembly: AssemblyVersion("1.0.*")]
[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]
using NUnit.Framework;
namespace SemanticSearchImplementation.Tests
{
[TestFixture]
public class RdfClientTests
{
IRdfConnector _connector;
RdfClient _rdfClient;
[OneTimeSetUp]
public void Init()
{
_connector = new VirtuosoRdfConnector();
_rdfClient = new RdfClient(_connector, "en");
// TODO: automatic creation of the desired database state (containing defined named graphs)
// TODO: maybe use an additional test database using docker (https://hub.docker.com/r/tenforce/virtuoso/)
}
[OneTimeTearDown]
public void Cleanup()
{
// TODO: clean the database to restore the original state
}
[Test]
public void GetCurrentIndexTest()
{
var index = _rdfClient.GetCurrentIndexVersion();
Assert.AreEqual(1, index);
}
// TODO: add further tests and test classes
}
}
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="..\packages\NUnit3TestAdapter.4.0.0-alpha.1\build\net35\NUnit3TestAdapter.props" Condition="Exists('..\packages\NUnit3TestAdapter.4.0.0-alpha.1\build\net35\NUnit3TestAdapter.props')" />
<Import Project="..\packages\NUnit.3.12.0\build\NUnit.props" Condition="Exists('..\packages\NUnit.3.12.0\build\NUnit.props')" />
<Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
<ProjectGuid>{FEBFBE31-04D6-4ACB-98A3-FECF3A03C81C}</ProjectGuid>
<OutputType>Library</OutputType>
<RootNamespace>SemanticSearchImplementation.Tests</RootNamespace>
<AssemblyName>SemanticSearchImplementation.Tests</AssemblyName>
<TargetFrameworkVersion>v4.8</TargetFrameworkVersion>
<FileAlignment>512</FileAlignment>
<AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
<Deterministic>true</Deterministic>
<NuGetPackageImportStamp>
</NuGetPackageImportStamp>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<PlatformTarget>AnyCPU</PlatformTarget>
<DebugSymbols>true</DebugSymbols>
<DebugType>full</DebugType>
<Optimize>false</Optimize>
<OutputPath>bin\Debug\</OutputPath>
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<PlatformTarget>AnyCPU</PlatformTarget>
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<PropertyGroup>
<StartupObject />
</PropertyGroup>
<ItemGroup>
<Reference Include="nunit.framework, Version=3.12.0.0, Culture=neutral, PublicKeyToken=2638cd05610744eb, processorArchitecture=MSIL">
<HintPath>..\packages\NUnit.3.12.0\lib\net45\nunit.framework.dll</HintPath>
</Reference>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="Microsoft.CSharp" />
<Reference Include="System.Data" />
<Reference Include="System.Net.Http" />
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="RdfClientTests.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
</ItemGroup>
<ItemGroup>
<None Include="App.config" />
<None Include="packages.config" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\SemanticSearchImplementation\SemanticSearchImplementation.csproj">
<Project>{6f9d842f-8f75-4a10-bad7-44c21cf63f2f}</Project>
<Name>SemanticSearchImplementation</Name>
</ProjectReference>
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<Target Name="EnsureNuGetPackageBuildImports" BeforeTargets="PrepareForBuild">
<PropertyGroup>
<ErrorText>This project references NuGet package(s) that are missing on this computer. Use NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is {0}.</ErrorText>
</PropertyGroup>
<Error Condition="!Exists('..\packages\NUnit.3.12.0\build\NUnit.props')" Text="$([System.String]::Format('$(ErrorText)', '..\packages\NUnit.3.12.0\build\NUnit.props'))" />
<Error Condition="!Exists('..\packages\NUnit3TestAdapter.4.0.0-alpha.1\build\net35\NUnit3TestAdapter.props')" Text="$([System.String]::Format('$(ErrorText)', '..\packages\NUnit3TestAdapter.4.0.0-alpha.1\build\net35\NUnit3TestAdapter.props'))" />
</Target>
</Project>
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="NUnit" version="3.12.0" targetFramework="net48" />
<package id="NUnit3TestAdapter" version="4.0.0-alpha.1" targetFramework="net48" />
</packages>

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 16
VisualStudioVersion = 16.0.30128.74
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SemanticSearchImplementation", "SemanticSearchImplementation\SemanticSearchImplementation.csproj", "{6F9D842F-8F75-4A10-BAD7-44C21CF63F2F}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SemanticSearchImplementation.Tests", "SemanticSearchImplementation.Tests\SemanticSearchImplementation.Tests.csproj", "{FEBFBE31-04D6-4ACB-98A3-FECF3A03C81C}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{6F9D842F-8F75-4A10-BAD7-44C21CF63F2F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{6F9D842F-8F75-4A10-BAD7-44C21CF63F2F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{6F9D842F-8F75-4A10-BAD7-44C21CF63F2F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{6F9D842F-8F75-4A10-BAD7-44C21CF63F2F}.Release|Any CPU.Build.0 = Release|Any CPU
{FEBFBE31-04D6-4ACB-98A3-FECF3A03C81C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{FEBFBE31-04D6-4ACB-98A3-FECF3A03C81C}.Debug|Any CPU.Build.0 = Debug|Any CPU
{FEBFBE31-04D6-4ACB-98A3-FECF3A03C81C}.Release|Any CPU.ActiveCfg = Release|Any CPU
{FEBFBE31-04D6-4ACB-98A3-FECF3A03C81C}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {89A3432E-59CD-40DB-824B-268C63140111}
EndGlobalSection
EndGlobal
using System;
using System.Collections.Generic;
using System.Linq;
using VDS.RDF;
using VDS.RDF.Query;
namespace SemanticSearchImplementation
{
/// <summary>
/// Represents an additional rule.
/// </summary>
public class AdditionalRule
{
private readonly IRdfConnector _connector;
private SparqlParameterizedString _queryString;
public AdditionalRule(IRdfConnector connector, SparqlParameterizedString queryString)
{
_connector = connector;
_queryString = queryString;
}
/// <summary>
/// Executes an additional rule for a specific metadata graph.
/// </summary>
/// <remarks>Additional rules can affect the mapping of other existing metadata graphs.</remarks>
/// <param name="graphName">ID of metadata graph.</param>
/// <returns>A dictionary containing the IDs of the metadata graphs (key) and the corresponding constructed triples (value).</returns>
public IDictionary<string, List<Triple>> ExecuteRule(string graphName)
{
_queryString.SetUri(RdfClient.LABEL_ADDITIONAL_RULE, new Uri(graphName));
var constructedTriples = _connector.QueryWithResultGraph(_queryString);
// group by subject to distinguish different metadata records
var groupedTriples = constructedTriples.Triples.GroupBy(x => x.Subject.ToString());
return groupedTriples.ToDictionary(x => x.Key, x => x.ToList());
}
}
}
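`ExecuteRule` turns the flat CONSTRUCT result into a dictionary keyed by subject, so that every metadata graph touched by a rule gets its own list of triples. Below is a minimal sketch of that grouping step. It uses a stand-in class `SimpleTriple` (hypothetical) instead of dotNetRDF's `Triple`, so the example runs without the library. `TripleGrouping` and its methods are also illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for VDS.RDF.Triple: subject, predicate and object as strings.
public sealed class SimpleTriple
{
    public SimpleTriple(string subject, string predicate, string obj)
    {
        Subject = subject;
        Predicate = predicate;
        Object = obj;
    }

    public string Subject { get; }
    public string Predicate { get; }
    public string Object { get; }
}

public static class TripleGrouping
{
    // Same shape as ExecuteRule: GroupBy(subject) followed by ToDictionary(key, list).
    public static IDictionary<string, List<SimpleTriple>> GroupBySubject(IEnumerable<SimpleTriple> triples)
    {
        return triples
            .GroupBy(t => t.Subject)
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    // Returns the triples for one metadata graph, or an empty list if the rule built none for it.
    public static List<SimpleTriple> TriplesFor(IDictionary<string, List<SimpleTriple>> grouped, string graphName)
    {
        return grouped.TryGetValue(graphName, out var list) ? list : new List<SimpleTriple>();
    }
}
```

A graph for which a rule constructs nothing is simply absent from the dictionary. Indexing it directly with `grouped[graphName]` would therefore throw a `KeyNotFoundException`, which `TryGetValue` avoids.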
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.8" />
</startup>
</configuration>
using System.Collections.Generic;
using VDS.RDF;
namespace SemanticSearchImplementation
{
/// <summary>
/// Contains methods for an application profile.
/// </summary>
public abstract class ApplicationProfile
{
protected readonly RdfClient _RDFClient;
protected readonly IDictionary<string, LiteralRule> _generalLiteralRules;
protected readonly IEnumerable<AdditionalRule> _generalAdditionalRules;
public ApplicationProfile(RdfClient RDFClient)
{
_RDFClient = RDFClient;
_generalLiteralRules = _RDFClient.ConstructLiteralRules(Uris.COSCINE_LITERAL_RULES);
_generalAdditionalRules = _RDFClient.ConstructAdditionalRules(Uris.COSCINE_ADDITIONAL_RULES);
}
/// <summary>
/// Executes general additional rules for a given metadata graph.
/// </summary>
/// <param name="graphName">ID of metadata graph.</param>
/// <returns>An enumerable of general additional triples.</returns>
public IEnumerable<Triple> GetGeneralAdditionalTriples(string graphName)
{
List<Triple> triples = new List<Triple>();
foreach (var generalAdditionalRule in _generalAdditionalRules)
{
var constructedTriplesGeneral = generalAdditionalRule.ExecuteRule(graphName);
// a rule may construct no triples for this graph, in which case the key is missing
if (constructedTriplesGeneral.TryGetValue(graphName, out var generalTriples))
{
triples.AddRange(generalTriples);
}
}
return triples;
}
}
}
using Newtonsoft.Json.Linq;
using System;
using System.Globalization;
using VDS.RDF;
using VDS.RDF.Parsing;
namespace SemanticSearchImplementation
{
/// <summary>
/// Contains methods to parse the objects of a metadata graph according to the mapping type.
/// </summary>
public class DataTypeParser
{
private RdfClient _RDFClient;
public DataTypeParser(RdfClient RDFClient)
{
_RDFClient = RDFClient;
}
/// <summary>
/// Parses the node given the mapping and profile.
/// </summary>
/// <param name="label">The label of the field.</param>
/// <param name="node">The node which needs to be converted into a literal (or a list of literals).</param>
/// <param name="indexMapper">The <c>ElasticsearchIndexMapper</c>.</param>
/// <param name="profile">The specific application profile the node belongs to.</param>
/// <returns>A JSON object containing the label (key) and the corresponding literal/list of literals (value).</returns>
public JObject Parse(string label, INode node, ElasticsearchIndexMapper indexMapper, SpecificApplicationProfile profile)
{
if (node.NodeType == NodeType.Literal)
{
// in the application profile a data type was specified
ILiteralNode literalNode = (ILiteralNode)node;
var type = indexMapper.GetTypeOfProperty(label);
// types of properties of additional triples do not appear in the profile
// check mapping if field already exists or generate type from type of object node
if (String.IsNullOrEmpty(type))
{
var literalType = XmlSpecsHelper.XmlSchemaDataTypeString;
if (literalNode.DataType != null)
{
literalType = literalNode.DataType.ToString();
}
type = indexMapper.GetSearchType(_RDFClient.GetDataType(literalType));
}
return ParseLiteralNode(label, type, literalNode);
}
else if (node.NodeType == NodeType.Uri)
{
// in the application profile a class was specified
return profile.GetLiterals(label, node.ToString());
}
else
{
throw new NotIndexableException("Object has unknown type.");
}
}
/// <summary>
/// Parses literal nodes depending on the needed Elasticsearch type specified in the mapping for a label.
/// </summary>
/// <param name="label">The label of the property which is used as field.</param>
/// <param name="type">The type specified in the Elasticsearch mapping for the label.</param>
/// <param name="literalNode">The literal node which needs to be parsed.</param>
/// <returns>A JSON object containing the label (key) and the corresponding literal (value).</returns>
private JObject ParseLiteralNode(string label, string type, ILiteralNode literalNode)
{
switch (type)
{
case ElasticsearchIndexMapper.TEXT:
return ParseString(label, literalNode);
case ElasticsearchIndexMapper.KEYWORD:
return ParseString(label, literalNode);
case ElasticsearchIndexMapper.BOOLEAN:
return ParseBoolean(label, literalNode);
case ElasticsearchIndexMapper.INTEGER:
return ParseInt(label, literalNode);
case ElasticsearchIndexMapper.DATE:
return ParseDate(label, literalNode);
default:
throw new NotIndexableException("Unknown property type");
}
}
/// <summary>
/// Parses a literal node into a boolean and adds a second field with a written representation of the label and boolean value.
/// </summary>
/// <param name="label">The label of the property which is used as field.</param>
/// <param name="literalNode">The literal node which needs to be parsed.</param>
/// <returns>A JSON object containing the label (key) and the corresponding boolean (value) as well as a written variant.</returns>
private JObject ParseBoolean(string label, ILiteralNode literalNode)
{
var val = literalNode.Value;
bool boolValue = false;
if (String.Equals(val, "true") || String.Equals(val, "1"))
{
boolValue = true;
}
return new JObject()
{
{ label, boolValue },
{ label + ElasticsearchIndexMapper.BOOLEAN_EXTENSION, label.Replace("_", " ") + " " + boolValue.ToString().ToLower() }
};
}
/// <summary>
/// Parses a literal node into a date and adds fields for day, month and year.
/// </summary>
/// <param name="label">The label of the property which is used as field.</param>
/// <param name="literalNode">The literal node which needs to be parsed.</param>
/// <returns>A JSON object containing the label (key) and the corresponding date (value) as well as pairs for day, year and month.</returns>
public JObject ParseDate(string label, ILiteralNode literalNode)
{
var dateTime = Convert.ToDateTime(literalNode.Value);
return new JObject() {
{ label, literalNode.Value },
// additional date fields
{ label + ElasticsearchIndexMapper.YEAR_EXTENSION, dateTime.Year },
{ label + ElasticsearchIndexMapper.MONTH_EXTENSION, dateTime.ToString("MMMM", new CultureInfo(_RDFClient.GetLanguage())) },
// TODO: remove dayExtension field
{ label + ElasticsearchIndexMapper.DAY_EXTENSION, dateTime.Day }
};
}
/// <summary>
/// Parses a literal node into a string.
/// </summary>
/// <param name="label">The label of the property which is used as field.</param>
/// <param name="literalNode">The literal node which needs to be parsed.</param>
/// <returns>A JSON object containing the label (key) and the corresponding string (value).</returns>
public JObject ParseString(string label, ILiteralNode literalNode)
{
return new JObject() {
{ label, literalNode.Value }
};
}
/// <summary>
/// Parses a literal node into an integer.
/// </summary>
/// <param name="label">The label of the property which is used as field.</param>
/// <param name="literalNode">The literal node which needs to be parsed.</param>
/// <returns>A JSON object containing the label (key) and the corresponding integer (value).</returns>
public JObject ParseInt(string label, ILiteralNode literalNode)
{
return new JObject()
{
{ label, Convert.ToInt32(literalNode.Value) }
};
}
}
}
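Boolean and date literals are expanded into more than one Elasticsearch field. The sketch below restates that expansion using `Dictionary<string, object>` instead of `JObject`, so it needs only the base class library. The names `FieldExpansion`, `ExpandBoolean` and `ExpandDate` are hypothetical. The sketch differs from the code in two deliberate ways:
- It parses the date with `CultureInfo.InvariantCulture` instead of `Convert.ToDateTime` with the current culture.
- It takes the culture for the month name as a parameter instead of reading it from `RdfClient.GetLanguage()`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class FieldExpansion
{
    // Field-name suffixes, copied from ElasticsearchIndexMapper.
    public const string YearExtension = "_year";
    public const string MonthExtension = "_month";
    public const string DayExtension = "_day";
    public const string BooleanExtension = "_written";

    // Like ParseBoolean: only the lexical forms "true" and "1" are treated as true.
    public static Dictionary<string, object> ExpandBoolean(string label, string value)
    {
        bool flag = value == "true" || value == "1";
        return new Dictionary<string, object>
        {
            [label] = flag,
            // e.g. "is_public" + true -> "is public true", searchable as plain text
            [label + BooleanExtension] = label.Replace("_", " ") + " " + flag.ToString().ToLowerInvariant()
        };
    }

    // Like ParseDate, but parsing with the invariant culture (an assumption of this sketch).
    public static Dictionary<string, object> ExpandDate(string label, string value, CultureInfo monthCulture)
    {
        DateTime date = DateTime.Parse(value, CultureInfo.InvariantCulture);
        return new Dictionary<string, object>
        {
            [label] = value, // the original literal is kept unchanged
            [label + YearExtension] = date.Year,
            [label + MonthExtension] = date.ToString("MMMM", monthCulture),
            [label + DayExtension] = date.Day
        };
    }
}
```

The written boolean field and the month-name field make values like "is public true" or "May" matchable by full-text queries. The typed fields remain available for exact and range queries.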
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.Linq;
using static SemanticSearchImplementation.RdfClient;
namespace SemanticSearchImplementation
{
/// <summary>
/// Contains methods to create, change and handle the current Elasticsearch index.
/// </summary>
public class ElasticsearchIndexMapper
{
// ES data types
public const string TEXT = "text";
public const string KEYWORD = "keyword";
public const string BOOLEAN = "boolean";
public const string INTEGER = "integer";
public const string DATE = "date";
// special fields
public const string YEAR_EXTENSION = "_year";
public const string MONTH_EXTENSION = "_month";
public const string DAY_EXTENSION = "_day";
public const string BOOLEAN_EXTENSION = "_written";
// administrative additional triples
public const string LABEL_APPLICATION_PROFILE = "applicationProfile";
public const string LABEL_BELONGS_TO_PROJECT = "belongsToProject";
public const string LABEL_GRAPHNAME = "graphName";
public const string LABEL_IS_PUBLIC = "isPublic";
// index name
public const string DEFAULT_ALIAS_NAME = "final_index";
private readonly RdfClient _RDFClient;
private readonly IEnumerable<string> _properties;
private readonly IDictionary<string, string> _labelOfProperties;
private readonly IDictionary<string, string> _generalAdditionalTripleLabels = new Dictionary<string, string>()
{
{ Uris.COSCINE_SEARCH_APPLICATION_PROFILE, LABEL_APPLICATION_PROFILE },
{ Uris.COSCINE_SEARCH_BELONGS_TO_PROJECT, LABEL_BELONGS_TO_PROJECT },
{ Uris.COSCINE_SEARCH_GRAPHNAME, LABEL_GRAPHNAME },
{ Uris.COSCINE_PROJECTSTRUCTURE_IS_PUBLIC_LONG, LABEL_IS_PUBLIC }
};
private IDictionary<string, string> _typeOfProperties;
public ElasticsearchIndexMapper(RdfClient RDFClient, IDictionary<string, string> mapping = null)
{
_RDFClient = RDFClient;
_properties = _RDFClient.GetProperties();
_labelOfProperties = CreateLabelOfProps(_properties);
if (mapping == null)
{
_typeOfProperties = CreateTypeOfProps(_properties);
} else
{
_typeOfProperties = mapping;
}
}
/// <summary>
/// Replaces the current mapping.
/// </summary>
/// <remarks>At first the mapping contains the uniformly generated data types based on the application profiles,
/// later the types of Elasticsearch.</remarks>
/// <param name="mapping">The new mapping containing fields (key) and their types (value).</param>
public void ReplaceMapping(IDictionary<string, string> mapping)
{
_typeOfProperties = mapping;
}
/// <summary>
/// Creates the JSON object which contains all information (settings, mappings and aliases) for a new index.
/// </summary>
/// <param name="alias">The name of the alias if it should be set directly at the beginning (only in case of initial indexing).</param>
/// <returns>A JSON object for the request to create a new index.</returns>
public JObject CreateIndex(string alias = null)
{
var jObjectKeyword = new JObject() {
new JProperty("type", KEYWORD),
};
dynamic jObjectText = new JObject();
jObjectText.type = TEXT;
jObjectText.fields = new JObject()
{
{KEYWORD, jObjectKeyword }
};
dynamic jObjectGraphName = new JObject();
jObjectGraphName.type = KEYWORD;
jObjectGraphName.doc_values = true;
var jObjectProperties = new JObject() {
{ LABEL_GRAPHNAME, jObjectGraphName},
{ LABEL_BELONGS_TO_PROJECT, jObjectKeyword},
{ LABEL_APPLICATION_PROFILE, jObjectText},
{ LABEL_IS_PUBLIC, new JObject(){{ "type", BOOLEAN }}}
};
foreach (var property in _properties)
{
if (String.Equals(property, Uris.RDF_TYPE_LONG))
{
continue;
}
var label = GetLabelOfProperty(property);
if (String.IsNullOrEmpty(label))
{
Console.WriteLine($"Property {property} could not be indexed because no label was found.");
continue;
}
var type = GetTypeOfProperty(property);
if (String.Equals(type, TEXT))
{
jObjectProperties.Add(label, jObjectText);
}
else
{
jObjectProperties.Add(label, new JObject() { { "type", type } });
}
// additional fields for types
if (String.Equals(type, DATE))
{
jObjectProperties.Add(label + MONTH_EXTENSION, jObjectText);
jObjectProperties.Add(label + YEAR_EXTENSION, new JObject() { { "type", INTEGER } });
jObjectProperties.Add(label + DAY_EXTENSION, new JObject() { { "type", INTEGER } });
}
else if (String.Equals(type, BOOLEAN))
{
jObjectProperties.Add(label + BOOLEAN_EXTENSION, jObjectText);
}
}
dynamic finalJObject = new JObject();
finalJObject.settings = new JObject() as dynamic;
finalJObject.settings.index = new JObject() as dynamic;
finalJObject.settings.index.blocks = new JObject() as dynamic;
finalJObject.settings.index.blocks.read_only_allow_delete = false;
finalJObject.settings.analysis = new JObject() as dynamic;
finalJObject.settings.analysis.analyzer = new JObject() as dynamic;
finalJObject.settings.analysis.analyzer.@default = new JObject() as dynamic;
finalJObject.settings.analysis.analyzer.@default.type = "english";
finalJObject.mappings = new JObject() as dynamic;
finalJObject.mappings.properties = jObjectProperties;
if (alias != null)
{
finalJObject.Add(new JProperty("aliases", new JObject
{
new JProperty(alias, new JObject())
})
);
}
return finalJObject;
}
public string GetLabelOfProperty(string property)
{
if (!_labelOfProperties.ContainsKey(property))
{
// label of specific additional triples
_labelOfProperties[property] = CreateLabelOfProperty(property);
}
return _labelOfProperties[property];
}
public string GetTypeOfProperty(string property)
{
if (_typeOfProperties.ContainsKey(property))
{
return _typeOfProperties[property];
}
else
{
return null;
}
}
/// <summary>
/// Creates the labels (field names) for all properties.
/// </summary>
/// <param name="properties">An enumerator containing all properties (metadata fields) as URIs.</param>
/// <returns>A dictionary containing properties (key) and corresponding labels (value).</returns>
private IDictionary<string, string> CreateLabelOfProps(IEnumerable<string> properties)
{
IDictionary<string, string> result = new Dictionary<string, string>();
foreach (var prop in properties)
{
result.Add(prop, CreateLabelOfProperty(prop));
}
// add fixed labels (of general additional properties)
foreach (var prop in _generalAdditionalTripleLabels)
{
result.Add(prop.Key, prop.Value);
}
return result;
}
/// <summary>
/// Creates the label of a property.
/// </summary>
/// <param name="property">The property as URI.</param>
/// <returns>The created label.</returns>
private string CreateLabelOfProperty(string property)
{
// check for rdfs:label
var label = _RDFClient.GetRdfsLabel(property);
if (String.IsNullOrEmpty(label))
{
// try to guess the label
label = _RDFClient.GuessLabel(property);
if (String.IsNullOrEmpty(label))
{
// use the first shacl name description in profiles
var profileNames = _RDFClient.GetApplicationProfilesNamesOfProperty(property);
if (profileNames.Count() != 0)
{
label = profileNames.First();
}
else
{
Console.WriteLine($"No label was found for {property}");
return null;
}
}
}
return label.ToLower().Replace(" ", "_");
}
/// <summary>
/// Creates the type for all properties.
/// </summary>
/// <param name="properties">An enumerator containing all properties (metadata fields) as URIs.</param>
/// <returns>A dictionary containing properties (key) and corresponding type (value).</returns>
private IDictionary<string, string> CreateTypeOfProps(IEnumerable<string> properties)
{
IDictionary<string, string> result = new Dictionary<string, string>();
foreach (var prop in properties)
{
result.Add(prop, GetSearchType(_RDFClient.GetTypeOfProperty(prop)));
}
return result;
}
/// <summary>
/// Maps application profile type to corresponding Elasticsearch type.
/// </summary>
/// <param name="type">The application profile type.</param>
/// <returns>An Elasticsearch type.</returns>
public string GetSearchType(ApplicationProfileType type)
{
switch (type)
{
case ApplicationProfileType.BOOLEAN:
return BOOLEAN;
case ApplicationProfileType.INTEGER:
return INTEGER;
case ApplicationProfileType.DATE:
return DATE;
case ApplicationProfileType.STRING:
return TEXT;
case ApplicationProfileType.CLASS:
return TEXT;
default:
return TEXT;
}
}
}
}
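For each property, `CreateIndex` adds one field named after the normalized label. For dates and booleans it also adds helper fields. Below is a minimal sketch of that field list, mapping each field name to its Elasticsearch type. The names `MappingSketch` and `FieldsFor` are hypothetical. The sketch differs from `CreateIndex` in three ways:
- It omits the `keyword` sub-field that `text` fields receive.
- It omits the index settings.
- It lower-cases with `ToLowerInvariant` instead of `ToLower`, to avoid culture-specific casing.

```csharp
using System.Collections.Generic;

public static class MappingSketch
{
    // Last step of CreateLabelOfProperty: lower-case and replace spaces with underscores.
    public static string NormalizeLabel(string label) => label.ToLowerInvariant().Replace(" ", "_");

    // Field name -> Elasticsearch type for a single property, following the
    // "additional fields for types" branch in CreateIndex.
    public static IDictionary<string, string> FieldsFor(string rawLabel, string esType)
    {
        string label = NormalizeLabel(rawLabel);
        var fields = new Dictionary<string, string> { [label] = esType };
        if (esType == "date")
        {
            fields[label + "_month"] = "text";    // month name, e.g. "May"
            fields[label + "_year"] = "integer";
            fields[label + "_day"] = "integer";
        }
        else if (esType == "boolean")
        {
            fields[label + "_written"] = "text";  // e.g. "is public true"
        }
        return fields;
    }
}
```

For a property labeled "Date Created" with type `date`, this yields the fields `date_created`, `date_created_month`, `date_created_year` and `date_created_day`.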
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace SemanticSearchImplementation
{
/// <summary>
/// Implements necessary functions to use Elasticsearch as a search engine.
/// </summary>
/// <inheritdoc cref="ISearchClient"/>
public class ElasticsearchSearchClient : ISearchClient
{
// API
private const string ALIASES = "_aliases";
private const string SEARCH = "_search";
private const string MAPPING = "_mapping";
private const string BULK = "_bulk";
private static readonly HttpClient client = new HttpClient();
private readonly string baseUrl;
private string _index;
public ElasticsearchSearchClient(string server = "localhost", string port = "9200")
{
baseUrl = $"http://{server}:{port}/";
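// the HttpClient is static and shared by all instances, so add the header only once
if (!client.DefaultRequestHeaders.Contains("Accept"))
{
client.DefaultRequestHeaders.Add("Accept", "application/json");
}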
}
public void ChangeIndex(string index)
{
_index = index;
}
public async Task<IDictionary<string, string>> GetMappingAsync()
{
var response = await client.GetAsync(baseUrl + _index + "/" + MAPPING);
JObject jObject = JObject.Parse(await response.Content.ReadAsStringAsync());
var fieldMappings = (JObject)jObject[_index]["mappings"]["properties"];
IDictionary<string, string> mapping = new Dictionary<string, string>();
foreach (var prop in fieldMappings)
{
mapping.Add(prop.Key, (string) prop.Value["type"]);
}
return mapping;
}
/// <summary>
/// Queries the document ID for a metadata graph.
/// </summary>
/// <param name="graphName">ID of the metadata graph.</param>
/// <returns>ID of the Elasticsearch document.</returns>
private async Task<string> GetIdFromGraphNameAsync(string graphName)
{
var content = new JObject() {
new JProperty("query", new JObject
{
new JProperty("term", new JObject
{
new JProperty(ElasticsearchIndexMapper.LABEL_GRAPHNAME, new JObject
{
new JProperty("value", graphName)
})
})
})
};
var response = await client.PostAsync(baseUrl + _index + "/" + SEARCH, CreateJsonContent(content));
JObject jObject = JObject.Parse(await response.Content.ReadAsStringAsync());
if ((int)jObject["hits"]["total"]["value"] == 1)
{
return (string)jObject["hits"]["hits"][0]["_id"];
}
else
{
return null;
}
}
/// <summary>
/// Handles the response of an HTTP request.
/// </summary>
/// <param name="response">The response of an HTTP request.</param>
private void HandleResponse(HttpResponseMessage response)
{
if (!response.IsSuccessStatusCode)
{
// TODO
Console.WriteLine(response.StatusCode);
// print the response body (printing response.Content itself only shows its type name)
Console.WriteLine(response.Content.ReadAsStringAsync().GetAwaiter().GetResult());
Console.WriteLine(response.RequestMessage);
}
}
// create json content from JObject
private HttpContent CreateJsonContent(JObject content)
{
return new StringContent(content.ToString(), Encoding.UTF8, "application/json");
}
// create json content from string
private HttpContent CreateJsonContent(string content)
{
return new StringContent(content, Encoding.UTF8, "application/json");
}
public async Task CreateIndexAsync(JObject content, string index)
{
var response = await client.PutAsync(baseUrl + index, CreateJsonContent(content));
response.EnsureSuccessStatusCode();
}
public async Task AddDocumentsAsync(IEnumerable<JObject> documents)
{
var contentList = new List<string>();
foreach (var document in documents)
{
// create new document
contentList.Add($"{{ \"index\" : {{ }} }}");
contentList.Add(document.ToString().Replace("\n", "").Replace("\r", ""));
}
var content = String.Join("\n", contentList) + "\n";
var response = await client.PostAsync(baseUrl + _index + "/" + BULK, CreateJsonContent(content));
HandleResponse(response);
}
public async Task SwitchAliasAsync(string from, string to)
{
var jObject = new JObject {
new JProperty("actions", new JArray
{
new JObject
{
new JProperty("remove", new JObject
{
new JProperty("alias", ElasticsearchIndexMapper.DEFAULT_ALIAS_NAME),
new JProperty("index", from)
})
},
new JObject
{
new JProperty("add", new JObject
{
new JProperty("alias", ElasticsearchIndexMapper.DEFAULT_ALIAS_NAME),
new JProperty("index", to)
})
}
})
};
var response = await client.PostAsync(baseUrl + ALIASES, CreateJsonContent(jObject));
response.EnsureSuccessStatusCode();
}
public async Task DeleteIndexAsync(string index)
{
var response = await client.DeleteAsync(baseUrl + index);
HandleResponse(response);
}
public async Task AddDocumentAsync(string graphName, IDictionary<string, JObject> documents)
{
var contentList = new List<string>();
foreach (var document in documents)
{
if (String.Equals(document.Key, graphName))
{
var id = await GetIdFromGraphNameAsync(document.Key);
// use id if this document already exists
var useId = String.IsNullOrEmpty(id) ? "" : $"\"_id\":\"{id}\"";
// create new document
contentList.Add($"{{ \"index\" : {{ {useId} }} }}");
contentList.Add(document.Value.ToString(Newtonsoft.Json.Formatting.None));
}
else
{
contentList = await AddOtherDocuments(contentList, document);
}
}
await BulkRequestAsync(contentList);
}
/// <summary>
/// Creates the content list for a bulk request with the given documents.
/// </summary>
/// <param name="contentList">A list containing the content for the bulk request.</param>
/// <param name="document">ID of a metadata graph (key) and its corresponding content as JSON object (value).</param>
/// <returns>The task result contains the created content list for the bulk request.</returns>
private async Task<List<string>> AddOtherDocuments(List<string> contentList, KeyValuePair<string, JObject> document)
{
var id = await GetIdFromGraphNameAsync(document.Key);
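// a newly indexed document only becomes searchable after the next index refresh,
// so poll until its Elasticsearch _id can be found
while (String.IsNullOrEmpty(id))
{
Console.WriteLine($"Wait for elasticsearch _id of graph {document.Key}");
// Task.Delay instead of Thread.Sleep: do not block the thread inside an async method
await Task.Delay(100);
id = await GetIdFromGraphNameAsync(document.Key);
}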
// update existing document
contentList.Add($"{{ \"update\" : {{ \"_id\" : \"{id}\" }} }}");
contentList.Add($"{{ \"doc\": {document.Value.ToString(Newtonsoft.Json.Formatting.None)} }}");
return contentList;
}
public async Task DeleteDocumentAsync(string graphName, IDictionary<string, JObject> documents)
{
var contentList = new List<string>();
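var id = await GetIdFromGraphNameAsync(graphName);
// only delete if the document exists in the index; an empty _id would make the bulk action fail
if (!String.IsNullOrEmpty(id))
{
contentList.Add($"{{ \"delete\" : {{ \"_id\":\"{id}\" }} }}");
}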
foreach (var document in documents)
{
contentList = await AddOtherDocuments(contentList, document);
}
await BulkRequestAsync(contentList);
}
/// <summary>
/// Executes a bulk request with the given list of content.
/// </summary>
/// <param name="contentList">An enumerator of content rows for the bulk request.</param>
/// <returns>A task that represents the asynchronous save operation.</returns>
private async Task BulkRequestAsync(IEnumerable<string> contentList)
{
var content = String.Join("\n", contentList) + "\n";
// one bulk request, because additional constructed triples may also change other documents
var response = await client.PostAsync(baseUrl + _index + "/" + BULK, CreateJsonContent(content));
HandleResponse(response);
}
// search
public async Task<IDictionary<string, double>> SearchAsync(string query, IEnumerable<string> projects, bool advanced, int size, int from, string sorting)
{
// Note: "track_total_hits" is not set, so by default Elasticsearch counts hits.total exactly only up to 10,000 hits
var searchType = "simple_query_string";
if (advanced)
{
searchType = "query_string";
}
JObject queryJObject = new JObject()
{
new JProperty("bool", new JObject
{
new JProperty("must", new JObject
{
new JProperty(searchType, new JObject
{
new JProperty("query", query)
})
}),
new JProperty("filter", new JObject {
CreateVisibilityFilter(projects)
})
})
};
JObject content = new JObject()
{
{"size", size},
{"from", from},
{"sort", JArray.Parse(sorting)},
{"_source", ElasticsearchIndexMapper.LABEL_GRAPHNAME},
{"query", queryJObject}
};
return await Search(content);
}
/// <summary>
/// Creates the visibility filter for the search so that a user only sees public metadata or metadata of their own projects.
/// </summary>
/// <param name="projects">An enumerator containing the projects which are allowed.</param>
/// <returns>A JProperty containing the specified visibility filter.</returns>
private JProperty CreateVisibilityFilter(IEnumerable<string> projects)
{
return new JProperty("bool", new JObject {
new JProperty("must", new JObject
{
new JProperty("bool", new JObject
{
new JProperty("should", new JArray
{
new JObject
{
new JProperty("terms", new JObject
{
new JProperty(ElasticsearchIndexMapper.LABEL_BELONGS_TO_PROJECT, new JArray(projects))
})
},
new JObject
{
new JProperty("term", new JObject
{
new JProperty(ElasticsearchIndexMapper.LABEL_IS_PUBLIC, true)
})
}
})
})
})
});
}
/// <summary>
/// Runs the search and handles the response.
/// </summary>
/// <param name="content">A JSON object containing the body of the search request.</param>
/// <returns>The task result contains a dictionary containing the IDs of the found metadata graphs (key)
/// and the corresponding ranking (value).</returns>
private async Task<IDictionary<string, double>> Search(JObject content)
{
var response = await client.PostAsync(baseUrl + ElasticsearchIndexMapper.DEFAULT_ALIAS_NAME + "/" + SEARCH, CreateJsonContent(content));
if (response.IsSuccessStatusCode)
{
JObject jObject = JObject.Parse(await response.Content.ReadAsStringAsync());
return GetSearchResults(jObject);
}
else
{
// TODO: report malformed queries to the caller, e.g. using
// ["error"]["root_cause"][0]["reason"] and
// ["error"]["root_cause"][0]["type"]
HandleResponse(response);
return new Dictionary<string, double>();
}
}
/// <summary>
/// Filters the plain search results of a search request.
/// </summary>
/// <param name="results">Plain JSON result of a search request.</param>
/// <returns>A dictionary containing the IDs of the found metadata graphs (key) and the corresponding ranking (value).</returns>
private IDictionary<string, double> GetSearchResults(JObject results)
{
IDictionary<string, double> graphNames = new Dictionary<string, double>();
if ((int)results["hits"]["total"]["value"] > 0)
{
foreach (var result in results["hits"]["hits"])
{
graphNames.Add((string)result["_source"][ElasticsearchIndexMapper.LABEL_GRAPHNAME], (double)result["_score"]);
}
}
return graphNames;
}
}
}
\ No newline at end of file
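The `_bulk` endpoint used by `AddDocumentsAsync`, `AddDocumentAsync` and `DeleteDocumentAsync` expects newline-delimited JSON: one action line (`index`, `update` or `delete`), followed by the document for `index`, a `{"doc": ...}` wrapper for `update`, and no source line for `delete`, with the whole body terminated by a newline. The following is a minimal sketch of that body construction under stated assumptions, not the repository's code: `BuildBulkBody` and its tuple input are hypothetical, it is written as a top-level program (C# 9 or later), and it uses `System.Text.Json` only to escape strings instead of Newtonsoft.Json. It assumes every source document is already serialized on a single line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical helper (not part of the repository): builds the NDJSON body for
// POST <index>/_bulk. Each operation is (Action, Id, Source):
//   Action: "index", "update" or "delete"
//   Id:     Elasticsearch _id, or null to let Elasticsearch generate one ("index" only)
//   Source: document JSON on a single line; ignored for "delete"
static string BuildBulkBody(IEnumerable<(string Action, string Id, string Source)> operations)
{
    var sb = new StringBuilder();
    foreach (var (action, id, source) in operations)
    {
        // Action line, e.g. {"index":{}} or {"update":{"_id":"abc"}}; the id is JSON-escaped.
        var metadata = id == null ? "{}" : "{\"_id\":" + JsonSerializer.Serialize(id) + "}";
        sb.Append('{').Append(JsonSerializer.Serialize(action)).Append(':').Append(metadata).Append("}\n");

        // Source line: "update" wraps a partial document in {"doc": ...}, "delete" has none.
        if (action == "update")
        {
            sb.Append("{\"doc\":").Append(source).Append("}\n");
        }
        else if (action != "delete")
        {
            sb.Append(source).Append('\n');
        }
    }
    // Every appended line ends with '\n', so the body ends with the newline the bulk API requires.
    return sb.ToString();
}

// Usage: one new document, one partial update and one deletion in a single request.
var body = BuildBulkBody(new (string, string, string)[]
{
    ("index", null, "{\"graphName\":\"g1\"}"),
    ("update", "abc", "{\"title\":\"x\"}"),
    ("delete", "def", null),
});
Console.Write(body);
```

Escaping the `_id` with `JsonSerializer.Serialize` is a defensive choice in this sketch; the code above interpolates the ID directly into the action line.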
using VDS.RDF;
using VDS.RDF.Query;
namespace SemanticSearchImplementation
{
/// <summary>
/// Provides all necessary functions for querying and manipulating the RDF-based knowledge graph.
/// </summary>
public interface IRdfConnector
{
/// <summary>
/// Returns a metadata graph.
/// </summary>
/// <param name="graphName">ID of the metadata graph.</param>
/// <returns>A graph.</returns>
Graph GetGraph(string graphName);
/// <summary>
/// Queries the knowledge graph with a parametrized string and a graph as result.
/// </summary>
/// <param name="query">Parametrized SPARQL query which returns a graph (e.g. a CONSTRUCT query).</param>
/// <returns>The resulting <c>IGraph</c>.</returns>
IGraph QueryWithResultGraph(SparqlParameterizedString query);
/// <summary>
/// Queries the knowledge graph with a query and possibly with stored inference rules.
/// </summary>
/// <param name="query">String representation of a SPARQL query.</param>
/// <param name="withInference">Flag which indicates the use of the inference rules.</param>
/// <returns>A <c>SparqlResultSet</c>.</returns>
SparqlResultSet QueryWithResultSet(string query, bool withInference = true);
/// <summary>
/// Queries the knowledge graph with a parametrized string and a <c>SparqlResultSet</c> as result.
/// </summary>
/// <param name="query">Parametrized SPARQL query which returns a <c>SparqlResultSet</c>.</param>
/// <returns>A <c>SparqlResultSet</c>.</returns>
SparqlResultSet QueryWithResultSet(SparqlParameterizedString query);
/// <summary>
/// Updates the knowledge graph with a parametrized string.
/// </summary>
/// <param name="query">Parametrized SPARQL query.</param>
void Update(SparqlParameterizedString query);
}
}
\ No newline at end of file
using Newtonsoft.Json.Linq;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace SemanticSearchImplementation
{
/// <summary>
/// Provides all necessary functions to implement a document-based search engine.
/// </summary>
public interface ISearchClient
{
/// <summary>
/// Queries the mappings of the current index.
/// </summary>
/// <returns>A dictionary containing the fields and corresponding types.</returns>
Task<IDictionary<string, string>> GetMappingAsync();
/// <summary>
/// Creates an index with the given settings and mappings.
/// </summary>
/// <param name="content">JSON object containing the settings and mappings.</param>
/// <param name="index">The index name.</param>
/// <returns>A task that represents the asynchronous save operation.</returns>
Task CreateIndexAsync(JObject content, string index);
/// <summary>
/// Adds the given documents as a bulk upload.
/// </summary>
/// <param name="documents">An enumerator of all documents as JSON object.</param>
/// <returns>A task that represents the asynchronous save operation.</returns>
Task AddDocumentsAsync(IEnumerable<JObject> documents);
/// <summary>
/// Changes the alias from the old to the new index.
/// </summary>
/// <param name="from">Name of old index.</param>
/// <param name="to">Name of new index.</param>
/// <returns>A task that represents the asynchronous alias switch.</returns>
Task SwitchAliasAsync(string from, string to);
/// <summary>
/// Deletes the given index.
/// </summary>
/// <param name="index">Name of the index.</param>
/// <returns>A task that represents the asynchronous delete operation.</returns>
Task DeleteIndexAsync(string index);
/// <summary>
/// Adds or updates a document and possibly changes other existing documents.
/// </summary>
/// <remarks>Additional rules can influence the mapping of existing metadata graphs.</remarks>
/// <param name="graphName">ID of the metadata graph to be added/updated.</param>
/// <param name="documents">A dictionary containing the ID of metadata graphs (key)
/// and corresponding JSON objects (value).</param>
/// <returns>A task that represents the asynchronous save operation.</returns>
Task AddDocumentAsync(string graphName, IDictionary<string, JObject> documents);
/// <summary>
/// Deletes a document and possibly changes other existing documents.
/// </summary>
/// <remarks>Additional rules can influence the mapping of existing metadata graphs.</remarks>
/// <param name="graphName">ID of the metadata graph to be deleted.</param>
/// <param name="documents">A dictionary containing the ID of metadata graphs (key)
/// and corresponding JSON objects (value).</param>
/// <returns>A task that represents the asynchronous delete operation.</returns>
Task DeleteDocumentAsync(string graphName, IDictionary<string, JObject> documents);
/// <summary>
/// Updates the current index.
/// </summary>
/// <param name="index">Name of new current index.</param>
void ChangeIndex(string index);
/// <summary>
/// Searches the index using the alias.
/// </summary>
/// <param name="query">The search query of the user.</param>
/// <param name="projects">List of allowed projects (of a user).</param>
/// <param name="advanced">Flag to specify simple or advanced search syntax.</param>
/// <param name="size">Number of results.</param>
/// <param name="from">Position from which the results should be returned.</param>
/// <param name="sorting">Sorting of the results as a JSON array, see
/// <see href="https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html"/>.</param>
/// <returns>The task result contains a dictionary containing the IDs of the found metadata graphs (key)
/// and the corresponding ranking (value).</returns>
Task<IDictionary<string, double>> SearchAsync(string query, IEnumerable<string> projects, bool advanced, int size, int from, string sorting);
}
}
\ No newline at end of file
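`SearchAsync` combines the user query (`simple_query_string`, or `query_string` when `advanced` is set) with the visibility filter from `CreateVisibilityFilter` in a `bool` query and passes `sorting` through as a JSON array. The sketch below rebuilds a request body of that shape with `System.Text.Json.Nodes` (assuming .NET 6 or later) so it runs without Newtonsoft.Json or an Elasticsearch cluster. `BuildSearchBody` is hypothetical, and the field names `graphName`, `belongsToProject` and `isPublic` are placeholders for the `ElasticsearchIndexMapper.LABEL_*` constants, whose actual values are not shown here. It also flattens the original `filter → bool → must → bool → should` nesting to `filter → bool → should`, which should be equivalent because a `bool` query with only `should` clauses requires at least one of them to match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical sketch of the body SearchAsync posts to "<alias>/_search".
// "graphName", "belongsToProject" and "isPublic" stand in for the
// ElasticsearchIndexMapper.LABEL_* constants (their real values are assumptions).
static JsonObject BuildSearchBody(string query, IEnumerable<string> projects,
    bool advanced, int size, int from, string sorting)
{
    // Simple syntax by default, full query_string syntax for advanced users.
    var searchType = advanced ? "query_string" : "simple_query_string";

    // Visibility filter: the hit belongs to one of the user's projects OR is public.
    var visibility = new JsonObject
    {
        ["bool"] = new JsonObject
        {
            ["should"] = new JsonArray(
                new JsonObject
                {
                    ["terms"] = new JsonObject
                    {
                        ["belongsToProject"] = new JsonArray(projects.Select(p => (JsonNode)p).ToArray())
                    }
                },
                new JsonObject { ["term"] = new JsonObject { ["isPublic"] = true } })
        }
    };

    return new JsonObject
    {
        ["size"] = size,
        ["from"] = from,
        ["sort"] = JsonNode.Parse(sorting), // must be a JSON array, e.g. [{"_score":"desc"}]
        ["_source"] = "graphName",          // only the graph ID is needed from each hit
        ["query"] = new JsonObject
        {
            ["bool"] = new JsonObject
            {
                ["must"] = new JsonObject { [searchType] = new JsonObject { ["query"] = query } },
                ["filter"] = visibility
            }
        }
    };
}

var body = BuildSearchBody("climate", new[] { "projectA", "projectB" },
    advanced: false, size: 10, from: 0, sorting: "[{\"_score\":\"desc\"}]");
Console.WriteLine(body.ToJsonString());
```

The `projects` list becomes a `terms` clause and the public flag a `term` clause; matching either one is enough for a hit to pass the filter, while the user query in `must` still determines the score.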
using System;
using System.Collections.Generic;
using System.Linq;
using VDS.RDF;
using VDS.RDF.Query;
namespace SemanticSearchImplementation
{
/// <summary>
/// Represents a literal rule, i.e. a parameterized SPARQL query that constructs literals for an instance of a class.
/// </summary>
public class LiteralRule
{
private readonly IRdfConnector _connector;
private readonly SparqlParameterizedString _queryString;
private readonly string _language;
public LiteralRule(IRdfConnector connector, SparqlParameterizedString queryString, string language)
{
_connector = connector;
_queryString = queryString;
_language = language;
}
/// <summary>
/// Executes a literal rule for a given instance of a class.
/// </summary>
/// <param name="instance">A URI of a concrete instance of a class.</param>
/// <returns>A list of constructed literals.</returns>
public IEnumerable<string> ExecuteRule(string instance)
{
_queryString.SetUri(RdfClient.LABEL_LITERAL_RULE, new Uri(instance));
var result = _connector.QueryWithResultGraph(_queryString);
return result.Triples
.Select(x => x.Object)
// skip objects that are not literals instead of failing on the cast
.OfType<ILiteralNode>()
.Where(x => String.Equals(x.Language, _language) || String.Equals(x.Language, String.Empty))
.Select(x => x.Value)
.Distinct();
}
}
}
\ No newline at end of file
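`ExecuteRule` keeps only the constructed literals whose language tag matches the rule's language or that carry no language tag, and then removes duplicates. The sketch below isolates that filter on plain `(Value, Language)` tuples instead of dotNetRDF `ILiteralNode` objects, an assumption made so it runs without dotNetRDF; like the original, it treats an untagged literal as having the empty string as its language. `FilterLiterals` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the filter in LiteralRule.ExecuteRule: literals are
// (Value, Language) tuples instead of dotNetRDF ILiteralNode objects, and an
// untagged literal has the empty string as its language.
static IEnumerable<string> FilterLiterals(
    IEnumerable<(string Value, string Language)> literals, string language) =>
    literals
        // keep literals in the requested language or without a language tag
        .Where(l => string.Equals(l.Language, language) || string.Equals(l.Language, string.Empty))
        .Select(l => l.Value)
        // the same text may be constructed several times (e.g. with and without a tag)
        .Distinct();

var literals = new[]
{
    ("Aachen", "de"),
    ("Aix-la-Chapelle", "fr"),
    ("Aachen", "en"),
    ("Aachen", ""),
    ("RWTH", ""),
};
var result = FilterLiterals(literals, "en").ToList();
Console.WriteLine(string.Join(", ", result));
```

With `en` requested, the German and French literals are dropped, the untagged `RWTH` is kept, and the two remaining `Aachen` literals collapse into one.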
using System;
namespace SemanticSearchImplementation
{
/// <summary>
/// Exception which is thrown if a triple of a metadata graph could not be indexed.
/// </summary>
[Serializable]
public class NotIndexableException : Exception
{
private readonly string _reason;
public NotIndexableException(string reason) : base(reason) => _reason = reason;
public string Reason => _reason;
}
}
\ No newline at end of file
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
[assembly: AssemblyTitle("SemanticSearchImplementation")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("")]
[assembly: AssemblyProduct("SemanticSearchImplementation")]
[assembly: AssemblyCopyright("Copyright © 2020")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
// Setting ComVisible to false makes the types in this assembly not visible
// to COM components. If you need to access a type in this assembly from
// COM, set the ComVisible attribute to true on that type.
[assembly: ComVisible(false)]
// The following GUID is for the ID of the typelib if this project is exposed to COM
[assembly: Guid("6f9d842f-8f75-4a10-bad7-44c21cf63f2f")]
// Version information for an assembly consists of the following four values:
//
// Major Version
// Minor Version
// Build Number
// Revision
//
// You can specify all the values or you can default the Build and Revision Numbers
// by using the '*' as shown below:
// [assembly: AssemblyVersion("1.0.*")]
[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]