The KH populator package provides a unified solution for harvesting remote metadata and importing it into the Knowledge Hub (KH). It shall consist of a set of harvesting pipelines developed by different members of the NFDI4Earth developer group.
## 1.1 Requirements overview
- The output of the pipelines conforms to the NFDI4Earth [(meta)data model](https://drive.google.com/file/d/1cWuEtz7kqKZ5nKfYYjgEV8M0TZuBnJzU/view?usp=share_link) (link will change in the future)
- Pipelines can be scheduled to run with a job scheduler
- Pipelines can be run independently from each other
- Pipelines can be run repeatedly without creating duplicate data of any kind in the KH (one possible approach is sketched after this list)
- Pipelines don't overwrite data in the KH that has been added by other pipelines or users
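One possible way to satisfy the duplicate-free re-run requirement is to derive deterministic subject URIs from stable source identifiers, so that a repeated harvest writes to the same resource instead of creating a new one. The following is a minimal sketch of that idea; the base URI and helper function are illustrative assumptions, not part of the actual package.

```python
# A minimal sketch, not the actual kh_populator implementation: map each
# stable source identifier to one deterministic KH subject URI so that
# re-running a pipeline updates the same resource instead of duplicating it.
import hashlib

from rdflib import URIRef

KH_BASE = "https://example.org/kh/resource/"  # placeholder base URI


def deterministic_uri(source_id: str) -> URIRef:
    """Return the same subject URI for the same source identifier on every run."""
    digest = hashlib.sha256(source_id.encode("utf-8")).hexdigest()[:16]
    return URIRef(KH_BASE + digest)
```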
## 1.2 Quality goals
The top four quality goals for this software project are:
- Functional suitability
- Reliability
- Maintainability
- Compatibility
## 1.3 Stakeholders
- NFDI4Earth developers
- To a lesser degree: partners who collect metadata for the KH or provide a system from which metadata is harvested into the KH
# 2. Constraints
**Constraint** | **Explanation**
-------------- | ---------------
RDF output | Output of the harvesting must always be RDF so that it can be added to the KH (a minimal sketch follows the table)
Python | The general programming language for the pipelines is Python (the language preferred by the NFDI4Earth developers)
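Taken together, the two constraints mean that every pipeline ultimately produces RDF from Python code. A minimal sketch with `rdflib` could look as follows; the namespace and properties are placeholders, not the actual KH vocabulary.

```python
# Placeholder namespace and properties; the real vocabulary is defined by
# the NFDI4Earth (meta)data model.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import RDFS

EX = Namespace("https://example.org/kh/")

graph = Graph()
repo = URIRef(EX["repository/example"])
graph.add((repo, RDF.type, EX.Repository))
graph.add((repo, RDFS.label, Literal("Example Repository")))

print(graph.serialize(format="turtle"))  # RDF output ready for the KH
```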
# 3. Context and Scope
The KH populator harvests metadata about different information resource types as defined by the KH (meta)data model (TODO: link), e.g. repositories, research organizations, datasets, ...
The data model of the KH populator must therefore be aligned exactly with the KH data model, and the output must pass validation against it.
The input consists of the different systems that get harvested. We refer to them as **source systems**. These systems usually provide an open, well-documented API via the Internet for harvesting. They can be SPARQL endpoints, specific REST APIs, or interfaces following standardized protocols for (meta)data exchange, e.g. OAI-PMH, OGC WCS/WMS, ...
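As an illustration of what harvesting from such a source system could look like, the following sketch fetches the first page of an OAI-PMH `ListRecords` response with `requests` and the standard library XML parser. The endpoint URL is a placeholder, and resumption-token paging is omitted for brevity.

```python
# A minimal OAI-PMH harvesting sketch; the endpoint is a placeholder and
# resumptionToken paging (needed for large result sets) is omitted.
import xml.etree.ElementTree as ET

import requests

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records(endpoint: str, metadata_prefix: str = "oai_dc"):
    """Yield (identifier, title) pairs from the first ListRecords page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    response = requests.get(endpoint, params=params, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    for record in root.findall(".//oai:record", NAMESPACES):
        identifier = record.findtext(".//oai:identifier", default="", namespaces=NAMESPACES)
        title = record.findtext(".//dc:title", default="", namespaces=NAMESPACES)
        yield identifier, title
```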
Additionally, the KH populator must provide the option to import metadata from TSV tables of manually collected metadata. The goal is to move to a completely web-based harvesting approach, but at least during the piloting phase of the KH, manually collected data plays an important role.
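Such a TSV import could be as simple as reading the table with the standard library and handing each row to the same transformation code the web-based pipelines use. The sketch below assumes each row maps cleanly to column/value pairs; the actual column names depend on the collected tables.

```python
# A minimal TSV import sketch; the exact columns of the manually collected
# tables are not fixed here, rows are returned as generic dicts.
import csv


def read_manual_metadata(path: str):
    """Yield one cleaned dict per row of a manually collected TSV table."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            yield {key: value.strip() for key, value in row.items() if key}
```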
# 4. Solution strategy
A Python package is being developed which provides CLI commands to trigger individual harvesting pipelines. The package is divided into subpackages to represent the logical structure:
- `kh_populator` is the main package; it contains the code that triggers the harvesting pipelines
- `kh_populator_domain` contains modules with domain-specific functions for harvesting and transforming external (meta)data sources; these functions should be called from the respective pipeline
- `kh_populator_logic` contains utility functions that may be required in different domain modules
- `kh_populator_model` contains classes that reflect the data model of the Knowledge Hub. For each information resource type that is collected, a Python class must exist in `kh_populator_model` that defines the specific properties of this resource type as Python instance variables as well as the (de)serialization from the Python class to RDF (a minimal sketch follows this list)
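A class in `kh_populator_model` might look like the following sketch; the `Repository` type, its properties, and the namespace are illustrative assumptions rather than the actual KH data model.

```python
# An illustrative kh_populator_model class; the resource type, properties,
# and namespace are placeholders for what the KH (meta)data model defines.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import RDFS

EX = Namespace("https://example.org/kh/")  # placeholder namespace


class Repository:
    """One harvested repository; its properties are plain instance variables."""

    def __init__(self, uri: str, name: str, homepage: str):
        self.uri = URIRef(uri)
        self.name = name
        self.homepage = homepage

    def to_graph(self) -> Graph:
        """Serialize this instance into RDF that can be added to the KH."""
        graph = Graph()
        graph.add((self.uri, RDF.type, EX.Repository))
        graph.add((self.uri, RDFS.label, Literal(self.name)))
        graph.add((self.uri, EX.homepage, URIRef(self.homepage)))
        return graph
```

A pipeline would instantiate such a class from harvested values and merge the returned graph into its overall output.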