User Guide
Example project first steps
Setup
Let’s start with a short example of how to use Pykraken (PYKKN). First, include the following imports. They make all PYKKN classes used in this guide available:
from pykkn.dataset import Dataset
from pykkn.dataset_image import Dataset_Image
from pykkn.dataset_video import Dataset_Video
from pykkn.instrument import Instrument
from pykkn.model import Model
from pykkn.parameter import Parameter
from pykkn.pipeline import Pipeline
from pykkn.run import Run
We also import NumPy, a widely used numerical library in Python:
import numpy as np
Add first data
Now, we can add the first data to our HDF5 file.
We start with a dataset. Below, you can see how to implement it.
In the first line, we create the object dataset1, which is named “msmt01”.
This is done by using the constructor (add link to description) of the dataset class.
After that, we add the actual data. For this purpose, we create a random set of numbers and assign it to the data attribute of dataset1.
With the third line, we add an attribute to the dataset. Here, we specify the samplerate used while recording the data.
The name of the attribute is given in square brackets and the value is assigned with the equal sign.
In the last line, another attribute is added. In this case, the attribute is a timestamp to document when the data was collected:
dataset1 = Dataset('msmt01')
dataset1.data = np.random.rand(1, 10 ** 4)
dataset1.attrs['samplerate'] = 1000
dataset1.attrs['timestamp'] = '2017-05-14 18:44:11'
Modelling a sensor characteristic
Let’s assume the data we collected was measured with a pressure sensor. The sensor outputs a voltage signal.
The voltage signal needs to be converted to a pressure value according to a characteristic curve.
To this end, we will implement the characteristic curve as a parameter, add this parameter to a model and create an instrument from this model.
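For a linear characteristic curve, this conversion is simply a multiplication with a gain factor. As a minimal sketch in plain Python (the variable names are illustrative and not part of PYKKN):
gain = 1.0                        # feedthrough: the voltage passes through unchanged
voltage_signal = 0.5              # example sensor output in volts
pressure = gain * voltage_signal  # converted pressure value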
Step by step: let’s start with the parameter.
In the following lines of code, we see how to implement the parameter object.
First, we create the object parameter1 and name it “gain” by passing that name to the constructor.
Afterwards, we add some attributes to this parameter:
parameter1 = Parameter('gain')
parameter1.attrs['value'] = 1
parameter1.attrs['units'] = '-'
parameter1.attrs['variable'] = '-'
parameter1.attrs['origin'] = 'this'
A more efficient way to add the parameters is the function build_multi_parameters. First, you need to import the function as follows:
from pykkn.build_multi_parameters import build_multi_parameters
Then you can create a dictionary with all the parameters:
dic = {
    "list_of_parameters": [
        {"name": "gain", "value": 1, "units": "-", "variables": "-", "origin": "this"}
    ]
}
You can extend this dictionary with more parameters by simply adding another entry in curly brackets to the list, as shown below.
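For instance, a second, hypothetical parameter “offset” could be appended like this:
dic = {
    "list_of_parameters": [
        {"name": "gain", "value": 1, "units": "-", "variables": "-", "origin": "this"},
        {"name": "offset", "value": 0, "units": "-", "variables": "-", "origin": "this"}
    ]
}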
The build_multi_parameters function then uses the dictionary created above:
parameter1 = build_multi_parameters(dic)
You can choose one of the two shown ways to create parameters.
The next step is to create a model to which we can add the parameters. In this case, we create model1, which is called “feedthrough”. Afterwards, we add the parameters to this model:
model1 = Model('feedthrough')
model1.add([parameter1])
Now we can create an instrument. Here we create the instrument1 and name it “transmitter”. After creating this instrument, we add the model1 to this instrument:
instrument1 = Instrument('transmitter')
instrument1.add([model1])
Structuring of the data
In the next step, we add our dataset to a pipeline object. Normally, several datasets are added here to build up an array of measurements (see the sketch below the code).
At the same time, we store the metadata of the instruments used in a test rig in the pipeline module.
The main purpose of the pipeline class is to structure and organize the data.
In the lines below, the object pipeline1 is created. In brackets, you specify
the location where it will be stored inside the HDF5 file. Afterwards, we add the dataset1 to the pipeline1.
The last three lines contain metadata of the used instrument:
pipeline1 = Pipeline('measured/capa1/raw')
pipeline1.add([dataset1])
pipeline1.attrs['variable'] = 'voltage'
pipeline1.attrs['units'] = 'volts'
pipeline1.attrs['origin'] = 'this'
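If several measurements were recorded, they can all be added to the same pipeline, either one by one or in a single list such as pipeline1.add([dataset1, dataset2]). A sketch with a hypothetical second dataset:
dataset2 = Dataset('msmt02')                # hypothetical second measurement
dataset2.data = np.random.rand(1, 10 ** 4)  # its random example data
pipeline1.add([dataset2])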
Now we create an object from the run module to further structure our data. With the run object, we tie the pipelines and the parameters together. This is shown in the lines below. First, we create the object named “run1”. Then we add some metadata for our test series:
msmtrun = Run('run1')
msmtrun.attrs['author'] = 'derGeraet'
msmtrun.attrs['pmanager'] = 'tcorneli'
msmtrun.attrs['targettmp'] = np.double(70)
msmtrun.attrs['targetrps'] = np.double(2)
msmtrun.attrs['oil'] = 'PA06'
Now, we can add the pipeline object and the parameter object to the run object:
msmtrun.add([pipeline1])
msmtrun.add([parameter1])
Create and store HDF5 file
Before we can export our HDF5 file, we need to specify the location where it should be saved and the name of the file. Make sure to use the slash symbol “/” instead of the backslash “\” in the path, and to give the file a name ending in “.h5”:
msmtrun.set_storage_path("C:/Users/Example/PYKKN/example_for_docs.h5")
The last step is to export the HDF5 file by calling the store function on the run object. Calling store on the top-level run object ensures that the whole structure is exported:
msmtrun.store()
Now, let’s have a look at the created HDF5 file. Below, you can see a picture of the created file, fully expanded in an open-source HDF5 viewer. On the left side, you can see the structure of the file we created:

Example project import of a .csv file
In the following project, we will show you how to store data from a CSV table in an HDF5 file using PYKKN.
This might be useful if, for example, your data is stored in an Excel sheet. With Excel, you can export the sheet as a CSV file and then store it in HDF5.
As an example, look at the following table:
| time [s] | sensor 1 | sensor 2 | sensor 3 |
|---|---|---|---|
| 0.01 | 6.040 | 4.350 | 1.694 |
| 0.05 | 0.903 | 2.261 | 1.930 |
| 0.10 | 7.621 | 0.922 | 0.023 |
| 0.15 | 1.922 | 4.974 | 0.317 |
| 0.20 | 8.510 | 0.796 | 0.545 |
| 0.25 | 4.515 | 4.384 | 1.640 |
| 0.30 | 4.163 | 1.112 | 1.828 |
| 0.35 | 6.909 | 4.202 | 1.622 |
| 0.40 | 1.679 | 0.713 | 0.381 |
| 0.45 | 2.809 | 0.068 | 0.612 |
| 0.50 | 5.402 | 3.884 | 1.152 |
| 0.55 | 4.481 | 0.834 | 1.898 |
| 0.60 | 1.429 | 0.854 | 0.176 |
| 0.65 | 7.038 | 4.113 | 1.584 |
| 0.70 | 2.110 | 4.087 | 0.262 |
| 0.75 | 0.920 | 0.167 | 0.442 |
| 0.80 | 4.918 | 2.122 | 0.476 |
| 0.85 | 8.989 | 3.269 | 1.093 |
| 0.90 | 0.883 | 4.461 | 1.848 |
| 0.95 | 2.475 | 0.164 | 0.422 |
| 1.00 | 1.653 | 3.396 | 0.550 |
Setup
Let’s start by setting up the project as usual, even though this short example does not need all of the imports:
from pykkn.instrument import Instrument
from pykkn.model import Model
from pykkn.parameter import Parameter
from pykkn.dataset import Dataset
from pykkn.pipeline import Pipeline
from pykkn.run import Run
import numpy as np
In this case, it is important to import the pandas library as well, because we will use it to read the CSV file:
import pandas as pd
Inserting and storing of the data
First, we read the CSV file into a variable which we can then add to a dataset object. To do this, we use the read_csv() function from pandas:
df = pd.read_csv("test_data.csv")
Afterwards, we create a dataset object and add the values of the df variable to this object:
dataset1 = Dataset("test_data.csv")
dataset1.data = df.values
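Note that df.values contains only the numeric cells; the column headers of the CSV file are not part of it. If you want to keep them, one option (a sketch, not a documented PYKKN feature) is to attach them as an attribute:
dataset1.attrs['columns'] = list(df.columns)  # assumes attrs accepts a list of strings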
To create the HDF5 file from this, we set a storage path as before (the path shown here is an example) and call the store() function:
dataset1.set_storage_path("C:/Users/Example/test_data.h5")
dataset1.store()
The output of this short project can be seen below:

Recurring functionalities and attributes
In PYKKN, you will find several functions that are available in different modules. To achieve this, the library is based on object-oriented programming,
and several modules inherit functions from the storage class. The functions of this class are explained below in more detail to avoid repeating the explanation for each module.
For further information about the storage class, you can visit the API reference. (add link to API reference)
To give you a brief overview of the class structure of PYKKN, you can have a look at the UML diagram below (feel free to click on the graphic to zoom in):
The following modules inherit from the storage class (add links to the API reference):
dataset
dataset_image
dataset_keyvalue
dataset_video
instrument
model
parameter
pipeline
run
All the objects that are created from these classes can use the following functions:
set_storage_path(path: str)
Every object that is created from these classes can export HDF5 files. Before doing so, the storage path needs to be defined by passing it to this function as a parameter. The storage path needs to be a string in quotation marks (“”) and end with the name of the HDF5 file plus the ending “.h5”. Here, you can see an example:
data1.set_storage_path("C:/Users/Example/data1.h5")
Make sure to use the slash symbol “/” instead of the backslash “\”.
store()
To create the HDF5 file from any object, use the store function. An example is shown below:
data1.store()
show()
This function will show you the content of an object, for example its attributes:
data1.show()
add_attrs_dict(Dict)
With this function, you can add a flat dictionary of key-value pairs to an object as a set of attributes. Each key becomes the name of an attribute and the corresponding value is assigned to that attribute:
data1.add_attrs_dict(myDictionary)
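Here, myDictionary could look like this (the keys and values are examples):
myDictionary = {
    'samplerate': 1000,
    'timestamp': '2022-06-12 10:39:11'
}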
name attribute
Next to the functions, an object of the classes named above inherits two attributes from the storage class. The first attribute is the name. Normally, this attribute is passed to the constructor as a parameter when the object is created, but it can also be assigned directly. An example can be seen below:
data1.name = 'pressureData'
Do not confuse this with the name of the Python object: “data1” is the variable used to refer to the object in Python, while “pressureData” is the name that will be shown in the HDF5 file.
attrs attribute
The attrs attribute represents metadata that can be attached to every object. The type of metadata will be further specified in the different modules below:
data1.attrs['samplerate'] = 1000
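Putting these pieces together, the same calls work on any of the classes listed above. A minimal sketch using the dataset class (the path and names are illustrative):
from pykkn.dataset import Dataset
import numpy as np

data1 = Dataset('pressureData')                       # name shown in the HDF5 file
data1.data = np.random.rand(1, 100)                   # example data
data1.attrs['samplerate'] = 1000                      # metadata via the attrs attribute
data1.set_storage_path('C:/Users/Example/data1.h5')   # forward slashes, ends in .h5
data1.show()                                          # inspect the content
data1.store()                                         # export to HDF5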
Dataset modules
The main purpose behind PYKKN is to store data in a structured way. While doing this, we do not want to lose any kind of data or metadata. Therefore, PYKKN provides the opportunity to store several different types of data. In the following sections, the modules to store data are explained in more detail.
Dataset module
With the dataset module, you are able to store an array of numbers.
As PYKKN is made for scientific research, the dataset module is typically used when
storing measured data from a test rig. Apart from the pure data, it is possible to
store metadata such as the samplerate and the timestamp.
To create a dataset object, the constructor is needed. The parameter of the constructor determines the name of the dataset. This is also the name shown in the HDF5 file in the end. You can see an example here:
dataset1 = Dataset('nameOfDataset')
The name of the object in this example is “dataset1”. The name of the object is important
for further coding in Python, for example in the next part.
To add data to the dataset object, we access the data array of the object which was created when the object itself was created. We can overwrite this array with our data:
dataset1.data = measuredData
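Here, measuredData stands for any array of numbers you recorded. For example, it could have been created like this (assuming NumPy):
measuredData = np.random.rand(1, 10 ** 4)  # placeholder for your recorded samples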
To add the samplerate and the timestamp as metadata, we access the attributes of the dataset1 object. At this point, it is mandatory to specify which attribute we want to access. This is done by writing “samplerate” or “timestamp” in square brackets when accessing the attributes. The samplerate is given as a number in 1/s. The timestamp should have the form YYYY-MM-DD HH:mm:ss.
dataset1.attrs['samplerate'] = 1000
dataset1.attrs['timestamp'] = '2022-06-12 10:39:11'
If you want to create an HDF5 file from this dataset alone, it is possible to do so: in PYKKN, every single object that is created can be exported to its own HDF5 file. To do this, we first set the storage path by writing it in the brackets of the set_storage_path() function. Keep in mind to use the slash symbol “/” in the path instead of the backslash “\” and to end the path with the file name plus “.h5”:
dataset1.set_storage_path('test/test_ut_ds.h5')
Now, to create the HDF5 file we call the store() function:
dataset1.store()
Dataset image
With this module, you can store an image in the HDF5 file. If you are dealing with a video, please use the dataset_video module instead of trying to store it as an array of images with this module. There are two steps to store an image. First, you create an object of the Dataset_Image class and give it a name via the constructor; in this case, the name is “image_dataset_1”. Afterwards, you add the path where the image is stored so PYKKN can collect the data:
datasetImage1 = Dataset_Image('image_dataset_1')
datasetImage1.data = "/test/test_rig_1.jpg"
As further metadata, you can add, for example, the timestamp to the image object:
datasetImage1.attrs['timestamp'] = '2022-06-12 10:45:21'
This command is comparable to the one of the dataset module.
Dataset video
If you want to store a video in your HDF5 file, you can use this module. The usage is very similar to the image module. First, create an object that represents the video you want to store. Then, add the storage path. If you want, you can also add the timestamp as metadata:
datasetVideo1 = Dataset_Video('video_dataset_1')
datasetVideo1.data = "C:/Users/Administrator/Videos/Captures/test_meeting _recording.mp4"
datasetVideo1.attrs['timestamp'] = '2022-06-13 11:22:11'