Package coscine
Coscine Python SDK
The Coscine Python SDK is an open source python package providing a pythonic interface to the Coscine REST API. It is compatible with Python versions 3.7+.
Please note that this python module is developed and maintained by the scientific community and even though Copyright remains with RWTH Aachen, it is not an official service that RWTH Aachen provides support for.
Installation
Installing python
Installing Python is platform-dependent, so if it is not already installed on your system, please refer to the official Python documentation for your platform.
In the following snippets, depending on your installation and platform, you may have to substitute python with py or python3.
Installing the Coscine Python SDK
Using pip
This module is hosted on the Python Package Index (PyPI). You can install and update it and all of its dependencies via the Python package installer (pip):
python -m pip install --upgrade coscine
Using conda
The PyPI release of this module is mirrored on conda-forge. You can install and update it and all of its dependencies via conda:
conda install -c conda-forge coscine
Using git
You can also install the Python module from a local copy of the source code, which you can obtain via git:
git clone https://git.rwth-aachen.de/coscine/community-features/coscine-python-sdk.git
cd ./coscine-python-sdk
python -m pip install .
Creating an API token
You need an API token to use the Coscine API. If you have not already, create a new API token. Once you have an API token you are ready to use the API.
A word of advice
The token represents sensitive data and grants anyone in possession of it full access to your data in Coscine. Do not leak it publicly on online platforms such as GitHub! Do not include it in your source code if you intend to upload that code to the internet. Take precautions and follow best practices to avoid corruption, theft or loss of data!
Using the API token
There are two simple and safe methods of using the API token without exposing it to unintended audiences.
Storing the token in a file and loading it when needed
Simply put your API token in a file on your hard drive and read the file when initializing the Coscine client. This keeps the token out of the source code and offers users an easy way to switch between tokens by changing the filename.
# Read the token from a file; strip() removes a trailing newline
with open("token.txt", "rt") as fd:
    token = fd.read().strip()
However, this approach comes with the risk of exposing the token by accidentally leaking the file together with the source code. Therefore precautions must be taken, e.g. when using git as a version control system. A .gitignore file listing any possible token filename or file extension should be mandatory. You could, for example, exclude the filename token.txt. A better way is to agree upon a common token file extension such as .token and exclude that extension (a .gitignore entry of *.token). Then you can safely push your code to online platforms such as GitLab or GitHub.
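The snippet above can be wrapped in a small helper that also rejects empty token files. This is a minimal sketch: the function name `load_token` and the default filename `coscine.token` are illustrative assumptions (chosen to match a `*.token` ignore rule), not part of the Coscine SDK.

```python
from pathlib import Path


def load_token(path: str = "coscine.token") -> str:
    """Read a Coscine API token from a file (hypothetical helper, not SDK API)."""
    # strip() removes surrounding whitespace such as a trailing newline,
    # which many editors append and which would corrupt the token
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file '{path}' is empty.")
    return token


# Usage, assuming 'coscine.token' exists and is listed in .gitignore:
# import coscine
# client = coscine.Client(load_token("coscine.token"))
```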
Storing the token in an environment variable
This method does not rely on any files but instead on environment variables. Simply set an environment variable containing your token and use that variable from your python program.
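import os
# Environment variables are normally set outside of Python, e.g. in a
# Linux/macOS shell: export COSCINE_API_TOKEN="My Token Value"
# Setting one from Python only affects the current process (and its
# children), and writing the token into the code defeats the purpose:
# os.environ["COSCINE_API_TOKEN"] = "My Token Value"
# Get environment variable (None if it is not set)
token = os.getenv("COSCINE_API_TOKEN")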
This approach may be slightly more cumbersome for other users of your program: a token file can simply be sent to colleagues, whereas an environment variable has to be created by each user on their own machine.
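Both methods can also be combined, so that an environment variable takes precedence and a token file serves as a fallback for users who prefer files. This is a sketch under assumptions: `get_token`, the variable name `COSCINE_API_TOKEN` and the filename `coscine.token` are illustrative choices, not names defined by the SDK.

```python
import os
from pathlib import Path
from typing import Optional


def get_token(env_var: str = "COSCINE_API_TOKEN",
              token_file: Optional[str] = "coscine.token") -> str:
    """Return the API token from an environment variable, else from a file.

    Hypothetical helper: the variable name and file name are assumptions.
    """
    token = os.getenv(env_var)
    if token:
        return token.strip()
    # Fall back to a token file, if one is configured and present
    if token_file and Path(token_file).is_file():
        return Path(token_file).read_text(encoding="utf-8").strip()
    raise RuntimeError(
        f"No token found: set {env_var} or create '{token_file}'."
    )
```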
The procedure for setting environment variables temporarily or permanently depends on your operating system.
Import and Initialization
Before you can use this Python library, you have to import it in your source code.
import coscine
Initializing the Coscine API client is done by calling the coscine.Client constructor:
# Signature
coscine.Client(token: str, lang: str = "en", logger: coscine.Logger = None, persistent_cache: bool = True)
The constructor takes one mandatory argument - the Coscine API token. A minimal usage example would thus look like this:
client = coscine.Client(token)
# With the German language option:
client = coscine.Client(token, "de")
The token variable should contain a string with your Coscine API token. You can set this variable by following one of the methods described in Using the API token.
To find out more about the coscine.Client class, take a look at the documentation.
Logging and CLI output
The Python SDK provides basic logging functionality to inspect data sent to and from Coscine and to report upload and download progress. You can enable, disable and configure it in the coscine.Client() constructor by passing an instance of the coscine.Logger class to it.
Creating a logger:
logger = coscine.Logger(True) # True <-> enable logging, False for disabling it
client = coscine.Client(token, logger=logger)
# You can redirect logger output to a file
fd = open("logfile.txt", "w")
logger = coscine.Logger(True, stdout=fd)
# And you can specify different levels of output
logger = coscine.Logger(True, loglevels = [coscine.LogLevel.WARN])
# or enable/disable colored output, if working with a cli
logger = coscine.Logger(True, colors=True)
Logging can be enabled/disabled at any time using the logger class:
logger.enabled = False
Working with Coscine Projects
Coscine is a project-based data management platform. A Project represents the central hub for all the data resources and the people involved in a scientific undertaking.
List projects
Getting a list of projects is easy:
projects = client.projects() # Optional filtering possible e.g. client.projects(Author = "Jane")
for project in projects:
    print(project.name)
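The loop above treats projects as ordinary Python objects with a `name` attribute, so results can also be narrowed down on the client side with plain Python. The helper `filter_by_name` below is a hypothetical sketch; it uses `SimpleNamespace` stand-ins so it runs without a Coscine account, and it assumes only that each item has a `name` attribute, which also holds for the resources and files shown later in this document.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def filter_by_name(items: Iterable[Any], substring: str) -> List[Any]:
    """Return all items whose `name` contains `substring` (case-insensitive)."""
    needle = substring.casefold()
    return [item for item in items if needle in item.name.casefold()]


# Stand-in objects for illustration; with the SDK you would pass client.projects()
projects = [SimpleNamespace(name="Battery Aging"), SimpleNamespace(name="Thermal Tests")]
for project in filter_by_name(projects, "battery"):
    print(project.name)  # prints "Battery Aging"
```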
If we know which project we require beforehand, we can just query it by its name. We now get a single object of type Project, instead of a list:
project = client.project("My Project") # We could also filter by different keys such as Author: client.project(Author="Joe")
print(project)
Here we can see some more project metadata. When printing a project object, its metadata is output nicely formatted inside of a table.
The keys and values are human-readable and easy to understand. Under the hood it is not that simple, and sometimes we might need to access the Coscine-internal identifier of a certain field. In that case we can use the data dictionary, which grants us access to the metadata as seen by the Coscine server. Printing it yields a JSON representation of our project's metadata:
print(project.data)
Delete a project
Once again, this is a plain and simple function call. Be aware, though, that this function may fail due to insufficient privileges. The call may raise an error, which we should catch and handle:
try:
    project.delete()
except coscine.AuthorizationError:
    print("We are not authorized.")
Create a project
The python module provides a simplified way of programmatically creating a project. All data formatting and error detection is performed under the hood.
form = client.ProjectForm()
# or: form = coscine.project.ProjectForm(client)
print(form)
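# The field names below assume a client created with lang="de";
# form keys follow the language preset of the client.
form = client.ProjectForm()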
form["Projektname"] = "test"
form["Anzeigename"] = "test"
form["Projektbeschreibung"] = "test"
form["Principal Investigators"] = "test"
form["Projektstart"] = "test"
form["Projektende"] = "test"
form["Disziplin"] = ["Informatik 409"]
form["Teilnehmende Organisation"] = ["RWTH Aachen University"]
form["Sichtbarkeit"] = "Public"
client.create_project(form)
Looking at the use of a dictionary, one might ask why we do not just use a function with named arguments for each dictionary field. Using a dictionary has multiple benefits:
- The dictionary keys are familiar from the Coscine web interface and change based on the language preset of the Python client.
- The fields inside the dictionary can be iterated over, and their properties, such as controlled vocabularies, can be queried.
The second benefit ultimately enables easy inclusion in GUI applications.
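As an illustration of the second benefit, the following sketch summarizes every field of a form as text, e.g. for a command-line tool. It assumes the form offers `keys()`, item access, `is_controlled(key)` and `get_vocabulary(key)`, the calls used in the GUI integration example of this document; `describe_form` itself is a hypothetical helper.

```python
from typing import List


def describe_form(form) -> List[str]:
    """Summarize each field of an SDK input form as one line of text.

    Relies on form.keys(), form[key], form.is_controlled(key) and
    form.get_vocabulary(key) (assumed to yield strings).
    """
    lines = []
    for key in form.keys():
        if form.is_controlled(key):
            # Show up to three allowed values from the controlled vocabulary
            choices = list(form.get_vocabulary(key))
            preview = ", ".join(choices[:3]) + (", ..." if len(choices) > 3 else "")
            lines.append(f"{key}: one of [{preview}]")
        else:
            lines.append(f"{key}: free text (currently {form[key]!r})")
    return lines


# Usage with the SDK (assumes a client as created above):
# for line in describe_form(client.ProjectForm()):
#     print(line)
```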
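Download a project
Downloading a project downloads all of its resources and the data contained within them to the given path:
project.download(path="./")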
Member management
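members = project.members()
for member in members:
    print(member.name)
    print(member.email)
# The following calls act on a single member, here the first one in the list
member = members[0]
# Set the project role of that member
member.set_role("Owner")
# Remove that member from the project
member.remove()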
Inviting new members:
EMAIL_ADDRESS: str = "john@example.com"
ROLE: str = "Member"
project.invite(EMAIL_ADDRESS, ROLE)
Working with Coscine Resources
Resources store all of your data and metadata. As such, they represent a key data structure, which you will most likely interact with a lot.
Getting a list of resources
Analogous to getting a list of projects, you can get a list of resources. The only difference is that the resources() method is part of the project object and only queries resources contained within that project. Just like for projects, you can specify a filter to select resources by certain properties.
resources = project.resources()
for resource in resources:
    print(resource.name)
If we know which resource we require beforehand, we can just query it by its name. We now get a single object of type Resource, instead of a list:
resource = project.resource("RessourcenNameS3")
print(resource)
Deleting a resource
Deleting a resource requires sufficient privileges, so this call may fail with an error. Try deleting a resource you have created yourself, but be careful not to delete anything of value.
try:
    resource.delete()
except coscine.AuthorizationError:
    print("Not authorized.")
Downloading a resource
Downloading a resource and all of the data contained within the resource is just as simple as downloading a project. In fact, internally project.download() just calls Resource.download() for all resources contained within the project.
resource.download(path="./")
Resource Quota
You can fetch the used quota of a resource as an integer indicating the size used in bytes.
quota = resource.quota()
print(quota)
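Since quota() returns a raw byte count, a small formatting helper can make the number easier to read. This is a hypothetical helper using binary (1024-based) units; whether Coscine itself reports quotas in binary or decimal units is not stated here.

```python
def format_bytes(num_bytes: int) -> str:
    """Convert a byte count (e.g. from resource.quota()) into a readable string."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    if index == 0:
        return f"{num_bytes} B"
    return f"{size:.1f} {units[index]}"


# Usage with the SDK:
# print(format_bytes(resource.quota()))
```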
Resource Application profile
An application profile specifies a template for metadata. There may be times when you need to interact with that profile. To get the application profile of a resource, simply call the application_profile() method. You can either get the raw application profile in JSON-LD format or a more readable (and easier to interact with) parsed version by setting the parse argument to True.
# Print a raw and a parsed application profile
profile = resource.application_profile()
print(profile)
print("-------------------------------------------------------------")
profile = resource.application_profile(parse=True)
print(profile)
Creating a resource
Once again we are using InputForms to set metadata of a Coscine object.
form = project.ResourceForm()
# or: form = coscine.resource.ResourceForm(project)
form = project.ResourceForm()
form["Resource Type"] = "rds"
form["Resource Size"] = "31"
form["Resource Name"] = "My Cool Resource"
form["Display Name"] = "Cool"
form["Resource Description"] = "Testing Coscine Client Resources"
form["Discipline"] = ["Computer Science 409"]
form["Application Profiles"] = "RADAR"
form["Visibility"] = "Project Members"
project.create_resource(form)
Getting S3 credentials
RDS-S3 resources can be accessed directly via an S3 client. Direct connections require S3 credentials, which S3 resource instances provide. This only works for S3 resources.
# Keys with write privileges; substitute the corresponding read keys for read-only privileges
access_key: str = resource.s3.access_key_write
secret_key: str = resource.s3.secret_key_write
endpoint: str = resource.s3.endpoint
bucket: str = resource.s3.bucket
Working with Objects and Metadata
Files are stored inside resources. However, resources do not necessarily contain files; it depends on the resource type. RDS and RDS-S3 resources contain files, but Linked Data resources contain references to files. Therefore we cannot just talk about files, but have to use a more abstract term such as 'object'. Objects represent files and file-like instances in Coscine. Nonetheless, the methods of interacting with files and file-like objects are always the same; just don't expect file contents for objects of Linked Data resources, as those merely contain links or whatever has been specified as their content.
Getting a list of files
files = resource.objects()
for file in files:
    print(file.name)
file = resource.object("Messung (1).bin")
print(file.name)
data = file.content()
Downloading a file
file = resource.object("Messung (1).bin")
path = "./"
file.download(path)
Uploading a file
metadata = resource.metadata()
metadata["Title"] = "..."
# fill in fields
filename = "messung.bin" # filename as it should appear in Coscine
path = "./data/messung.bin" # path on hard drive
resource.upload(filename, path, metadata)
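To upload a whole local directory, the same upload(filename, path, metadata) call can be repeated for each file. The sketch below assumes exactly that call signature; `upload_directory` and the `make_metadata` callback are hypothetical names. A fresh form is created per file so that metadata from one file cannot leak into the next.

```python
import os
from typing import Callable, List


def upload_directory(resource, directory: str,
                     make_metadata: Callable[[str], object]) -> List[str]:
    """Upload every regular file in `directory` (non-recursive) to `resource`.

    `make_metadata` returns the metadata for a given filename, e.g. a
    filled-in form obtained from resource.metadata().
    """
    uploaded = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        # Filename in Coscine, local path, metadata (as in the call above)
        resource.upload(entry, path, make_metadata(entry))
        uploaded.append(entry)
    return uploaded


# Usage with the SDK (assumes 'resource' was obtained as shown above):
# def make_metadata(filename):
#     form = resource.metadata()
#     form["Title"] = filename
#     return form
# upload_directory(resource, "./data", make_metadata)
```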
Deleting a file
file = resource.object("Messung (1).bin")
file.delete()
# The file object is still valid until garbage collected.
# The file on the Coscine server has already been removed though.
Working with file metadata
We can interact with metadata using a MetadataForm.
form = file.form()
print(form)
The form fields change depending on the selected application profile for the resource.
form["Title"] = "My Title"
# Update metadata
file.update(form)
Exceptions
The Python module defines a number of custom exceptions, all derived from CoscineException.
class CoscineException(Exception):
    """
    Coscine base Exception class.
    """
    pass
###############################################################################
class ConnectionError(CoscineException):
    """
    In case the client is not able to establish a connection with
    the Coscine servers, a ConnectionError is raised.
    """
    pass
###############################################################################
class ClientError(CoscineException):
    """
    An error has been made or detected on the client side.
    """
    pass
###############################################################################
class ServerError(CoscineException):
    """
    An error has been made or detected on the Coscine server.
    """
    pass
###############################################################################
class VocabularyError(CoscineException):
    """
    Raised in InputForms when a supplied value is not contained within
    a controlled vocabulary.
    """
    pass
###############################################################################
class RequirementError(CoscineException):
    """
    Commonly raised in InputForms when a required field has not been set.
    """
    pass
###############################################################################
class AuthorizationError(CoscineException):
    """
    AuthorizationErrors are thrown when the owner of the Coscine API
    token does not hold enough privileges.
    """
    pass
###############################################################################
class AmbiguityError(CoscineException):
    """
    An AmbiguityError is raised in cases where two objects could
    not be differentiated between.
    """
    pass
###############################################################################
class ParameterError(CoscineException):
    """
    Invalid (number of) function parameters provided. In some cases
    the user has the option of choosing between several optional arguments,
    but has to provide at least one.
    """
    pass
###############################################################################
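Because all of these exceptions derive from CoscineException, handlers should go from the most specific to the most general class. The sketch below mirrors two of the classes with local stand-ins so it runs without the SDK installed; in real code you would catch `coscine.AuthorizationError` and `coscine.CoscineException` instead. `try_delete` is a hypothetical helper that works with any object offering a `delete()` method, such as the projects and resources shown above.

```python
class CoscineException(Exception):
    """Local stand-in mirroring coscine.CoscineException."""


class AuthorizationError(CoscineException):
    """Local stand-in mirroring coscine.AuthorizationError."""


def try_delete(obj) -> str:
    """Delete `obj` and report the outcome instead of raising."""
    try:
        obj.delete()
    except AuthorizationError:
        # The most specific handler must come first
        return "not authorized"
    except CoscineException as error:
        # Any other SDK error is caught by the common base class
        return f"failed: {error}"
    return "deleted"
```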
Coscine Python SDK Examples
Inviting a list of project members
Sometimes you need to invite a lot of people to a project. If you already have a list of their email addresses, you can easily iterate over the list and invite every single one of them.
from typing import List
import coscine
TOKEN: str = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
PROJECT_NAME: str = "My Project"
EMAILS: List[str] = [
"adam@example.com",
"eva@example.com"
]
client = coscine.Client(TOKEN)
project = client.project(PROJECT_NAME)
for email in EMAILS:
    project.invite(email)
Uploading a file to a resource
# Import the package
import coscine
# We read our token from a file called 'token.txt'.
# Note: We could store our token directly inside the sourcecode
# as a string, but this is not recommended!
fd = open("token.txt", "rt")
token = fd.read()
fd.close()
# We create an instance of the Coscine client using
# the coscine.Client constructor.
client = coscine.Client(token)
project = client.project("My Project")
resource = project.resource("My Resource")
# We could create metadata in json-ld format by hand and use it
# to upload a file or assign metadata to a file. However this
# proves to be cumbersome and prone to ambiguous errors.
# Therefore we'd like to use some form or template, which the
# coscine python package is able to create for us!
# The template is created by requesting the application profile
# and the controlled vocabulary used within project and resource
# and examining their fields as well as the constraints put
# upon those fields.
# After that a custom dictionary-like datatype is created and filled
# with default and fixed values (as specified during resource creation).
# We can interact with this dictionary just like with any other dictionary
# with the difference being, that we can only set fields specified in
# the application profile and - in case of fields controlled
# by a vocabulary - can only use values of that vocabulary.
# Furthermore the printing functionality has been altered, to convey
# more information about the fields stored within the template.
# Just try print(form) and figure it out for yourself.
form = resource.metadata()
# We can now modify the template.
# Let's assume our resource is using the ENGMETA metadata scheme, and set
# the required values for our metadata:
form["Title"] = "Hello World!"
form["Creator"] = "John Doe"
form["Contact"] = "John Doe"
form["Creation Date"] = "2021-01-21"
form["Publication Date"] = "2021-01-21"
form["Embargo End Date"] = "2023-01-01"
form["Version"] = "1.0"
form["Mode"] = "Experiment"
form["Step"] = "42"
form["Type"] = "Image"
form["Subject Area"] = "Medicine"
# As you can see in the example above, every value in the template
# is a string. This is keeping things simple - you just need to
# remember to cast any datetime object to a string in yyyy-mm-dd
# format and any integer to a string.
# Do note that the fields "Mode", "Type" and "Subject Area" are
# controlled by a vocabulary. You can just set the value you'd
# select in the web interface of Coscine and let the custom dictionary
# automatically resolve the actual value.
# "Actual value?" you are asking? Well - when we set "Type" to "Image"
# we are not actually assigning the string "Image" to it, but rather
# a uniform resource identifier containing the image type. This is
# hidden from the user, but allows us to work with "Images" and
# "Waveforms" rather than "http://long-url.xd/id=Images", etc.
# Note that this also applies to controlled fields containing a fixed
# or default value - they will not contain "Images" but rather a
# uri resolving to "Images".
# Let's print our template and inspect how it looks.
# Make sure we filled in all required fields, so that no errors
# are thrown around when we try to upload.
print(form)
# We can now directly use this template as our metadata or generate
# a json-ld representation of our metadata using:
metadata = form.generate()
# Note that the generate() function will validate that each required
# field is present and that all fields are correctly formatted before
# generating the metadata. Thus be prepared to catch some exceptions.
# We do not need to generate the metadata before using functions like
# upload() though, as this is done internally.
# The upload() method of the resource takes 3 arguments
# (the file is stored in the resource the method is called on):
# filename: Which filename/path should the file assume within the resource?
# path: The local path to the file
# metadata: Either a filled in metadata template or a JSON-LD formatted
# metadata string. If you specify a template, it is validated before
# uploading. Thus, if you have not used form.generate() before,
# you should now be prepared to catch some exceptions in case the
# metadata contains bad values.
filename = "My Research Data.csv"
resource.upload(filename, filename, metadata)
# We can reset our template to its default values and re-use it for
# other files
form.clear()
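The comments in the example above ask you to convert dates to yyyy-mm-dd strings and numbers to strings yourself. A small helper can do this consistently; `to_form_value` is a hypothetical sketch, not part of the SDK.

```python
from datetime import date
from typing import Union


def to_form_value(value: Union[str, int, float, date]) -> str:
    """Convert a Python value into the string representation used in forms."""
    if isinstance(value, date):
        # Covers date and datetime objects; any time part is dropped
        return value.strftime("%Y-%m-%d")
    return str(value)


# Usage with the SDK form from the example above:
# form["Creation Date"] = to_form_value(date.today())
# form["Step"] = to_form_value(42)
```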
Modifying/Setting metadata of files inside an (S3-)resource
Files stored in an S3 resource do not require you to specify metadata on upload. However, it is considered good practice to tag each file with metadata anyway. Here is an easy way of doing just that. The example assumes you have a few files inside an S3 resource, all of which have little to no metadata yet.
import coscine
TOKEN: str = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
PROJECT_NAME: str = "My Project"
RESOURCE_NAME: str = "My Resource"
client = coscine.Client(TOKEN)
project = client.project(PROJECT_NAME)
resource = project.resource(RESOURCE_NAME)
# This loop would set the same metadata for each file.
# Obviously this doesn't make much sense in the real world.
for file in resource.objects():
    form = file.form()
    form["Title"] = "My Title"
    form["Author"] = "Mrs. X"
    file.update(form)
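A more realistic variant derives per-file metadata from each file, for example a title taken from the filename. This sketch relies only on the calls shown above (`resource.objects()`, `file.name`, `file.form()` and `file.update(form)`); the helper name `tag_files` is hypothetical, and the field names "Title" and "Author" depend on the resource's application profile.

```python
import os


def tag_files(resource, creator: str) -> int:
    """Set a per-file title and an author on every file of a resource.

    Hypothetical helper; returns the number of files updated.
    """
    count = 0
    for file in resource.objects():
        form = file.form()
        # Derive the title from the filename, e.g. "Messung (1).bin" -> "Messung (1)"
        form["Title"] = os.path.splitext(file.name)[0]
        # Field names depend on the resource's application profile
        form["Author"] = creator
        file.update(form)
        count += 1
    return count
```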
Establishing an S3-connection
Using the S3 library of your choice, it is quite easy to connect to an S3 resource. The following short snippet shows how to use Amazon's boto3 SDK to connect to a Coscine S3 resource and download a file.
import coscine
import boto3
TOKEN: str = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
PROJECT_NAME: str = "My Project"
RESOURCE_NAME: str = "My S3 Resource"
FILE_NAME: str = "MyFile.ext"
client = coscine.Client(TOKEN)
project = client.project(PROJECT_NAME)
resource = project.resource(RESOURCE_NAME)
s3 = boto3.resource(
    "s3",
    aws_access_key_id=resource.s3.write_access_key,
    aws_secret_access_key=resource.s3.write_secret_key,
    endpoint_url=resource.s3.endpoint
)
bucket = s3.Bucket(resource.s3.bucket)
# Size of the object in bytes
filesize = bucket.Object(FILE_NAME).content_length
# download_file() expects a target file path, not a directory
bucket.download_file(FILE_NAME, "./" + FILE_NAME)
Asynchronous requests
The Coscine Python SDK uses the requests module, which only offers synchronous HTTP requests. However, with the help of some additional modules, we can run any function of the Coscine SDK concurrently and thus might be able to speed certain things up. With the concurrent.futures module we can take advantage of a thread pool, offloading tasks to multiple threads:
import coscine
from concurrent.futures import ThreadPoolExecutor, wait
PROJECT_NAME: str = "My Project"
RESOURCE_NAME: str = "My S3 Resource"
DOWNLOAD_PATH: str = "./"
# Read the token from a file (see "Using the API token")
with open("token.txt", "rt") as fd:
    token = fd.read().strip()
client = coscine.Client(token)
project = client.project(PROJECT_NAME)
resource = project.resource(RESOURCE_NAME)
# Asynchronously download files in a resource
# (Note: This has already been incorporated into the SDK, no need
# to do that by hand. However other functions may benefit from the
# same strategy too...)
with ThreadPoolExecutor(max_workers=4) as executor:
    files = resource.objects()
    futures = [executor.submit(file.download, DOWNLOAD_PATH) for file in files]
    wait(futures)
for index, fut in enumerate(futures):
    try:
        fut.result()
    except coscine.CoscineException:
        print(f"Error downloading '{files[index].name}'.")
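The pattern above can be generalized into a reusable helper that applies any function to a list of items in a thread pool and collects failures instead of stopping at the first one. Threads help here because the SDK's HTTP requests spend most of their time waiting on the network. `run_concurrently` is a hypothetical helper, not part of the SDK.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_concurrently(func: Callable[[T], R], items: Iterable[T],
                     max_workers: int = 4) -> Tuple[List[R], Dict[int, Exception]]:
    """Apply `func` to every item in a thread pool.

    Returns the results in input order (None for failed items) and a
    mapping from item index to the exception raised for that item.
    """
    items = list(items)
    results: List = [None] * len(items)
    errors: Dict[int, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(func, item): index
                   for index, item in enumerate(items)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                results[index] = future.result()
            except Exception as error:  # collect instead of re-raising
                errors[index] = error
    return results, errors


# Usage with the SDK (assumes 'resource' from the example above):
# results, errors = run_concurrently(lambda f: f.download("./"), resource.objects())
```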
GUI integration
The Coscine Python SDK has been written with GUIs in mind. Building a GUI around it is easy, assuming you have some understanding of your GUI library of choice. The Qt framework is well established in commercial, industrial and scientific use. We will be using its Python binding PyQt5. If you are just starting out, we recommend choosing Qt or wxWidgets (via wxPython) over Tkinter and similar libraries.
import coscine
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.QtWidgets import *
TOKEN: str = "XXXXXXXXXXXXXXXXXXX"
PROJECT_NAME: str = "My Project"
RESOURCE_NAME: str = "My Resource"
FILE_NAME: str = "My file.jpeg"
client = coscine.Client(TOKEN)
project = client.project(PROJECT_NAME)
resource = project.resource(RESOURCE_NAME)
file = resource.object(FILE_NAME)
metadata = file.form()
# GUI visualization
# (Obviously you should put that in a class)
app = QApplication([])
dialog = QDialog()
dialog.setWindowTitle("Metadata Editor")
buttonBox = QDialogButtonBox(QDialogButtonBox.Ok | QDialogButtonBox.Cancel)
# Close the dialog when OK or Cancel is pressed
buttonBox.accepted.connect(dialog.accept)
buttonBox.rejected.connect(dialog.reject)
# The magic happens here
group = QGroupBox("Metadata")
layout = QFormLayout()
for key in metadata.keys():
    if metadata.is_controlled(key):
        widget = QComboBox()
        widget.setEditable(True)
        widget.setInsertPolicy(QComboBox.NoInsert)
        widget.completer().setCompletionMode(QCompleter.PopupCompletion)
        for entry in metadata.get_vocabulary(key):
            widget.addItem(entry)
    else:
        widget = QLineEdit(metadata[key])
    layout.addRow(QLabel(key), widget)
group.setLayout(layout)
scrollArea = QScrollArea()
scrollArea.setWidget(group)
scrollArea.setHorizontalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAlwaysOff)
mainLayout = QVBoxLayout()
mainLayout.addWidget(scrollArea)
mainLayout.addWidget(buttonBox)
dialog.setLayout(mainLayout)
dialog.exec_()
###############################################################################
# Coscine Python SDK
# Copyright (c) 2018-2022 RWTH Aachen University
# Licensed under the terms of the MIT License
# #############################################################################
# Coscine, short for Collaborative Scientific Integration Environment is
# a platform for research data management (RDM).
# For more information on Coscine visit https://www.coscine.de/.
#
# Please note that this python module is open source software primarily
# developed and maintained by the scientific community. It is not
# an official service that RWTH Aachen provides support for.
###############################################################################
###############################################################################
# File description
###############################################################################
"""
## Coscine Python SDK
The Coscine Python SDK is an open source python package providing
a pythonic interface to the Coscine REST API. It is compatible
with Python versions 3.7+.
Please note that this python module is developed and maintained
by the scientific community and even though Copyright remains with
RWTH Aachen, it is not an official service that RWTH Aachen
provides support for.
.. include:: ../docs/tutorial.md
.. include:: ../docs/examples.md
"""
###############################################################################
# Dependencies
###############################################################################
from .config import Config
from .logger import Logger, LogLevel
from .exceptions import *
from .client import Client
from .project import Project, ProjectMember, ProjectForm
from .resource import Resource, ResourceForm
from .object import FileObject, MetadataForm
from .graph import ApplicationProfile
###############################################################################
Sub-modules
coscine.cache
This file implements a basic persistent cache for nonvolatile Coscine data such as metadata vocabularies. The data is automatically refreshed every …
coscine.client
This file contains the backbone of the Coscine Python SDK - the client class. The client class acts as the manager of the SDK and is mainly …
coscine.config
This file provides an easy way of reading coscine-python-sdk config files.
coscine.defaults
This file defines default and constant data internally used by multiple modules to avoid redefinitions.
coscine.exceptions
This file defines all of the exceptions raised by the Coscine Python SDK. The base exception class is called CoscineException. It directly inherits …
coscine.form
This file provides the base class for all input forms defined by the Coscine Python SDK.
coscine.graph
This file provides a simple wrapper around Coscine application profiles. It abstracts the interaction with rdf graphs using rdflib and provides an …
coscine.logger
This file provides a simple logger internally used by the Coscine Python SDK. The logger is capable of printing information to a specified file …
coscine.object
Implements classes and routines for manipulating Metadata and interacting with files and file-like data in Coscine.
coscine.project
This file defines the project object for the representation of Coscine projects. It provides a simple interface to interact with Coscine projects from …
coscine.resource
This file defines the resource object for the representation of Coscine resources. It provides an easy interface to interact with Coscine resources …
coscine.utils
This file contains utility classes and functions, mostly taken from another source like StackOverflow. Credit is given where it is due.
coscine.vocabulary
This file implements various classes for querying, parsing and interacting with data inside Coscine vocabularies.