coscine-elabftw-connection

Linking Data Folders in Coscine to Experiments in eLabFTW

This repo contains scripts that can assist you in mirroring metadata recorded in eLabFTW on folders and files in Coscine.

Please note, this is a work in progress!

Here's an overview of the entities and their relationships, including the user interaction:


flowchart TD
 subgraph s1["eLabFTW"]
        A["experiment"]
        B["Metadata"]
        n1["basic information (authors, date, ...)"]
        C["name"]
        n2["experiment/domain-specific metadata<br>(extra fields)"]
        n3["linked resource<br>(category = Coscine: Resource)"]
        n4["extra fields"]
        n7["free-text field"]
        n5["Coscine project name"]
        n6["Coscine resource name"]
        n8["links to folders in Coscine"]
  end
 subgraph s2["data files belonging to experiment"]
        n9["data directory"]
        n10["name"]
  end
 subgraph s3["Coscine"]
        n11["S3 Resource"]
        n12["metadata profile"]
        n13["Base ELN Profile"]
        n14["Base experiment/domain-specific profile"]
  end
    A -- <br> --> B
    B --> n1 & C & n2 & n3
    n3 --> n4 & n7
    n4 --> n5 & n6
    n7 --> n8
    n9 --> n10
    n9 -.-> n16["S3 Client"]
    n10 -- sameAs --> C
    n11 --> n12
    n12 --> n13 & n14
    n15(["User"]) -. uploads .-> n9
    n16 -.-> n11
    n14 -. mapped via python code .- n2
    n13 -. mapped via python code .- n1
    style n15 fill:#D50000
    style s1 fill:#C8E6C9,color:#000000
    style s3 fill:#BBDEFB
    linkStyle 11 stroke:#D50000,fill:none
    linkStyle 16 stroke:#D50000,fill:none
    linkStyle 17 stroke:#D50000,fill:none

Python Workflow

The flowchart below gives a general overview of the Python code:


---
config:
  layout: dagre
  look: classic
---
flowchart TB
    n1["retrieve experiments with linked coscine resources"] -- <br> --> n2["iterate over experiments"]
    n2 --> n3["Experiment metadata in Coscine?<br>(info in extra fields)"]
    n3 -- False --> n4["get Coscine project name and resource name from linked ELN resource extra fields"]
    n3 -- True --> n2
    n4 --> n5["connect to Coscine resource"]
    n5 --> n6["Data in Coscine?<br>(exp name = folder name)"]
    n6 -- False --> n2
    n6 -- True --> n7["Download and unpack .ELN from experiment"]
    n7 --> n8["Extract and map metadata from rocrate-metadata.json to Base ELN Metadata Profile from Coscine"]
    n8 --> n9["Assign Base ELN metadata to Folder in Coscine"]
    n9 --> n10["TO DO: Extract and map extra fields metadata and assign to custom part of profile in Coscine"]
    n10 --> n11["Delete downloaded files"]
    n11 --> n12["Check box in experiment indicating metadata in coscine<br>(TO DO: replace with writing download link to extra field)"]
    n12 --> n13["Write download link to free-text field in ELN coscine resource"]
    n13 --> n2
    style n1 fill:#FF6D00
    style n10 stroke:#757575,stroke-width:2px,stroke-dasharray: 2,color:#616161
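
The control flow in the flowchart can be sketched as a plain Python loop. This is a minimal sketch and not the repository's actual code: the `Steps` container and all of its callables (`already_mirrored`, `coscine_target`, `folder_exists`, `mirror`, `mark_done`) are hypothetical placeholders for the real eLabFTW and Coscine calls, and each experiment is assumed to be a dict with a `title` key.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Steps:
    """Hypothetical hooks standing in for the real eLabFTW/Coscine calls."""

    already_mirrored: Callable[[dict], bool]           # "Experiment metadata in Coscine?"
    coscine_target: Callable[[dict], Tuple[str, str]]  # (project name, resource name)
    folder_exists: Callable[[str, str, str], bool]     # "Data in Coscine?"
    mirror: Callable[[dict, str, str], None]           # download .ELN, map, assign, clean up
    mark_done: Callable[[dict], None]                  # flag experiment, write link


def run(experiments: Iterable[dict], steps: Steps) -> List[str]:
    """Walk through the experiments and return the titles that were mirrored."""
    mirrored = []
    for experiment in experiments:
        if steps.already_mirrored(experiment):
            continue  # metadata is already in Coscine, go to the next experiment
        project, resource = steps.coscine_target(experiment)
        if not steps.folder_exists(project, resource, experiment["title"]):
            continue  # no folder named after the experiment has been uploaded yet
        steps.mirror(experiment, project, resource)
        steps.mark_done(experiment)
        mirrored.append(experiment["title"])
    return mirrored
```

Passing the steps in as callables keeps the loop testable without network access; the two `continue` branches correspond to the two arrows that lead back to "iterate over experiments".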

Setup

The setup instructions below encompass Coscine, eLabFTW, as well as the GitLab pipeline to run the Python code.

Python Code/GitLab Pipelines

  1. Fork this repository. Ensure you have runners available for your project.
  2. Alter the pipeline to fit your runner tags.
  3. Set up the required variables in your CI/CD settings. You will need:
    • COS_API_KEY
    • COS_BASE_URL
    • ELABTEST_API_HOST_URL
    • ELABTEST_API_KEY
     Ensure these are sufficiently protected by masking the keys.
  4. Set up a schedule for the CI/CD pipeline, depending on your needs.

Note: You could also clone this repository to a virtual machine and run it using a cron job or similar.
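
Whether the code runs in a pipeline or from a cron job, it helps to fail early when one of the four variables is missing. The following is a minimal sketch under that assumption; the function name `load_settings` is hypothetical and not taken from the repository, only the variable names come from the list above.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names as listed in the CI/CD setup above.
REQUIRED_VARS = (
    "COS_API_KEY",
    "COS_BASE_URL",
    "ELABTEST_API_HOST_URL",
    "ELABTEST_API_KEY",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, or raise if any of them is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Report only the names, never the values, so no secret ends up in a job log.
        raise RuntimeError("Missing CI/CD variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` at the start of a run reads from the process environment; passing a dict instead makes the check easy to test.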

Coscine

  1. Set up an S3 resource for your data (we will call these buckets to avoid confusion between Coscine resources and eLabFTW resources). If you do not have quota, please request it (see the documentation). Use the ELN Base Profile for your metadata; you may also extend this profile. Please note the mapping: one folder within this bucket corresponds to the data for one experiment in eLabFTW. Therefore, you should have one bucket per category of data. The metadata profile (form) will be identical for each file or folder within this bucket.

eLabFTW

  1. In eLabFTW, you must have a resource category Coscine: resource with the following extra fields:
{
  "extra_fields": {
    "Project PID": {
      "type": "url",
      "value": ""
    },
    "Project Name": {
      "type": "text",
      "value": "",
      "required": true
    },
    "Resource PID": {
      "type": "url",
      "value": ""
    },
    "Resource URL": {
      "type": "url",
      "value": ""
    },
    "Resource Name": {
      "type": "text",
      "value": "",
      "required": true
    }
  }
}

These should be filled out with the bucket information, available in the settings of each Coscine bucket.

  2. For each S3 bucket in Coscine, create a matching resource in eLabFTW. The name should match the resource name in Coscine (this is also covered by the extra fields, as shown above).
  3. For each experiment that you want to link data to, link the Coscine resource the data will be stored in. Use experiment templates to ensure you do not forget this.
  4. Each experiment you want to link data to must have an extra field called Data folder. It is filled automatically and serves as the check for whether the metadata has already been mirrored in Coscine. Do not fill it in manually. Include this empty extra field in your experiment template so you do not forget it.
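
To illustrate how these extra fields can be read, here is a minimal sketch. It assumes the metadata arrives either as a JSON string or as an already parsed dict with the `extra_fields` layout shown above; the function names are hypothetical, and treating a non-empty Data folder value as "already mirrored" is an assumption based on the description of that field.

```python
import json
from typing import Dict, Tuple, Union

Metadata = Union[str, dict, None]


def extra_field_values(metadata: Metadata) -> Dict[str, object]:
    """Flatten {"extra_fields": {name: {"value": ...}}} into {name: value}."""
    if isinstance(metadata, str):
        metadata = json.loads(metadata) if metadata.strip() else {}
    fields = (metadata or {}).get("extra_fields", {})
    return {name: spec.get("value", "") for name, spec in fields.items()}


def coscine_target(resource_metadata: Metadata) -> Tuple[str, str]:
    """Return (project name, resource name) from a Coscine: resource entry."""
    values = extra_field_values(resource_metadata)
    project = str(values.get("Project Name") or "").strip()
    resource = str(values.get("Resource Name") or "").strip()
    if not project or not resource:
        raise ValueError("Project Name and Resource Name must both be filled in")
    return project, resource


def is_mirrored(experiment_metadata: Metadata) -> bool:
    """Assumption: a non-empty Data folder field means the metadata was mirrored."""
    values = extra_field_values(experiment_metadata)
    return bool(str(values.get("Data folder") or "").strip())
```

The two required fields raise an error when empty, which mirrors the `"required": true` flags in the template.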

S3 Client

  1. Connect to your Coscine buckets using an S3 client. (Hint: save these as bookmarks if your client allows.)
  2. Upload your data folder to the S3 bucket in Coscine using an S3 client. (Note: Coscine's web UI won't work, since you would have to add metadata manually.) Ensure the folder name is identical to the name of the experiment it belongs to.
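
If you prefer to script the upload, the naming rule above translates into object keys that all start with the experiment name. The sketch below only computes those keys and leaves the actual transfer to the S3 library or client of your choice; the function name `object_keys` is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple, Union


def object_keys(data_dir: Union[str, Path], experiment_name: str) -> List[Tuple[Path, str]]:
    """Pair every file below data_dir with the S3 key '<experiment name>/<relative path>'."""
    root = Path(data_dir)
    if root.name != experiment_name:
        # The folder name is what links the data to the experiment.
        raise ValueError(
            f"Folder name {root.name!r} does not match experiment name {experiment_name!r}"
        )
    return [
        (path, f"{experiment_name}/{path.relative_to(root).as_posix()}")
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]
```

Each `(path, key)` pair can then be handed to an upload call, for example one upload per file with the key as the object name in the bucket.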

Custom Metadata

To mirror domain-specific metadata contained in the extra fields, you can extend the Base Profile: Data belonging to ELN entries to include these specific fields. This can be done with the AIM metadata profile generator. Depending on the complexity, you may need to add a parser to the code. For simple key:value profiles, the fields will be mirrored as long as the field names match.
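
For the simple key:value case, matching by name could look like the sketch below. It is an illustration only: the function name and the idea of passing the profile's field names as a plain list are assumptions, and the repository may implement the mapping differently.

```python
from typing import Any, Dict, Iterable


def matching_fields(
    extra_fields: Dict[str, Dict[str, Any]],
    profile_fields: Iterable[str],
) -> Dict[str, Any]:
    """Keep the extra fields whose names also exist in the metadata profile."""
    wanted = set(profile_fields)
    return {
        name: spec.get("value")
        for name, spec in extra_fields.items()
        # Skip fields unknown to the profile and fields that were left empty.
        if name in wanted and spec.get("value") not in ("", None)
    }
```

Anything that needs unit conversion, renaming, or nested values falls outside this sketch and is where a dedicated parser would come in.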

Possible Extensions and Alternatives

You could build upon this code and automatically extract metadata from files and fill metadata in the ELN.

Since there are common issues with folders in Coscine, an alternative may be to create a resource per experiment. The resource creation and linking could happen automatically, both in Coscine and eLabFTW, and the data could be uploaded from a specified location. The downside: you have a lot of smaller resources (S3 buckets) in Coscine.

Considerations for Direct Implementation

Coscine

  • new ELN resource type that automatically mirrors metadata and adds ELN files
    • setup includes access token
    • view HTML preview
  • mirroring triggered as soon as data added

eLabFTW

  • Option to set up storage with (Coscine) API keys (or similar for other options)
  • Drag and drop function to a defined storage (bucket) in experiment
  • changes to metadata trigger the mirroring pipeline (or it just runs at regular intervals)
  • Incorporate linked data files using link in .ELN file