Load metadata from coscine if not in cache

Merged Niklas Siemer requested to merge Getting_metadata_without_cache into master
1 unresolved thread

Currently, the actual metadata of a file object does not seem to be loaded anywhere unless one forces it in this method. However, if the metadata is not already in the cache and I ask for it on the file object, I would expect it to be loaded from Coscine. Therefore, I deleted the return None so that loading is attempted.
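
For illustration, the intended behavior could look roughly like the sketch below. It is not the package's actual code: the class LazyMetadataFile, its fetch callable and the cached argument are hypothetical stand-ins for a file object, a request to the Coscine metadata endpoint and the object's local metadata cache.

```python
from typing import Callable, Optional


class LazyMetadataFile:
    """Hypothetical stand-in for a file object with a local metadata cache."""

    def __init__(self, fetch: Callable[[], dict], cached: Optional[dict] = None):
        self._fetch = fetch    # stands in for a request to the metadata endpoint
        self._cached = cached  # None means "nothing cached yet"

    def metadata(self) -> dict:
        if self._cached is None:
            # On a cache miss, attempt to load instead of returning None.
            self._cached = self._fetch()
        return self._cached
```

With this shape, the first call on a cache miss triggers exactly one load and later calls reuse the stored result.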

A question on the cache: Currently, I do not see a method to easily populate the cache with the metadata in one directory. As such, the nice .dataframe() method only provides a more or less empty DataFrame - calling the dataframe should ideally result in a 'get all metadata into cache and load DataFrame with this data' request, IMHO.

Merged by Niklas Siemer 2 years ago (Mar 20, 2023 6:50am UTC)

Merge details

  • Changes merged into master with f7702096.
  • Deleted the source branch.

Pipeline #942289 skipped

Pipeline skipped for f7702096 on master

Activity

  • Romin approved this merge request


  • Makes sense, nice catch!

    A question on the cache: Currently, I do not see a method to easily populate the cache with the metadata in one directory. As such, the nice .dataframe() method only provides a more or less empty DataFrame - calling the dataframe should ideally result in a 'get all metadata into cache and load DataFrame with this data' request, IMHO.

    I don't think I understand this. The Resource.objects() method tags all FileObjects with metadata. The metadata is served in a single request together with the file handles in one big JSON struct. So it should only be a single request per dataframe.

  • Feel free to merge this! :smile:

  • Niklas Siemer mentioned in commit f7702096


  • merged

    • Thanks for the review!

      Now I am a bit confused. I first got all my FileObjects via Res.objects() and afterwards operated on an entry in that list. That entry did not have metadata, so I forced the loading. Afterwards, I tested the dataframe method, and only that single force-loaded FileObject had all the metadata. So the loading of all metadata did not happen in my case?

    • Now I understand. The FileObjects all have a local metadata cache, which I got confused with the global http request cache in cache.py. When a FileObject is created, its metadata can be passed to its constructor. However, since the Coscine API is horrible to work with, one has to first somehow map the filestorage information to the metadata information. Coscine sends out a JSON with a dict for all filestorage information containing filenames, file sizes, etc and a dict for all metadata info. But not with equivalent keys, noooo! Also not really as a dict, but both are lists! :nauseated_face:
      So in resource.objects() I just loop through all the info in filestorage to gather all files and then for each of these files I loop through the metadatastorage list/dict in an attempt to set its metadata to avoid more calls to the metadata endpoint, because the Coscine API is also horribly slow. But it seems like the API was yet again changed in some regard, so that this mapping does not work anymore. Previously the metadatastorage contained keys with @path=...filename... at the end, but now they made it even more complicated, go figure... -_-
      Now it's "https://purl.org/coscine/resources/a3bc2a1c-bb09-4c50-8958-18acbed0d72b/testfile2.txt/@type=metadata&version=1674189299" for a file testfile2.txt in one of my resources. This obviously breaks all the metadata caching I originally implemented.
      To fix this the Resource.objects() method needs to be altered, more specifically:

      for data in file_storage:
          path = urllib.parse.quote("/" + data["Path"], safe="")
          metadata: dict = {}
          for meta in metadata_storage:
              if list(meta.keys())[0].endswith(f"@path={path}"):  # <<< this doesn't work anymore
                  metadata = meta
                  break
          objects.append(FileObject(self, data, metadata))

      I'll do it when I get back from my leave at the end of March, or leave it up to eager contributors to pick this up themselves... ;)

    • For reference, here is an overview of file_storage and metadata_storage as they appear in the snippet above for some test resource:

      # print(json.dumps(file_storage, indent=4)):
      [
          {
              "Name": "50mT-10gps-66fps-3600-123-0005.txt",
              "Path": "50mT-10gps-66fps-3600-123-0005.txt",
              "Size": 31,
              "Kind": "file",
              "Modified": "2023-01-02T14:32:29.648+01:00",
              "Created": "2023-01-02T14:32:29.648+01:00",
              "Provider": "rdss3",
              "IsFolder": false,
              "IsFile": true,
              "Action": {
                  "Delete": {
                      "Method": "DELETE",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/50mT-10gps-66fps-3600-123-0005.txt?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=tHuApW1wKX7qTAx3qcTeeVX9Urc%3D"
                  },
                  "Download": {
                      "Method": "GET",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/50mT-10gps-66fps-3600-123-0005.txt?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=bv77zDt40h4aa%2F42HDj3Xhqd2XA%3D"
                  },
                  "Upload": {
                      "Method": "PUT",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/50mT-10gps-66fps-3600-123-0005.txt?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=tHuApW1wKX7qTAx3qcTeeVX9Urc%3D"
                  }
              }
          },
          {
              "Name": "50mT-10gps-66fps-3600\u00b0-123-0005.txt",
              "Path": "50mT-10gps-66fps-3600\u00b0-123-0005.txt",
              "Size": 31,
              "Kind": "file",
              "Modified": "2023-01-02T15:57:06.51+01:00",
              "Created": "2023-01-02T15:57:06.51+01:00",
              "Provider": "rdss3",
              "IsFolder": false,
              "IsFile": true,
              "Action": {
                  "Delete": {
                      "Method": "DELETE",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/50mT-10gps-66fps-3600\u00b0-123-0005.txt?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=dSp%2FE5QYBrIhFXArD6lzAeURvw0%3D"
                  },
                  "Download": {
                      "Method": "GET",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/50mT-10gps-66fps-3600\u00b0-123-0005.txt?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=FQILv22t9mX7OSmGC6Yz03XMk%2Bc%3D"
                  },
                  "Upload": {
                      "Method": "PUT",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/50mT-10gps-66fps-3600\u00b0-123-0005.txt?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=dSp%2FE5QYBrIhFXArD6lzAeURvw0%3D"
                  }
              }
          },
          {
              "Name": "config.json",
              "Path": "config.json",
              "Size": 469,
              "Kind": "file",
              "Modified": "2023-01-02T14:24:10.977+01:00",
              "Created": "2023-01-02T14:24:10.977+01:00",
              "Provider": "rdss3",
              "IsFolder": false,
              "IsFile": true,
              "Action": {
                  "Delete": {
                      "Method": "DELETE",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/config.json?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=WYvkOC%2FqaygkV1oaHL8YRoTWwC8%3D"
                  },
                  "Download": {
                      "Method": "GET",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/config.json?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=SK%2FA11d9YtzUji8xVLdC%2FOXWpcE%3D"
                  },
                  "Upload": {
                      "Method": "PUT",
                      "Url": "https://coscine-s3-01.s3.fds.rwth-aachen.de:9021/1d0056a0-8d23-4332-9465-909222346a36/config.json?AWSAccessKeyId=coscine-s3-object-admin&Expires=1679410672&Signature=WYvkOC%2FqaygkV1oaHL8YRoTWwC8%3D"
                  }
              }
          }
      ]
      # print(json.dumps(metadata_storage, indent=4)):
      [
          {
              "https://hdl.handle.net/21.11102/1d0056a0-8d23-4332-9465-909222346a36@path=%2F50mT-10gps-66fps-3600%C2%B0-123-0005.txt": {
                  "http://purl.allotrope.org/ontologies/result#AFR_0000952e": [
                      {
                          "value": "2023-01-02T14:22:50",
                          "datatype": "http://www.w3.org/2001/XMLSchema#dateTime",
                          "type": "literal"
                      }
                  ],
                  "http://purl.allotrope.org/ontologies/result#AFR_0001118": [
                      {
                          "value": "DB-B5-SFL-16",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/AFR_0000954": [
                      {
                          "value": "10",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/CHEBI_46787": [
                      {
                          "value": "water",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/CHMO_0000947": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/microscope#0",
                          "type": "uri"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/CHMO_0001301": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/productionMethod#1",
                          "type": "uri"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/OBI_0001048": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/camera#0",
                          "type": "uri"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/PATO_0001599": [
                      {
                          "value": "3600",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://purl.org/dc/terms/creator": [
                      {
                          "value": "Dominik Braunmiller",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#frameRate": [
                      {
                          "value": "66",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://www.ontology-of-units-of-measure.org/resource/om-2/MagneticField": [
                      {
                          "value": "50",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
                      {
                          "value": "https://purl.org/coscine/ap/sfb985/Microscopy_RotMagField/",
                          "type": "uri"
                      }
                  ],
                  "https://w3id.org/reproduceme#Experiment": [
                      {
                          "value": "28mT-19072021_4",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "https://w3id.org/reproduceme#Objective": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/objectiveLens#0",
                          "type": "uri"
                      }
                  ]
              }
          },
          {
              "https://purl.org/coscine/resources/1d0056a0-8d23-4332-9465-909222346a36/50mT-10gps-66fps-3600%C2%B0-123-0005.txt/@type=metadata&version=1672880848": {
                  "http://purl.allotrope.org/ontologies/result#AFR_0000952e": [
                      {
                          "value": "2023-01-02T14:22:50",
                          "datatype": "http://www.w3.org/2001/XMLSchema#dateTime",
                          "type": "literal"
                      }
                  ],
                  "http://purl.allotrope.org/ontologies/result#AFR_0001118": [
                      {
                          "value": "DB-B5-SFL-16",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/AFR_0000954": [
                      {
                          "value": "10",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/CHEBI_46787": [
                      {
                          "value": "water",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/CHMO_0000947": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/microscope#0",
                          "type": "uri"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/CHMO_0001301": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/productionMethod#1",
                          "type": "uri"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/OBI_0001048": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/camera#0",
                          "type": "uri"
                      }
                  ],
                  "http://purl.obolibrary.org/obo/PATO_0001599": [
                      {
                          "value": "3600",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://purl.org/dc/terms/creator": [
                      {
                          "value": "Dominik Braunmiller",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#frameRate": [
                      {
                          "value": "66",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://www.ontology-of-units-of-measure.org/resource/om-2/MagneticField": [
                      {
                          "value": "50",
                          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                          "type": "literal"
                      }
                  ],
                  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
                      {
                          "value": "https://purl.org/coscine/ap/sfb985/Microscopy_RotMagField/",
                          "type": "uri"
                      }
                  ],
                  "https://w3id.org/reproduceme#Experiment": [
                      {
                          "value": "28mT-19072021_4",
                          "datatype": "http://www.w3.org/2001/XMLSchema#string",
                          "type": "literal"
                      }
                  ],
                  "https://w3id.org/reproduceme#Objective": [
                      {
                          "value": "http://purl.org/coscine/vocabularies/sfb985/objectiveLens#0",
                          "type": "uri"
                      }
                  ]
              }
          }
      ]
    • Which is extremely interesting, because here @path IS STILL INCLUDED! But only for one entry, the other entry is different! WTF!?! :confused:
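
One possible way to make the matching tolerate both key styles seen in the dump above is sketched below. It is only a sketch under assumptions: the helper names key_matches_file and find_metadata are hypothetical, the resource id is passed in explicitly, and the newer key layout (".../<resource-id>/<quoted path>/@type=metadata&version=<n>") is inferred solely from the two example keys in this thread. How Coscine encodes nested folder paths in the newer keys is not shown here, and the sketch returns the first match rather than picking a particular version.

```python
import urllib.parse


def key_matches_file(key: str, resource_id: str, path: str) -> bool:
    """Check whether a metadata_storage key refers to the file at `path`."""
    # Older key style: "...<resource-id>@path=%2F<fully quoted path>"
    old_suffix = "@path=" + urllib.parse.quote("/" + path, safe="")
    if key.endswith(old_suffix):
        return True
    # Newer key style (assumed from the examples in this thread):
    # ".../<resource-id>/<quoted path>/@type=metadata&version=<n>"
    head, sep, _ = key.rpartition("/@type=metadata")
    if not sep:
        return False
    return head.endswith(f"/{resource_id}/" + urllib.parse.quote(path, safe="/"))


def find_metadata(metadata_storage: list, resource_id: str, path: str) -> dict:
    """Return the first metadata entry whose key matches `path`, else {}."""
    for meta in metadata_storage:
        key = next(iter(meta), "")  # each entry is a dict with a single key
        if key_matches_file(key, resource_id, path):
            return meta
    return {}
```

In the loop from Resource.objects() quoted earlier, the inner for loop could then be replaced by a single call such as find_metadata(metadata_storage, resource_id, data["Path"]), where resource_id is whatever identifier the Resource already holds.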

  • Niklas Siemer mentioned in merge request !12 (closed)


  • Romin mentioned in commit 876675e8

