Zenodo harvesting based on `sickle` is missing out on some information
The Zenodo harvester currently uses Zenodo's OAI-PMH interface and is based on the Python package `sickle`.
However, `sickle` is optimized for harvesting OAI-PMH resources based on the Dublin Core profile (`metadataPrefix=oai_dc`), which includes much less metadata than the `datacite` profile. Compare, for example:
https://zenodo.org/oai2d?verb=ListRecords&set=user-nfdi4earth&metadataPrefix=oai_dc
https://zenodo.org/oai2d?verb=ListRecords&set=user-nfdi4earth&metadataPrefix=datacite
For example, we want to harvest whether an article is part of the NFDI4Earth community. In the `datacite` profile this is expressed as:
<relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf" >https://zenodo.org/communities/nfdi4earth</relatedIdentifier>
while in oai_dc this is only represented as:
<dc:relation>url:https://zenodo.org/communities/nfdi4earth</dc:relation>
The qualifier of the relation is missing, so this statement cannot be distinguished from other statements (e.g. `isVersionOf`) that also use `dc:relation`.
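To illustrate what the qualifier makes possible, here is a minimal sketch that reads `relatedIdentifier` elements from a `datacite` record and keeps their `relationType`. The sample XML is hand-written and modelled on the snippet above (the DOI is a placeholder), and the helper names `local_name`, `related_identifiers` and `communities` are made up for this example. Tags are matched by local name, so the sketch makes no assumption about the exact DataCite namespace version that Zenodo returns.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modelled on the snippet above; not a real Zenodo response.
# The namespace and the DOI are placeholders for illustration only.
SAMPLE = """
<resource xmlns="http://datacite.org/schema/kernel-4">
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/nfdi4earth</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.0000000</relatedIdentifier>
  </relatedIdentifiers>
</resource>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def related_identifiers(xml_text):
    """Return (relationType, identifier) pairs for every relatedIdentifier."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("relationType"), (el.text or "").strip())
        for el in root.iter()
        if local_name(el.tag) == "relatedIdentifier"
    ]


def communities(xml_text):
    """Return the community slugs the record is marked as being part of."""
    prefix = "https://zenodo.org/communities/"
    return [
        url[len(prefix):]
        for relation, url in related_identifiers(xml_text)
        if relation == "IsPartOf" and url.startswith(prefix)
    ]


print(communities(SAMPLE))
```

With `oai_dc` the same filter is not possible, because both entries would arrive as plain `dc:relation` values without a relation type.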
Furthermore, we want to get the authors' ORCIDs, not only their names. These are also not included in `oai_dc`.
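As a rough sketch of what we would need from the `datacite` profile, the following pairs each creator name with an ORCID where one is given. It assumes the DataCite layout of `creator` elements containing `creatorName` and a `nameIdentifier` with `nameIdentifierScheme="ORCID"`. The sample record, the names and the ORCID are placeholders, and `creators_with_orcid` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; names and ORCID are placeholders, not real data.
SAMPLE = """
<resource xmlns="http://datacite.org/schema/kernel-4">
  <creators>
    <creator>
      <creatorName>Doe, Jane</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID">0000-0000-0000-0000</nameIdentifier>
    </creator>
    <creator>
      <creatorName>Roe, Richard</creatorName>
    </creator>
  </creators>
</resource>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def creators_with_orcid(xml_text):
    """Return (name, orcid) pairs; orcid is None when no ORCID is present."""
    root = ET.fromstring(xml_text)
    result = []
    for creator in root.iter():
        if local_name(creator.tag) != "creator":
            continue
        name, orcid = None, None
        for child in creator:
            kind = local_name(child.tag)
            if kind == "creatorName":
                name = (child.text or "").strip()
            elif kind == "nameIdentifier" and child.get("nameIdentifierScheme") == "ORCID":
                orcid = (child.text or "").strip()
        result.append((name, orcid))
    return result


for name, orcid in creators_with_orcid(SAMPLE):
    print(name, orcid)
```

Whichever tool we end up using only needs to hand over the record XML as a string for this kind of extraction to work.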
I would therefore suggest that we move away from `sickle` and use another tool that harvests all the metadata Zenodo offers. Any suggestions?