Zenodo harvester: include only the latest version of documents
The current Zenodo harvester, which is based on sickle and uses the oai_dc profile, harvests each Zenodo document as a separate article, even if the documents are versions of the same work. The result is that if you currently search for https://nfdi4earth-knowledgehub.geo.tu-dresden.de/api/#objects/?query=type%3A%20Article%20AND%20%22World%20Settlement%20Footprint%22 in the KH, you get 3 different articles, all of which refer to the same Zenodo article: https://doi.org/10.5281/zenodo.7858700
Zenodo mints a DOI for every article version and, in addition, one DOI (they call it the conceptdoi) which always redirects to the latest version.
Unfortunately, the oai_dc profile gives us no way to unambiguously determine the conceptdoi or the other versions from the metadata.
The OAI-PMH oai_datacite profile would have this information (e.g., https://zenodo.org/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:zenodo.org:7897514 ), or it could be retrieved via the Zenodo API (see the corresponding issue #51 (closed)).
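To illustrate how the conceptdoi could be read from an oai_datacite record, here is a minimal sketch using only the standard library. It assumes that the DataCite metadata of a version record lists the conceptdoi as a relatedIdentifier with relationType "IsVersionOf"; this should be checked against a real response such as the GetRecord URL above. The function name `concept_doi`, the sample XML, and the DOIs in it are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened DataCite payload; the DOIs are placeholders.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.5281/zenodo.1002</identifier>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1000</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the lookup works for any schema version."""
    return tag.rsplit("}", 1)[-1]


def concept_doi(datacite_xml):
    """Return the DOI this record is a version of, or None if none is listed.

    Assumption: the conceptdoi appears as a relatedIdentifier with
    relationType="IsVersionOf" and relatedIdentifierType="DOI".
    """
    root = ET.fromstring(datacite_xml)
    for elem in root.iter():
        if (
            local_name(elem.tag) == "relatedIdentifier"
            and elem.get("relationType") == "IsVersionOf"
            and elem.get("relatedIdentifierType") == "DOI"
        ):
            return (elem.text or "").strip() or None
    return None


print(concept_doi(SAMPLE))
```

Matching on the local element name rather than a fixed namespace is a deliberate choice here, so the sketch does not depend on which DataCite kernel version the endpoint returns.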
My suggestion is that we harvest only the latest version of a Zenodo article and use the conceptdoi as the n4e:sourceSystemID. The other option would be to harvest all versions and link them accordingly, but I think this would fill the KH with unnecessary metadata. @JohnDOEbug @awellmann-lrz @daniel.nuest what do you think?
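The "latest version only" option could look roughly like the sketch below. It is not the harvester's actual code: the record dictionaries, their keys (`doi`, `concept_doi`, `published`), and the function `latest_per_concept` are hypothetical, and it assumes that the publication date is enough to decide which version is the latest. The returned keys are the values that would be used as n4e:sourceSystemID.

```python
def latest_per_concept(records):
    """Keep one record per conceptdoi, choosing the most recently published one.

    Assumptions: every record is a dict with 'doi' and an ISO 8601 'published'
    date (same format throughout, so string comparison orders them correctly),
    and optionally 'concept_doi'. Records without a conceptdoi are keyed by
    their own DOI.
    """
    latest = {}
    for rec in records:
        key = rec.get("concept_doi") or rec["doi"]
        current = latest.get(key)
        if current is None or rec["published"] > current["published"]:
            latest[key] = rec
    # Map source system ID (conceptdoi) -> DOI of the version to harvest.
    return {key: rec["doi"] for key, rec in latest.items()}


# Placeholder data: three versions of one work plus one unversioned record.
RECORDS = [
    {"doi": "10.5281/zenodo.1001", "concept_doi": "10.5281/zenodo.1000", "published": "2023-01-10"},
    {"doi": "10.5281/zenodo.1003", "concept_doi": "10.5281/zenodo.1000", "published": "2023-05-04"},
    {"doi": "10.5281/zenodo.1002", "concept_doi": "10.5281/zenodo.1000", "published": "2023-03-21"},
    {"doi": "10.5281/zenodo.2000", "published": "2022-11-30"},
]

print(latest_per_concept(RECORDS))
```

With this approach the three versions collapse into a single entry keyed by the conceptdoi, which is the behaviour proposed above for the KH.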
Relates to #26