CETAF Specimen Catalogue

From CETAF Identifiers Wiki
Revision as of 16:19, 10 June 2021 by Jörg Holetschek

The motivation for setting up the CETAF Specimen Catalogue is to provide a single access point for the linked open data (LOD) published by CETAF institutions. By merging semantically annotated specimen data from different sources into a single triple store that can be queried through a SPARQL interface, we hope to enable new uses of the data, such as linking specimens or collectors across institutions.

Specimen data from an institution can be incorporated into the catalogue if

  1. it is published as RDF documents using the CETAF Specimen Preview Profile (CSPP), and
  2. the CETAF ID is published to GBIF as a GUID (occurrence ID).

[Figure: CETAF Specimen Catalogue]

The starting point for the Catalogue is the GBIF Index, which is downloaded as a zip file every two to three months. After the major elements of the index have been imported into a SQL Server database, the CETAF IDs of the partners of the Stable Identifiers Implementers Group are extracted into the CETAF ID Catalogue (hence the requirement that institutions use their CETAF IDs as GBIF occurrence IDs). This happens regardless of the implementation level, so IDs that only resolve to a human-readable representation of the object are also included. This list of IDs (20 million as of 23 January 2020) will be made accessible through a simple web service (implementation pending).
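The extraction step can be sketched in Python as a filter over the occurrence file of a GBIF download. This is a minimal sketch under stated assumptions, not the Catalogue's actual implementation, which goes through a SQL Server database. It assumes the download contains a tab-separated `occurrence.txt` with an `occurrenceID` column, and that a CETAF ID can be recognised by the hostname of its HTTP(S) URI. The `PARTNER_HOSTS` set and the `extract_cetaf_ids` function are hypothetical.

```python
import csv
import io
import zipfile
from urllib.parse import urlparse

# Hypothetical hostnames standing in for the partner institutions' CETAF ID domains;
# the real list would come from the Stable Identifiers Implementers Group.
PARTNER_HOSTS = {"id.example-herbarium.org", "specimens.example-museum.eu"}


def extract_cetaf_ids(source, member="occurrence.txt", hosts=PARTNER_HOSTS):
    """Yield occurrenceID values whose HTTP(S) host belongs to a partner institution.

    `source` is a path to the GBIF download zip or a binary file object.
    Assumption: the occurrence file is tab-separated UTF-8 text with a header row.
    """
    with zipfile.ZipFile(source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE: treat quote characters inside fields as plain text.
            reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                occurrence_id = (row.get("occurrenceID") or "").strip()
                parsed = urlparse(occurrence_id)
                # Keep only resolvable HTTP(S) URIs on a partner's domain.
                if parsed.scheme in ("http", "https") and parsed.hostname in hosts:
                    yield occurrence_id
```

Because the function is a generator, a multi-million-row download can be streamed into whatever store holds the ID Catalogue without loading it all into memory. For example, `for cetaf_id in extract_cetaf_ids("gbif-download.zip"): ...` iterates over the matching IDs.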

For institutions supporting Level 2 of the CETAF Identifier system (currently six partners), the RDF documents for the identifiers found in the ID Catalogue will be harvested and imported into a triple store (Apache Jena). This LOD can be accessed through a SPARQL endpoint (please get in touch for the URL if you intend to use it). When using this endpoint, keep in mind that harvesting takes place at roughly the same intervals as the creation of the ID Catalogue from the GBIF Index (every two to three months), so recent changes to an institution's data may not yet be reflected. Also, the number of specimens in the RDF Cache (the triple store) is lower than the number of IDs in the ID Catalogue, since not all institutions provide RDF representations of their specimens.
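A single harvesting request might look like the Python sketch below. It assumes that the RDF representation of a CETAF ID is obtained through HTTP content negotiation, by requesting the ID with an `Accept: application/rdf+xml` header and following the server's redirect. It also assumes that IDs without an RDF representation answer with HTML only. The helper names `build_rdf_request` and `fetch_rdf` are hypothetical, and loading the returned documents into Apache Jena is not shown.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(cetaf_id, accept=RDF_XML):
    """Build a GET request asking the server behind a CETAF ID for its RDF representation."""
    return urllib.request.Request(cetaf_id, headers={"Accept": accept}, method="GET")


def fetch_rdf(cetaf_id, timeout=30):
    """Return the RDF document for a CETAF ID as bytes, or None if no RDF is served.

    Assumption: the server uses content negotiation; urlopen follows the redirect
    (for example 303 See Other) and keeps the Accept header on the new request.
    """
    request = build_rdf_request(cetaf_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.headers.get_content_type() != RDF_XML:
            # Human-readable only (e.g. text/html): nothing to add to the triple store.
            return None
        return response.read()
```

Returning `None` instead of raising an error reflects the situation described above: many IDs in the ID Catalogue resolve only to HTML, so the harvester skips them and the RDF Cache ends up smaller than the ID Catalogue.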

Important note for institutions using CETAF IDs

If an institution uses CETAF IDs and wants them (and any specimen RDF) to be included in the CETAF Specimen Catalogue, the IDs must be used as GUIDs in the specimen data fed to GBIF. As described above, the GBIF Index is used to discover CETAF IDs.

  • If Darwin Core is used, the IDs must be mapped to occurrenceID.
  • For ABCD, the concept UnitGUID should be used.
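To check an export before sending it to GBIF, a publisher could confirm that the CETAF ID actually ends up in the GUID field. The Python sketch below reads it from a Darwin Core record (represented here as a dict keyed by term name) and from an ABCD document. It assumes the ABCD 2.06 namespace, with `UnitGUID` as a direct child of each `Unit`. The function names and example IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: ABCD 2.06; adjust the namespace if your data uses another ABCD version.
ABCD_URI = "http://www.tdwg.org/schemas/abcd/2.06"
ABCD_NS = {"abcd": ABCD_URI}


def guid_from_darwin_core(record):
    """Return the CETAF ID stored in a Darwin Core record's occurrenceID, or None."""
    return (record.get("occurrenceID") or "").strip() or None


def guids_from_abcd(xml_text):
    """Return the non-empty UnitGUID of every Unit in an ABCD document."""
    root = ET.fromstring(xml_text)
    guids = []
    for unit in root.iter(f"{{{ABCD_URI}}}Unit"):
        guid = unit.findtext("abcd:UnitGUID", namespaces=ABCD_NS)
        if guid and guid.strip():
            guids.append(guid.strip())
    return guids
```

A unit without a `UnitGUID`, or a Darwin Core record with an empty `occurrenceID`, will not be discovered through the GBIF Index, so these checks flag exactly the records that would be missing from the Catalogue.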