CETAF Specimen Catalogue

From CETAF Identifiers Wiki

Latest revision as of 15:19, 10 June 2021

The CETAF Specimen Catalogue was set up to provide a single access point for linked open data published by CETAF institutions. By merging semantically annotated specimen data from different sources into a single triple store that is accessible through a SPARQL interface, we hope to facilitate new uses of the data, such as linkages between specimens or collectors.

Specimen data from an institution can be incorporated into the catalogue if

  1. it is published as RDF documents using the CETAF Specimen Preview Profile (CSPP), and
  2. the CETAF ID is published to GBIF as a GUID (Globally Unique Identifier) in the occurrence ID field.


[Figure: CETAF Specimen Catalogue.jpg]


The starting point for the Catalogue is the GBIF Index, downloaded as a zip file every two to three months. After the major elements of the index have been imported into a SQL Server database, the CETAF IDs of the partners of the Stable Identifiers Implementers Group will be extracted into the CETAF ID Catalogue (hence the requirement for the institutions to use them as GBIF occurrence IDs). This will happen regardless of the implementation level; IDs that resolve only to a human-readable representation of the object will also be included. This list of IDs (20 million as of 23 January 2020) will be made accessible through a simple web service (implementation pending).
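The extraction step can be sketched in a few lines of Python. This is only an illustration, not the Catalogue's actual implementation (which works on a SQL Server database): it assumes the GBIF download has been unpacked into a tab-separated file with an occurrenceID column, and the URI prefixes in PARTNER_PREFIXES are hypothetical placeholders for the prefixes under which partner institutions mint their CETAF IDs.

```python
import csv
import io

# Hypothetical URI prefixes of partner institutions (placeholders, not real ones).
PARTNER_PREFIXES = (
    "https://example.org/museum-a/object/",
    "https://example.org/herbarium-b/specimen/",
)


def extract_cetaf_ids(tsv_file, prefixes=PARTNER_PREFIXES):
    """Yield occurrenceID values that start with one of the partner prefixes."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        occurrence_id = (row.get("occurrenceID") or "").strip()
        if occurrence_id.startswith(tuple(prefixes)):
            yield occurrence_id


# Tiny in-memory stand-in for an occurrence file from the GBIF download.
sample = io.StringIO(
    "gbifID\toccurrenceID\n"
    "1\thttps://example.org/museum-a/object/123\n"
    "2\turn:catalog:XYZ:42\n"
    "3\thttps://example.org/herbarium-b/specimen/B100\n"
)
ids = list(extract_cetaf_ids(sample))
print(ids)
```

Records whose occurrence ID is not an HTTP URI under a known prefix (the second row in the sample) are skipped, which is why an institution's IDs can only be discovered if they are published to GBIF in this field.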

For the institutions supporting Level 2 of the CETAF Identifier system (currently 6 partners), the RDF documents for the identifiers found in the ID Catalogue will be harvested and imported into a triple store (Apache Jena). This Linked Open Data (LOD) can be accessed through a SPARQL access point (please contact biodiversitydata@bgbm.org for the URL if you intend to use it). When using this access point, please keep in mind that harvesting takes place at roughly the same intervals at which the ID Catalogue is created from the GBIF Index (every 2-3 months). Also, the number of specimens in the RDF Cache is lower than the number of IDs in the ID Catalogue, since not all institutions provide RDF representations of their specimens.
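Once the access point URL is known, a query can be sent as an ordinary HTTP GET request, as defined by the SPARQL 1.1 Protocol. The sketch below only builds such a request. The ENDPOINT value is a placeholder (the real URL is available on request, as noted above), and the query is deliberately generic because it makes no assumption about the vocabulary used in the harvested RDF.

```python
import urllib.parse
import urllib.request

# Placeholder only: the real access point URL must be requested from the maintainers.
ENDPOINT = "https://example.org/sparql"

# Generic query: list ten arbitrary triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# To execute against a real endpoint (not run here):
#   import json
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```

Because the cache is refreshed only every few months, results reflect the state of the institutions' RDF at the time of the last harvest.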

Important note for Institutions using CETAF IDs

If an institution uses CETAF IDs and wants them (and any specimen RDF) to be included in the CETAF Specimen Catalogue, the IDs must be used as GUIDs in the specimen data fed to GBIF. As described above, the GBIF Index is used to discover CETAF IDs.

  • If Darwin Core is used, the IDs must be mapped to occurrenceID (http://rs.tdwg.org/dwc/terms/occurrenceID).
  • For ABCD, the concept UnitGUID (https://terms.tdwg.org/wiki/abcd2:UnitGUID) should be used.
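
For the Darwin Core case, the mapping might look like the following sketch. The keys of the local specimen record (cetaf_id, catalogue_number, name) and the example values are hypothetical; only the Darwin Core term names occurrenceID, catalogNumber and scientificName are taken from the standard.

```python
import csv
import io


def to_darwin_core(specimen):
    """Map a local specimen record (hypothetical keys) to Darwin Core terms."""
    return {
        "occurrenceID": specimen["cetaf_id"],  # the CETAF ID serves as the GUID
        "catalogNumber": specimen["catalogue_number"],
        "scientificName": specimen["name"],
    }


# Hypothetical local record; the URI is a placeholder, not a real CETAF ID.
specimen = {
    "cetaf_id": "https://example.org/herbarium-b/specimen/B100",
    "catalogue_number": "B100",
    "name": "Bellis perennis L.",
}

row = to_darwin_core(specimen)

# Write the record as a tab-separated row, as it might appear in a data export.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=list(row), delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```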