Difference between revisions of "CETAF Specimen Catalogue"
Revision as of 14:12, 20 April 2020
The CETAF Specimen Catalogue was set up to provide a single access point for linked open data published by CETAF institutions. By merging semantically annotated specimen data from different sources into a single triple store that is accessible through a SPARQL interface, we hope to facilitate new uses of the data, such as linkages between specimens or collectors.
Specimen data from an institution can be incorporated into the catalogue if
- it is published as RDF documents using the CETAF Specimen Preview Profile (CSPP), and
- the CETAF ID is published to GBIF as a GUID (occurrence ID).
The starting point for the Catalogue is the GBIF Index, downloaded as a zip file every two to three months. After the major elements of the index have been imported into a SQL Server database, the CETAF IDs of the partners of the Stable Identifiers Implementers Group are extracted into the CETAF ID Catalogue (hence the requirement that institutions use the CETAF ID as the GBIF occurrence ID). This happens regardless of the implementation level: IDs that only resolve to a human-readable representation of the object are included as well. This list of IDs (20 million as of 23 January 2020) will be made accessible through a simple web service (implementation pending).
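The extraction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it filters a tab-separated occurrence file directly instead of going through the SQL Server database described above, it assumes the file has an `occurrenceID` column, and the URI prefixes in `PARTNER_PREFIXES` are hypothetical placeholders, not the real partner list.

```python
import csv
import io

# Hypothetical partner URI prefixes (assumption for illustration only);
# the real list would come from the Stable Identifiers Implementers Group.
PARTNER_PREFIXES = (
    "https://example.org/specimen/",
    "http://data.example.net/object/",
)


def extract_cetaf_ids(tsv_file, prefixes=PARTNER_PREFIXES):
    """Yield occurrenceID values that start with one of the given prefixes."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        occurrence_id = (row.get("occurrenceID") or "").strip()
        # str.startswith accepts a tuple and matches any of its entries.
        if occurrence_id.startswith(prefixes):
            yield occurrence_id


# Tiny in-memory stand-in for an occurrence file from the GBIF download.
sample = io.StringIO(
    "gbifID\toccurrenceID\n"
    "1\thttps://example.org/specimen/E00001\n"
    "2\turn:catalog:XYZ:123\n"
    "3\thttp://data.example.net/object/42\n"
)
cetaf_ids = list(extract_cetaf_ids(sample))
print(cetaf_ids)
```

Matching on URI prefixes keeps every ID of a partner regardless of what it resolves to, which mirrors the statement that IDs are collected independently of the implementation level.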
For the institutions supporting Level 2 of the CETAF Identifier system (currently 6 partners), the RDF documents for the identifiers found in the ID Catalogue are harvested and imported into a triple store (Apache Jena). This LOD (Linked Open Data) cache will be made accessible through a SPARQL access point (still pending). When using this access point, please keep in mind that harvesting takes place at the same intervals at which the ID Catalogue is created from the GBIF Index (roughly every 2-3 months), so the cache can lag behind the institutional sources. Also, the number of specimens in the RDF cache is lower than the number of IDs in the ID Catalogue, since not all institutions provide RDF representations of their specimens.
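Harvesting an RDF document for one identifier might look roughly like the following sketch. It assumes that a Level 2 identifier returns RDF/XML when dereferenced with a matching `Accept` header (HTTP content negotiation); the identifier shown is a hypothetical placeholder, and the actual harvester used for the catalogue is not described in this text.

```python
import urllib.request

# Assumption: Level 2 identifiers serve RDF/XML via content negotiation.
RDF_MIME_TYPE = "application/rdf+xml"


def build_rdf_request(cetaf_id):
    """Prepare a GET request that asks for RDF via the Accept header."""
    return urllib.request.Request(cetaf_id, headers={"Accept": RDF_MIME_TYPE})


def fetch_rdf(cetaf_id, timeout=30):
    """Dereference the identifier and return the raw response body.

    This performs a network call; urlopen follows HTTP redirects itself.
    """
    request = build_rdf_request(cetaf_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Hypothetical identifier, used only to show the request that would be sent.
request = build_rdf_request("https://example.org/specimen/E00001")
print(request.full_url, request.get_header("Accept"))
```

The returned bytes could then be loaded into the triple store; identifiers that only resolve to a human-readable page would simply yield no usable RDF, which is one reason the RDF cache is smaller than the ID Catalogue.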
Important note for Institutions using CETAF IDs
If an institution uses CETAF IDs and wants them (and any associated specimen RDF) to be included in the CETAF Specimen Catalogue, the IDs must be used as GUIDs in the specimen data provided to GBIF. As described above, the GBIF Index is used to discover CETAF IDs.
- If Darwin Core is used, the IDs must be mapped to occurrenceID.
- For ABCD, the concept UnitGUID should be used.
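As a rough illustration of the two mappings, the sketch below turns an internal specimen record into a Darwin Core style and an ABCD style field set. The input keys (`cetaf_id`, `catalog_number`) and the example values are hypothetical; only the target names `occurrenceID`, `catalogNumber`, `UnitGUID` and `UnitID` are taken from the respective standards, and a real export would contain many more fields.

```python
def to_darwin_core(record):
    """Map an internal record to Darwin Core terms (CETAF ID -> occurrenceID)."""
    return {
        "occurrenceID": record["cetaf_id"],
        "catalogNumber": record["catalog_number"],
    }


def to_abcd(record):
    """Map an internal record to ABCD concepts (CETAF ID -> UnitGUID)."""
    return {
        "UnitGUID": record["cetaf_id"],
        "UnitID": record["catalog_number"],
    }


# Hypothetical internal record, for illustration only.
specimen = {
    "cetaf_id": "https://example.org/specimen/E00001",
    "catalog_number": "E00001",
}
dwc_row = to_darwin_core(specimen)
abcd_unit = to_abcd(specimen)
print(dwc_row["occurrenceID"], abcd_unit["UnitGUID"])
```

In both cases the full CETAF ID (the complete HTTP URI) goes into the GUID field unchanged, so that it can be discovered in the GBIF Index.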