Difference between revisions of "CETAF Specimen Catalogue"

Revision as of 14:12, 20 April 2020

The motivation for setting up the CETAF Specimen Catalogue is to provide a single access point for linked open data published by CETAF institutions. By merging semantically annotated specimen data from different sources into one triple store that is accessible through a SPARQL interface, we hope to facilitate new uses of the data, such as linking specimens or collectors.

Specimen data from an institution can be incorporated into the catalogue if:

it is published as RDF documents using the CETAF Specimen Preview Profile (CSPP), and
the CETAF ID is published to GBIF as a GUID (occurrence ID).

The starting point for the Catalogue is the GBIF Index, downloaded as a zip file every two to three months. After major elements of the index have been imported into a SQL Server database, the CETAF IDs of the partners of the Stable Identifiers Implementers Group will be extracted into the CETAF ID Catalogue (hence the requirement that institutions use the CETAF ID as the GBIF occurrence ID). This will happen regardless of the implementation level: IDs that resolve only to a human-readable representation of the object will be included as well. This list of IDs (20 million as of 23 January 2020) will be made accessible through a simple web service (implementation pending).
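The extraction step can be illustrated with a minimal Python sketch. It is not the Catalogue's actual implementation: it skips the SQL Server import and filters a tab-separated occurrence file directly, and it assumes a column named `occurrenceID`. The URL prefixes and the sample rows are hypothetical placeholders; the real prefixes would come from the partners of the Stable Identifiers Implementers Group.

```python
import csv
import io

# Hypothetical URL prefixes, for illustration only. The real list would be
# derived from the Stable Identifiers Implementers Group partners.
CETAF_PREFIXES = (
    "https://example-herbarium.org/object/",
    "http://data.example-museum.org/specimen/",
)


def extract_cetaf_ids(tsv_stream, prefixes=CETAF_PREFIXES):
    """Yield occurrenceID values that start with one of the given prefixes."""
    reader = csv.DictReader(tsv_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        occurrence_id = (row.get("occurrenceID") or "").strip()
        # str.startswith accepts a tuple and matches any of its entries.
        if occurrence_id.startswith(prefixes):
            yield occurrence_id


# Made-up sample rows standing in for a file from the downloaded index.
sample = io.StringIO(
    "gbifID\toccurrenceID\tinstitutionCode\n"
    "1\thttps://example-herbarium.org/object/B100001\tEX\n"
    "2\turn:catalog:XYZ:123\tXYZ\n"
    "3\thttp://data.example-museum.org/specimen/42\tEM\n"
)
cetaf_ids = list(extract_cetaf_ids(sample))
```

Only the first and third sample rows end up in `cetaf_ids`. The second row has an occurrence ID that is not a CETAF ID, which is why institutions have to publish the CETAF ID itself as the occurrence ID to be discovered.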

For the institutions supporting Level 2 of the CETAF Identifier system (currently 6 partners), the RDF documents for the identifiers found in the ID Catalogue will be harvested and imported into a triple store (Apache Jena). This LOD cache will be made accessible through a SPARQL access point (still pending). When using this access point, please keep in mind that harvesting takes place at the same intervals at which the ID Catalogue is rebuilt from the GBIF Index (roughly every 2-3 months), so the cache can lag behind the institutions' current data. Also, the number of specimens in the LOD cache is lower than the number of IDs in the ID Catalogue, since not all institutions provide RDF representations of their specimens.
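As a rough sketch of what harvesting and querying could look like from a client's point of view, the following Python snippet prepares two HTTP requests without sending them. It assumes that the RDF representation of an ID is served through HTTP content negotiation and that the future access point accepts queries via HTTP GET as described in the SPARQL 1.1 Protocol. The identifier, the endpoint URL and the query are hypothetical, since the access point is still pending.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_rdf_request(cetaf_id):
    """Prepare a GET request asking for the RDF/XML representation of an ID."""
    return Request(cetaf_id, headers={"Accept": "application/rdf+xml"})


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL query as an HTTP GET with a 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical identifier and endpoint, for illustration only.
rdf_request = build_rdf_request("https://example-herbarium.org/object/B100001")
sparql_request = build_sparql_request(
    "https://example.org/cetaf/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
)

# urllib.request.urlopen(rdf_request) would send the request. It is left out
# here because the URLs above are placeholders.
```

Because the cache is refreshed only every few months, a client that needs current data for a single specimen may prefer the first kind of request against the institution's own server over a query against the cache.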

Important note for Institutions using CETAF IDs

If an institution uses CETAF IDs and wants them (and any associated specimen RDF) to be included in the CETAF Specimen Catalogue, the IDs must be used as GUIDs in the specimen data supplied to GBIF. As described above, the GBIF Index is used to discover CETAF IDs.

If Darwin Core is used, the IDs must be mapped to occurrenceID.
For ABCD, the concept UnitGUID should be used.
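The two mapping rules can be summarised in a small Python sketch. It treats a specimen record as a flat dictionary, which is a simplification (ABCD in particular is an XML schema with nested elements). The function name, the record fields other than `occurrenceID` and `UnitGUID`, and the example identifier are hypothetical.

```python
# Field that carries the GUID in each data standard, as stated above.
GUID_FIELDS = {"dwc": "occurrenceID", "abcd": "UnitGUID"}


def attach_cetaf_id(record, cetaf_id, standard):
    """Return a copy of the record with the CETAF ID stored in the GUID field."""
    guid_field = GUID_FIELDS[standard]  # raises KeyError for unknown standards
    return {**record, guid_field: cetaf_id}


# Hypothetical record and identifier, for illustration only.
cetaf_id = "https://example-herbarium.org/object/B100001"
dwc_record = attach_cetaf_id({"catalogNumber": "B100001"}, cetaf_id, "dwc")
abcd_record = attach_cetaf_id({"UnitID": "B100001"}, cetaf_id, "abcd")
```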

Older edit: revision as of 11:07, 29 January 2020, by Jörg Holetschek
Newer edit: revision as of 14:12, 20 April 2020, by Anton Güntsch (minor edit)
Line 9:

− Square one for the Catalogue is the GBIF Index downloaded as a zip file bi- or trimonthly. After importing major elements of the index a SQL Server database, the CETAF IDs of the [http://herbal.rbge.info/md.php?q=implementers partners] of the [https://cetaf.org/cetaf-stable-identifiers Stable Identifiers Implementers Group] will be extracted into the CETAF ID Catalogue (hence the requirement for the institutions to use it as a [http://rs.tdwg.org/dwc/terms/occurrenceID GBIF occurrence ID]). This will happen regardless of the implementation level; also IDs that only resolve to a human-readable representation of the object will be included. This list of IDs (xxx million as of 23rd of January 2020) will be made accessible through a simple web service (implementation pending).

+ Square one for the Catalogue is the GBIF Index downloaded as a zip file bi- or trimonthly. After importing major elements of the index a SQL Server database, the CETAF IDs of the [http://herbal.rbge.info/md.php?q=implementers partners] of the [https://cetaf.org/cetaf-stable-identifiers Stable Identifiers Implementers Group] will be extracted into the CETAF ID Catalogue (hence the requirement for the institutions to use it as a [http://rs.tdwg.org/dwc/terms/occurrenceID GBIF occurrence ID]). This will happen regardless of the implementation level; also IDs that only resolve to a human-readable representation of the object will be included. This list of IDs (20 million as of 23rd of January 2020) will be made accessible through a simple web service (implementation pending).

  For the institutions supporting Level 2 of the CETAF Identifier system (currently 6 partners), the RDF documents for the identifiers found in the ID Catalogue will be harvested and imported into a triple store ([https://jena.apache.org/ Apache Jena]). This {{abbr|LOD|Linked Open Data}} cache will be made accessible through a SPARQL access point (still pending). When using this access point, please keep in mind that harvesting takes place in similar intervals as the ID Catalogue is created from the GBIF Index (roughly every 2-3 months). Also, the number of specimens in the RDF Cache is lower than the number of IDs in the ID Catalogue, since not all institutions provide RDF representations of their specimens.
