Difference between revisions of "IDs and LOD Discussion"

From CETAF Identifiers Wiki
(CETAF collection data index)
Line 25:

== CETAF collection data index ==

- [BGBM]: As a first step, we would like to create a raw list of available CETAF stable IDs. If correctly mapped, all IDs should be accessible via the GBIF API. Based on this list we would then try to harvest the RDF metadata directly from CETAF institutions.
+ [BGBM]: As a first step, we created a list of CETAF identifiers found in GBIF.
+
+ As of October 17th, 2017, the 13 institutions listed on http://cetaf.org/cetaf-stable-identifiers shared
+ * 33,177,510 occurrences with GBIF, of which
+ * 30,679,787 used a GUID (http://rs.tdwg.org/dwc/terms/occurrenceID), of which
+ * 22,040,872 are HTTP URIs starting with ''http://'',
+ * 21,812,600 URIs conform with the base URLs listed on http://cetaf.org/cetaf-stable-identifiers.

== The situation at the BR herbarium, Meise==

Revision as of 15:34, 17 October 2017

In 2016 and 2017, the ISTC decided that improving LOD capabilities of CETAF Stable Identifiers for collection objects should become a priority. This involves primarily

  • activities for improving links from collection metadata to external resources and concepts and
  • implementation of a working CETAF collection data index prototype as a basis for advanced inference mechanisms.

Ideas, discussions, and outcomes linked to these targets will be documented on this page. Please feel free to add your thoughts / comments / results below. More information about the CETAF identifier initiative is available on the main wikipage.

Improving links to external resources

[BGBM]: We started to discuss how to enrich our RDF metadata and concluded that we will begin by looking more closely at collectors. Our first step will be to export collector names and collector IDs from our herbarium management system (JACQ) and sort them by frequency of use. We will then set up a spreadsheet with columns for ...

  • collector name
  • local collector ID BGBM
  • link to example specimen(s)
  • external resource: Wikidata
  • external resource: HUH
  • external resource: VIAF
  • problem flag

... and ask a student assistant to search for collectors in Wikidata / HUH / VIAF and enter the (URI) identifiers.

By starting with frequent collectors we hope to achieve wide coverage with reasonable effort. It would be great if other herbaria could also start to contribute to this spreadsheet. In this case we would probably just have to add more fields for local collector IDs.

The same workflow could be implemented for geographic features.
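The collector workflow described above could be sketched roughly as follows. This is only an illustration, not the actual JACQ export: the sample records, the function names, and the extra "specimen count" column are assumptions added here so that the frequency ordering stays visible in the spreadsheet.

```python
import csv
import io
from collections import Counter

# Hypothetical export: one (local collector ID, collector name) pair per specimen.
SAMPLE_EXPORT = [
    ("c-001", "Collector A"),
    ("c-002", "Collector B"),
    ("c-001", "Collector A"),
    ("c-003", "Collector C"),
    ("c-001", "Collector A"),
    ("c-002", "Collector B"),
]

# Columns from the proposal, plus "specimen count" (an assumption, see above).
COLUMNS = [
    "collector name",
    "local collector ID BGBM",
    "specimen count",
    "link to example specimen(s)",
    "external resource: Wikidata",
    "external resource: HUH",
    "external resource: VIAF",
    "problem flag",
]


def build_rows(export):
    """Return one row per collector, most frequently used collectors first."""
    counts = Counter(export)
    rows = []
    for (collector_id, name), n in counts.most_common():
        rows.append({
            "collector name": name,
            "local collector ID BGBM": collector_id,
            "specimen count": n,
        })
    return rows


def to_csv(rows):
    """Serialise rows to CSV; columns not filled in yet are left empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(build_rows(SAMPLE_EXPORT)))
```

The empty columns are the ones a student assistant would fill in by hand. Other herbaria could add their own local ID column to COLUMNS without changing the rest.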

CETAF collection data index

[BGBM]: As a first step, we created a list of CETAF identifiers found in GBIF.

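As of October 17th, 2017, the 13 institutions listed on http://cetaf.org/cetaf-stable-identifiers shared

  • 33,177,510 occurrences with GBIF, of which
  • 30,679,787 used a GUID (http://rs.tdwg.org/dwc/terms/occurrenceID), of which
  • 22,040,872 are HTTP URIs starting with http://,
  • 21,812,600 URIs conform with the base URLs listed on http://cetaf.org/cetaf-stable-identifiers.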
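A count like this (occurrences, records with an occurrenceID, HTTP URIs, URIs matching a CETAF base URL) could be reproduced with a small classifier along the following lines. This is a sketch under assumptions, not the script actually used: the base URLs are hypothetical placeholders for the list published on cetaf.org, any non-empty occurrenceID is treated as a GUID, and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical base URLs; the real ones are listed at
# http://cetaf.org/cetaf-stable-identifiers.
BASE_URLS = (
    "http://herbarium.example.org/object/",
    "http://museum.example.net/specimen/",
)


def classify(occurrence_id, base_urls=BASE_URLS):
    """Assign one occurrenceID value to exactly one category."""
    if occurrence_id is None or not occurrence_id.strip():
        return "no_guid"
    value = occurrence_id.strip()
    if not value.startswith("http://"):
        return "guid_not_http"
    if value.startswith(tuple(base_urls)):
        return "cetaf_conformant"
    return "http_other"


def funnel(occurrence_ids):
    """Return nested counts in the same order as the list above."""
    counts = Counter(classify(x) for x in occurrence_ids)
    total = sum(counts.values())
    return {
        "occurrences": total,
        "with_guid": total - counts["no_guid"],
        "http_uris": counts["http_other"] + counts["cetaf_conformant"],
        "cetaf_conformant": counts["cetaf_conformant"],
    }


if __name__ == "__main__":
    sample = [
        None,
        "urn:catalog:X:1",
        "http://other.example.com/1",
        "http://herbarium.example.org/object/B100",
    ]
    print(funnel(sample))
```

Note that the prefix test only covers http://, as in the counts above; https:// identifiers would fall into the "guid_not_http" category here.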

The situation at the BR herbarium, Meise

We have manually linked our top 900 collectors to the HUH. The linking was done by hand to ensure that the biographical details in our database matched those in HUH. In the process we identified about 230 collectors that were not in HUH. We have since given details of these collectors to HUH so that they can improve their data and we can complete the links for these additional collectors. We are currently digitising a very large number of specimens (>1,000,000), so the number of collectors will increase and their frequencies will change. We will therefore conduct more linking once these data are available.

Our new specimen portal [1] has stable identifiers and provides a machine-readable RDF version of each specimen. This RDF includes the link to the HUH database.

 ...
 <rdf:Description rdf:about="Glaziou A.">
   <owl:sameAs rdf:resource="http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/0832e613-7879-4f72-89f9-78e55c6ac1a9"/>
   <dwc:recordedBy>Glaziou A.</dwc:recordedBy>
 </rdf:Description>
 ...
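For anyone wanting to harvest these links, the owl:sameAs targets can be read with the Python standard library alone. The sketch below is an assumption-laden illustration: the excerpt above is elided, so the enclosing rdf:RDF element and the namespace declarations in SAMPLE_RDF are filled in here with the standard RDF, OWL and Darwin Core namespaces and may not match the portal's full output. The function name is made up as well.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}

# The excerpt from above, wrapped in an assumed rdf:RDF root element.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <rdf:Description rdf:about="Glaziou A.">
    <owl:sameAs rdf:resource="http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/0832e613-7879-4f72-89f9-78e55c6ac1a9"/>
    <dwc:recordedBy>Glaziou A.</dwc:recordedBy>
  </rdf:Description>
</rdf:RDF>"""


def collector_links(rdf_xml):
    """Map each dwc:recordedBy value to the list of its owl:sameAs URIs."""
    root = ET.fromstring(rdf_xml)
    resource_attr = "{%s}resource" % NS["rdf"]
    links = {}
    for desc in root.findall(".//rdf:Description", NS):
        name = desc.findtext("dwc:recordedBy", namespaces=NS)
        if name is None:
            continue  # not a collector description
        targets = [
            same.get(resource_attr)
            for same in desc.findall("owl:sameAs", NS)
            if same.get(resource_attr)
        ]
        if targets:
            links.setdefault(name.strip(), []).extend(targets)
    return links


if __name__ == "__main__":
    for name, uris in collector_links(SAMPLE_RDF).items():
        print(name, uris)
```

A full RDF library would be more robust against other serialisations; plain XML parsing is used here only to keep the example self-contained.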

[Anton Güntsch (BGBM)]: Once we have completed our top (say) 500 collectors, we would be very interested in organising a shared list of collectors with links to external resources. For example, our list will have the HUH ID as well as VIAF and Wikidata IDs. Meise could then easily retrieve the VIAF and Wikidata IDs using the HUH IDs.
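Retrieving VIAF and Wikidata IDs through a shared HUH ID amounts to a simple key lookup. The sketch below shows the idea under assumptions: the field names, the function name, and all identifier values are hypothetical placeholders, not real HUH, VIAF or Wikidata identifiers.

```python
def enrich_by_huh(local_rows, shared_rows):
    """Copy VIAF and Wikidata IDs from a shared list onto local rows by HUH ID."""
    # Index the shared list by HUH ID, skipping rows without one.
    by_huh = {row["huh"]: row for row in shared_rows if row.get("huh")}
    enriched = []
    for row in local_rows:
        match = by_huh.get(row.get("huh"), {})
        enriched.append({
            **row,
            "viaf": match.get("viaf"),
            "wikidata": match.get("wikidata"),
        })
    return enriched


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    shared = [
        {"huh": "huh-placeholder-1", "viaf": "viaf-placeholder-1",
         "wikidata": "wikidata-placeholder-1"},
    ]
    local = [
        {"name": "Collector A", "huh": "huh-placeholder-1"},
        {"name": "Collector B", "huh": None},  # not yet linked to HUH
    ]
    for row in enrich_by_huh(local, shared):
        print(row)
```

Collectors without a HUH link simply keep empty VIAF and Wikidata fields, which makes the remaining manual work easy to spot.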