Difference between revisions of "User:Andreas Plank/Import issues with CETAF identifiers"
m (→herbarium.bgbm.org (BGBM): highlight="2") |
m (→id.luomus.fi (LUOMUS): +section) |
||
Line 4: | Line 4: | ||
| style="width:180px;vertical-align:top;" | Screenshot of the Firefox RESTED plugin (steps to retrieve an RDF data source) | | style="width:180px;vertical-align:top;" | Screenshot of the Firefox RESTED plugin (steps to retrieve an RDF data source) | ||
|} | |} | ||
− | '''Note:''' Unresolved or pending issues are on top and issues that are done get to the end. To check for RDF in your browser you can (1) use the [http://herbal.rbge.info CETAF Specimen URI Tester (http://herbal.rbge.info)] or use a plugin in your browser, e.g. [https://addons.mozilla.org/de/firefox/addon/rested/ RESTED Client] and then adding Header <code>Accept: application/rdf+xml</code> | + | '''Note:''' Unresolved or pending issues are on top and issues that are done get to the end. To check for RDF in your browser you can (1) use the [http://herbal.rbge.info CETAF Specimen URI Tester (http://herbal.rbge.info)] or use a plugin in your browser, e.g. [https://addons.mozilla.org/de/firefox/addon/rested/ RESTED Client] and then adding Header <code>Accept: application/rdf+xml</code> (see example aside) |
---- | ---- | ||
Line 13: | Line 13: | ||
({{Tobedone|Pending (minor issue does not block)}}) Requesting “<code>Content-Type: application/rdf+xml</code>” results in 404 (not found) instead of getting RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458) --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:06, 18 February 2020 (CET) | ({{Tobedone|Pending (minor issue does not block)}}) Requesting “<code>Content-Type: application/rdf+xml</code>” results in 404 (not found) instead of getting RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458) --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:06, 18 February 2020 (CET) | ||
+ | <blockquote> | ||
* minor issue not relevant because header “<code>Content-Type: application/rdf+xml</code>” is meant for the (returned) resource, not the request --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 10:40, 20 February 2020 (CET) | * minor issue not relevant because header “<code>Content-Type: application/rdf+xml</code>” is meant for the (returned) resource, not the request --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 10:40, 20 February 2020 (CET) | ||
+ | </blockquote> | ||
+ | |||
+ | == id.luomus.fi ({{abbr|LUOMUS}}) == | ||
+ | |||
+ | ({{Tobedone}}) The requested RDF does not describe the requested {{abbr|CETAF-ID}} <code><nowiki>http://id.luomus.fi/GL.749</nowiki></code> itself, the ID “hangs somewhat in the air”: | ||
+ | <blockquote> | ||
+ | # http://id.luomus.fi/GL.749 gets redirected to http://id.luomus.fi/GL.749?format=RDFXML and | ||
+ | # by analysing the RDF via Apache Jena’s <code>rdfparse</code> it reveals that it describes <syntaxhighlight lang="text" inline><http://id.luomus.fi/GL.749?format=RDFXML> <http://purl.org/dc/terms/subject> <http://id.luomus.fi/GL.749></syntaxhighlight> just to be related, but | ||
+ | # <code><nowiki>http://id.luomus.fi/GL.749</nowiki></code> itself has no related description (<code>rdf:Description</code>) but there are two descriptions <code><nowiki>http://tun.fi/MY.275076</nowiki></code> and <code><nowiki>http://tun.fi/MY.881682</nowiki></code> which do not relate to <code><nowiki>http://id.luomus.fi/GL.749</nowiki></code>. So {{abbr|CETAF-ID}} <code><nowiki>http://id.luomus.fi/GL.749</nowiki></code> “hangs somewhat in the air” because it is not described. | ||
+ | --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 12:10, 20 February 2020 (CET) | ||
+ | |||
+ | See perhaps the [[CETAF Specimen Preview Profile (CSPP)#example_CSPP-compliant_RDF|example of CETAF Specimen Preview Profile (CSPP)]] | ||
+ | </blockquote> | ||
== specimens.kew.org ({{abbr|RBGK}}) == | == specimens.kew.org ({{abbr|RBGK}}) == |
Revision as of 12:11, 20 February 2020
Screenshot of the Firefox RESTED plugin (steps to retrieve an RDF data source)
Note: Unresolved or pending issues are listed at the top; issues that are done move to the end. To check for RDF in your browser you can (1) use the CETAF Specimen URI Tester (http://herbal.rbge.info) or (2) use a browser plugin, e.g. RESTED Client, and add the header Accept: application/rdf+xml (see example aside).
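The same check can be scripted. The following is only a minimal sketch using the Python standard library; the function names are hypothetical and not part of any CETAF tooling. It sends Accept on the request and inspects the Content-Type that the server returns.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF/XML via the Accept request header."""
    return urllib.request.Request(url, headers={"Accept": RDF_XML})


def is_rdf_xml(content_type: str) -> bool:
    """True if a response Content-Type names RDF/XML (parameters such as charset are ignored)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == RDF_XML


def fetch_content_type(url: str, timeout: float = 30.0) -> str:
    """Send the request (redirects are followed) and return the response Content-Type."""
    with urllib.request.urlopen(build_rdf_request(url), timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Calling `is_rdf_xml(fetch_content_type(url))` needs network access; an HTTP error status raises `urllib.error.HTTPError`, which a real import script would probably want to catch and log.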
data.nhm.ac.uk (NHM)
(Pending (minor issue does not block)) Requesting “Content-Type: application/rdf+xml” results in 404 (not found) instead of getting RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458) --Andreas Plank (talk) 14:06, 18 February 2020 (CET)
- minor issue not relevant because header “Content-Type: application/rdf+xml” is meant for the (returned) resource, not the request --Andreas Plank (talk) 10:40, 20 February 2020 (CET)
id.luomus.fi (LUOMUS)
(Pending) The requested RDF does not describe the requested CETAF-ID http://id.luomus.fi/GL.749 itself; the ID “hangs somewhat in the air”:
- http://id.luomus.fi/GL.749 gets redirected to http://id.luomus.fi/GL.749?format=RDFXML, and
- analysing the RDF with Apache Jena’s rdfparse reveals the triple <http://id.luomus.fi/GL.749?format=RDFXML> <http://purl.org/dc/terms/subject> <http://id.luomus.fi/GL.749>, which merely relates the returned document to the ID, but
- http://id.luomus.fi/GL.749 itself has no description (rdf:Description). There are two descriptions, http://tun.fi/MY.275076 and http://tun.fi/MY.881682, but neither relates to http://id.luomus.fi/GL.749. So the CETAF-ID http://id.luomus.fi/GL.749 “hangs somewhat in the air” because it is not described.
--Andreas Plank (talk) 12:10, 20 February 2020 (CET)
See perhaps the example of CETAF Specimen Preview Profile (CSPP)
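A rough way to test for this situation is to list which subjects an RDF/XML document actually describes. This is a sketch only: it looks at rdf:about attributes with the standard-library XML parser and ignores rdf:ID, xml:base and relative IRIs, so it is no replacement for a real RDF parser such as Jena. The SAMPLE document is a hand-written simplification of the structure described above, not the actual server response.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"  # ElementTree's name for the rdf:about attribute

# Hand-written, simplified sample (assumption): mirrors the reported structure only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://id.luomus.fi/GL.749?format=RDFXML">
    <dcterms:subject rdf:resource="http://id.luomus.fi/GL.749"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://tun.fi/MY.275076"/>
  <rdf:Description rdf:about="http://tun.fi/MY.881682"/>
</rdf:RDF>"""


def described_subjects(rdf_xml: str) -> set[str]:
    """Collect every rdf:about value, i.e. the subjects that have a description."""
    root = ET.fromstring(rdf_xml)
    return {el.attrib[ABOUT] for el in root.iter() if ABOUT in el.attrib}


def is_described(rdf_xml: str, identifier: str) -> bool:
    """True if the identifier appears as a described subject in the document."""
    return identifier in described_subjects(rdf_xml)


if __name__ == "__main__":
    # The CETAF-ID only occurs as an object (rdf:resource), never as a subject.
    print(is_described(SAMPLE, "http://id.luomus.fi/GL.749"))
```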
specimens.kew.org (RBGK)
(Pending) The request for RDF returns HTML instead of RDF --Andreas Plank (talk) 14:32, 18 February 2020 (CET)
For instance under Linux:
 wget --header='Accept: application/rdf+xml' --content-on-error --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"
 file specimens.kew.org⁄herbarium⁄K001116483.rdf
 # specimens.kew.org⁄herbarium⁄K001116483.rdf: HTML document, ASCII text, with very long lines, with CRLF, LF line terminators
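Where the file command is not available, the downloaded payload can be classified in a similar spirit. This is a minimal sketch with a hypothetical function name; it assumes the RDF/XML document uses the usual rdf:RDF root element, which the RDF/XML syntax does not strictly require.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def classify_payload(payload: bytes) -> str:
    """Return 'rdf-xml', 'other-xml' or 'not-xml' for a downloaded payload."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Typical HTML pages are not well-formed XML and end up here.
        return "not-xml"
    return "rdf-xml" if root.tag == RDF_ROOT else "other-xml"
```

For example, `classify_payload(open(path, "rb").read())` could be run on the file written by wget; well-formed XHTML would be reported as 'other-xml' rather than 'not-xml'.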
col.smns-bw.org (SMNS)
(Pending) The request for RDF returns an HTML fragment instead of RDF. --Andreas Plank (talk) 14:38, 18 February 2020 (CET)
For instance under Linux:
 wget --header='Accept: application/rdf+xml' --content-on-error --output-document="col.smns-bw.org⁄object⁄S10000227722006.rdf" "http://col.smns-bw.org/object/S10000227722006"
 file col.smns-bw.org⁄object⁄S10000227722006.rdf
 # col.smns-bw.org⁄object⁄S10000227722006.rdf: HTML document, ISO-8859 text, with very long lines, with CRLF line terminators
herbarium.bgbm.org (BGBM)
(Done) Some RDF files contain invalid URI entries, i.e. there is a tab/space character in the URI in owl:sameAs, and this would break the whole import of data. The error log of the triple store loader (tdbloader2) shows something like:
 Bad URI: < http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2> Code: 0/ILLEGAL_CHARACTER in SCHEME: The character violates the grammar rules for URIs/IRIs.
 ERROR Bad character in IRI (space): <[space]...>
… see for instance line 63:
 62 <rdf:Description rdf:about="http://www.wikidata.org/entity/Q6382619">
 63   <owl:sameAs rdf:resource=" http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2" />
 64   <owl:sameAs rdf:resource="http://viaf.org/viaf/233473288" />
 65 </rdf:Description>
The following objects were detected:
- http://herbarium.bgbm.org/data/rdf/B100000580 --Andreas Plank (talk) 16:21, 30 January 2020 (CET) Done --Andreas Plank (talk) 11:45, 3 February 2020 (CET)
- http://herbarium.bgbm.org/data/rdf/B100000503 --Andreas Plank (talk) 16:21, 30 January 2020 (CET) Done --Andreas Plank (talk) 11:45, 3 February 2020 (CET)
- http://herbarium.bgbm.org/data/rdf/B100000627 --Andreas Plank (talk) 16:21, 30 January 2020 (CET) Done --Andreas Plank (talk) 11:45, 3 February 2020 (CET)
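Such entries could be caught before loading the data into the triple store. The sketch below is an assumption about how a pre-check might look, not what was actually used here: it scans rdf:resource and rdf:about attributes for whitespace with the standard-library XML parser. SAMPLE wraps the fragment quoted above in an rdf:RDF element so that it parses on its own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
IRI_ATTRIBUTES = (f"{{{RDF_NS}}}about", f"{{{RDF_NS}}}resource")

# Fragment quoted above, wrapped in rdf:RDF (wrapper added for illustration).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://www.wikidata.org/entity/Q6382619">
    <owl:sameAs rdf:resource=" http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2" />
    <owl:sameAs rdf:resource="http://viaf.org/viaf/233473288" />
  </rdf:Description>
</rdf:RDF>"""


def iris_with_whitespace(rdf_xml: str) -> list[str]:
    """Return rdf:about / rdf:resource values that contain any whitespace character."""
    bad = []
    for element in ET.fromstring(rdf_xml).iter():
        for attribute in IRI_ATTRIBUTES:
            value = element.attrib.get(attribute)
            # XML parsers normalise literal tabs in attributes to spaces;
            # isspace() matches either form.
            if value is not None and any(ch.isspace() for ch in value):
                bad.append(value)
    return bad


if __name__ == "__main__":
    for iri in iris_with_whitespace(SAMPLE):
        print(f"Bad IRI (whitespace): <{iri}>")
```

Whitespace is only one of several characters that are illegal in IRIs, so a clean result here does not guarantee that tdbloader2 accepts the file.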