Difference between revisions of "User:Andreas Plank/Import issues with CETAF identifiers"

From CETAF Identifiers Wiki
m (+example of retrieving RDF)
m (layout for better reading)
Line 16: Line 16:
 
== specimens.kew.org ({{abbr|RBGK}}) ==
 
== specimens.kew.org ({{abbr|RBGK}}) ==
  
({{Tobedone}}) Requested RDF is instead HTML, e.g. under Linux:
+
({{Tobedone}}) Requested RDF is instead HTML --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:32, 18 February 2020 (CET)
 +
 
 +
<blockquote>
 +
For instance under Linux:
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"
 
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"
Line 22: Line 25:
 
# specimens.kew.org⁄herbarium⁄K001116483.rdf: HTML document, ASCII text, with very long lines, with CRLF, LF line terminators
 
# specimens.kew.org⁄herbarium⁄K001116483.rdf: HTML document, ASCII text, with very long lines, with CRLF, LF line terminators
 
</syntaxhighlight>
 
</syntaxhighlight>
--[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:32, 18 February 2020 (CET)
+
</blockquote>
  
 
== col.smns-bw.org ({{abbr|SMNS}}) ==
 
== col.smns-bw.org ({{abbr|SMNS}}) ==
  
({{Tobedone}}) Requested RDF is instead an HTML fragment, e.g. under Linux:
+
({{Tobedone}}) Requested RDF is instead an HTML fragment.--[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:38, 18 February 2020 (CET)
 +
 
 +
<blockquote>
 +
For instance under Linux:
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="col.smns-bw.org⁄object⁄S10000227722006.rdf" "http://col.smns-bw.org/object/S10000227722006"
 
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="col.smns-bw.org⁄object⁄S10000227722006.rdf" "http://col.smns-bw.org/object/S10000227722006"
Line 32: Line 38:
 
# col.smns-bw.org⁄object⁄S10000227722006.rdf: HTML document, ISO-8859 text, with very long lines, with CRLF line terminators
 
# col.smns-bw.org⁄object⁄S10000227722006.rdf: HTML document, ISO-8859 text, with very long lines, with CRLF line terminators
 
</syntaxhighlight>
 
</syntaxhighlight>
--[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:38, 18 February 2020 (CET)
+
</blockquote>
  
 
== herbarium.bgbm.org ({{abbr|BGBM}}) ==
 
== herbarium.bgbm.org ({{abbr|BGBM}}) ==
  
 
({{done}}) In some RDF files are invalid URI entries i.e. there is a tab/space character in the URI in <code>owl:sameAs</code> and this would break the whole import of data. The error log of triple store loader (tdbloader2) shows something like:
 
({{done}}) In some RDF files are invalid URI entries i.e. there is a tab/space character in the URI in <code>owl:sameAs</code> and this would break the whole import of data. The error log of triple store loader (tdbloader2) shows something like:
Bad URI: < http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2> Code: 0/ILLEGAL_CHARACTER in SCHEME: The character violates the grammar rules for URIs/IRIs. ERROR Bad character in IRI (space): <[space]...>
+
<blockquote>
 +
<pre>Bad URI: < http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2> Code: 0/ILLEGAL_CHARACTER in SCHEME: The character violates the grammar rules for URIs/IRIs. ERROR Bad character in IRI (space): <[space]...></pre>
 
… see for instance in line 63:
 
… see for instance in line 63:
 
<syntaxhighlight lang="xml"  line  start="62" highlight="63">
 
<syntaxhighlight lang="xml"  line  start="62" highlight="63">
Line 49: Line 56:
 
* http://herbarium.bgbm.org/data/rdf/B100000503 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
 
* http://herbarium.bgbm.org/data/rdf/B100000503 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
 
* http://herbarium.bgbm.org/data/rdf/B100000627 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
 
* http://herbarium.bgbm.org/data/rdf/B100000627 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
 +
</blockquote>

Revision as of 15:07, 18 February 2020


Screenshot of the Firefox RESTED plugin (steps to retrieve an RDF data source)

Note: Unresolved or pending issues are listed first; issues that are done are moved to the end. To check for RDF in your browser, you can either (1) use the CETAF Specimen URI Tester (http://herbal.rbge.info) or (2) use a browser plugin, e.g. RESTED Client, and add the header Accept: application/rdf+xml and/or Content-Type: application/rdf+xml (see the screenshot).
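The same check can be run from the command line. The following is a minimal sketch, assuming curl is installed; the function names `rdf_content_type` and `is_rdf_xml_type` are made up for this illustration. It requests RDF/XML via the Accept header and then inspects the Content-Type the server actually answers with.

```shell
# Hypothetical helper: print the Content-Type of the final response
# when RDF/XML is requested (follows redirects, needs network access).
rdf_content_type() {
  curl -sS -L -o /dev/null \
       -H 'Accept: application/rdf+xml' \
       -w '%{content_type}\n' "$1"
}

# Hypothetical helper: succeed only if a Content-Type value names RDF/XML.
# Everything after the first ';' (e.g. a charset parameter) is ignored.
# Assumption: the media type is written in lower case without extra spaces.
is_rdf_xml_type() {
  [ "${1%%;*}" = "application/rdf+xml" ]
}
```

To test a single identifier, pass the output of `rdf_content_type` for that URL to `is_rdf_xml_type`; a non-zero exit status means the server did not answer with RDF/XML.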



data.nhm.ac.uk (NHM)

(Pending) Requesting “Content-Type: application/rdf+xml” returns 404 (Not Found) instead of RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458) --Andreas Plank (talk) 14:06, 18 February 2020 (CET)

specimens.kew.org (RBGK)

(Pending) The server returns HTML instead of the requested RDF --Andreas Plank (talk) 14:32, 18 February 2020 (CET)

For instance under Linux:

wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"
file specimens.kew.org⁄herbarium⁄K001116483.rdf 
# specimens.kew.org⁄herbarium⁄K001116483.rdf: HTML document, ASCII text, with very long lines, with CRLF, LF line terminators

col.smns-bw.org (SMNS)

(Pending) The server returns an HTML fragment instead of the requested RDF. --Andreas Plank (talk) 14:38, 18 February 2020 (CET)

For instance under Linux:

wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="col.smns-bw.org⁄object⁄S10000227722006.rdf" "http://col.smns-bw.org/object/S10000227722006"
file col.smns-bw.org⁄object⁄S10000227722006.rdf
# col.smns-bw.org⁄object⁄S10000227722006.rdf: HTML document, ISO-8859 text, with very long lines, with CRLF line terminators
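
When many identifiers are downloaded in one go, the `file` check above can be complemented by a content check. This is only a sketch under one assumption: a usable RDF/XML response contains an `<rdf:RDF` root element with the conventional `rdf:` prefix. The function names are hypothetical.

```shell
# Hypothetical helper: succeed only if the file contains an <rdf:RDF ...> element.
# Assumption: the conventional "rdf:" prefix is used for the root element.
looks_like_rdf_xml() {
  grep -Eq '<rdf:RDF([[:space:]>]|$)' "$1"
}

# Hypothetical helper: list every given file that does not look like RDF/XML.
report_non_rdf() {
  local f
  for f in "$@"; do
    looks_like_rdf_xml "$f" || echo "not RDF/XML: $f"
  done
}
```

Calling `report_non_rdf` with the downloaded `.rdf` files prints one line per file that came back as HTML or anything else without an `<rdf:RDF` element.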

herbarium.bgbm.org (BGBM)

(Done) Some RDF files contain invalid URI entries, i.e. there is a tab/space character in the URI in owl:sameAs, and this would break the whole import of data. The error log of the triple store loader (tdbloader2) shows something like:

Bad URI: < http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2> Code: 0/ILLEGAL_CHARACTER in SCHEME: The character violates the grammar rules for URIs/IRIs. ERROR Bad character in IRI (space): <[space]...>

… see for instance line 63:

62 <rdf:Description rdf:about="http://www.wikidata.org/entity/Q6382619">
63                     <owl:sameAs rdf:resource="	http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2" />
64                 <owl:sameAs rdf:resource="http://viaf.org/viaf/233473288" />
65           </rdf:Description>
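
Such entries can be located before the files are handed to the loader. The following is a minimal sketch, not part of the original workflow: it assumes the RDF/XML files are available locally and that the IRIs of interest sit in `rdf:resource` or `rdf:about` attributes written with double quotes. The function name `find_bad_iris` is hypothetical.

```shell
# Hypothetical helper: print file name, line number and line for every
# rdf:resource / rdf:about attribute whose value contains a space or tab.
# Exit status is non-zero when nothing suspicious is found (grep convention).
find_bad_iris() {
  grep -nHE 'rdf:(resource|about)="[^"]*[[:space:]][^"]*"' "$@"
}
```

Run against a file containing the excerpt above, this reports the `owl:sameAs` line with the leading tab and skips the other lines.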

The following objects were detected: