In that Empire, the Art of Cartography attained such Perfection that the Map of a single Province occupied an entire City, and the Map of the Empire, an entire Province. In time, these Unconscionable Maps no longer satisfied, and the Colleges of Cartographers raised a Map of the Empire that was the Size of the Empire and coincided with it point for point. Less Addicted to the Study of Cartography, the Following Generations understood that this vast Map was Useless, and not without Impiety they abandoned it to the Inclemencies of the Sun and the Winters. In the Deserts of the West, tattered Ruins of the Map still endure, inhabited by Animals and Beggars; in all the Land there is no other relic of the Disciplines of Geography.
[Del rigor en la ciencia (On Exactitude in Science), Jorge Luis Borges]

2011-05-16

Metadata Harvesting (OAI-PMH)

A survey of resources on metadata harvesting

[the main interest is the selective harvesting of metadata records, so that the same record can be exposed by different aggregator services without duplication and/or inconsistency in its content]

Open Archives Initiative - Protocol for Metadata Harvesting
Data Providers are repositories that expose structured metadata via OAI-PMH.
Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six services that are invoked within HTTP.
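
To make the "six services invoked within HTTP" concrete, here is a minimal sketch of how a harvester might build request URLs. The six verb names come from the OAI-PMH 2.0 specification; the function name `build_request` and the base URL `http://example.org/oai` are assumptions made up for this illustration.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # Every request carries the verb plus its verb-specific arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL, for illustration only.
url = build_request(
    "http://example.org/oai",
    "GetRecord",
    identifier="oai:arXiv:cs/0112017",
    metadataPrefix="oai_dc",
)
print(url)
```

The sketch only builds the URL; fetching it and handling errors or partial responses is left out.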

Specification > OAI-PMH Version 2.0 Specification

Resource Item Record(s)
Example record:
<header>
  <identifier>oai:arXiv:cs/0112017</identifier>
  <datestamp>2002-02-28</datestamp>
  <setSpec>cs</setSpec>
  <setSpec>math</setSpec>
</header>
<metadata>
...
</metadata>
<about> 
  <provenance
      xmlns="http://www.openarchives.org/OAI/2.0/provenance" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance
      http://www.openarchives.org/OAI/2.0/provenance.xsd">
    <originDescription harvestDate="2002-02-02T14:10:02Z" altered="true">
      <baseURL>http://the.oa.org</baseURL>
      <identifier>oai:r2:klik001</identifier>
      <datestamp>2002-01-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
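
As a rough sketch of how a harvester might read the header part of the record above, the following uses Python's standard `xml.etree.ElementTree`. It assumes the header is available as a standalone, un-namespaced fragment exactly as quoted; in a real OAI-PMH response the elements live in the `http://www.openarchives.org/OAI/2.0/` namespace and the lookups would need to be qualified accordingly. The function name `parse_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The header fragment from the example record, without namespaces.
HEADER_XML = """
<header>
  <identifier>oai:arXiv:cs/0112017</identifier>
  <datestamp>2002-02-28</datestamp>
  <setSpec>cs</setSpec>
  <setSpec>math</setSpec>
</header>
"""


def parse_header(xml_text):
    """Extract identifier, datestamp and set membership from a <header>."""
    header = ET.fromstring(xml_text)
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        # A record may belong to several sets, so collect every <setSpec>.
        "sets": [s.text for s in header.findall("setSpec")],
    }


info = parse_header(HEADER_XML)
print(info)
```

The identifier is what lets two aggregators recognise that they hold the same record; the datestamp and set membership are what selective harvesting filters on.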

[...]If a record is no longer available then it is said to be deleted[...].
Repositories must declare one of three levels of support for deleted records[...]:
no - the repository does not maintain information about deletions.
persistent - the repository maintains information about deletions with no time limit.
transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently.
If a repository does not keep track of deletions then such records will simply vanish from responses and there will be no way for a harvester to discover deletions through continued incremental harvesting.
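
When a repository does track deletions, the specification has it flag a deleted record with a `status="deleted"` attribute on the record header. A minimal sketch of how an incremental harvester could apply that to its local copy follows; the dictionary layout, the function name `apply_harvest`, and the identifiers `klik002` and `klik003` are assumptions invented for this example.

```python
def apply_harvest(store, headers):
    """Update a local store (identifier -> datestamp) from harvested headers.

    `headers` is a hypothetical list of dicts, one per harvested header,
    with an optional "status" key mirroring the header's status attribute.
    """
    for header in headers:
        if header.get("status") == "deleted":
            # The repository reports the record as gone: drop our copy.
            store.pop(header["identifier"], None)
        else:
            # New or modified record: remember its latest datestamp.
            store[header["identifier"]] = header["datestamp"]
    return store


local = {"oai:r2:klik001": "2002-01-01", "oai:r2:klik002": "2002-01-01"}
harvested = [
    {"identifier": "oai:r2:klik001", "datestamp": "2002-02-02", "status": "deleted"},
    {"identifier": "oai:r2:klik003", "datestamp": "2002-02-03"},
]
apply_harvest(local, harvested)
print(local)
```

With a repository that declares "no" support, the first entry would never arrive, and the stale copy of `klik001` would stay in the local store until a full re-harvest.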

[...]
A set is an optional construct for grouping items for the purpose of selective harvesting. Repositories may organize items into sets. Set organization may be flat, i.e. a simple list, or hierarchical. Multiple hierarchies with distinct, independent top-level nodes are allowed.[...]
Selective harvesting allows harvesters to limit harvest requests to portions of the metadata available from a repository.
The OAI-PMH supports selective harvesting with two types of harvesting criteria [...]: datestamps and set membership.
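
The two criteria can be sketched as a simple predicate over a record header. This is only an illustration of the selection logic, not how any particular repository implements it. It assumes day-granularity datestamps (so ISO date strings compare correctly as text), inclusive `from`/`until` bounds, and colon-separated hierarchical setSpecs in which membership of `a:b` implies membership of `a`, which is how I read the specification. The function names are hypothetical.

```python
def in_set(record_sets, wanted):
    """True if any setSpec equals `wanted` or lies below it in the hierarchy."""
    return any(s == wanted or s.startswith(wanted + ":") for s in record_sets)


def selected(header, from_date=None, until_date=None, set_spec=None):
    """Decide whether a header matches the selective-harvesting criteria."""
    stamp = header["datestamp"]
    # Assumption: all dates are YYYY-MM-DD, so string comparison is safe.
    if from_date is not None and stamp < from_date:
        return False
    if until_date is not None and stamp > until_date:
        return False
    if set_spec is not None and not in_set(header["sets"], set_spec):
        return False
    return True


header = {
    "identifier": "oai:arXiv:cs/0112017",
    "datestamp": "2002-02-28",
    "sets": ["cs", "math"],
}
print(selected(header, from_date="2002-01-01", set_spec="math"))
print(selected(header, until_date="2002-01-31"))
```

Because a record can belong to several sets, two aggregators harvesting different sets may both receive it, which is exactly where the identifier becomes important for avoiding duplicates.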

Tutorial > XML Schemas and Support for Multiple Record Formats in OAI-PMH

Implementations > GeoNetwork 2.6.3
Multiple harvesting and hierarchies
Catalogues that provide UUIDs for metadata (for example GeoNetwork and a CSW server) can be harvested several times without having to take care about metadata overlap.
This allows the possibility to perform a thematic search and a metadata belonging to multiple searches is harvested only once and not duplicated.
This mechanism allows the GeoNetwork harvesting type to be combined with other GeoNetwork nodes to perform hierarchical harvesting.
This way a metadata can be harvested from several nodes. For example, consider this scenario:
    Node (A) has created metadata (a)
    Node (B) harvests (a) from (A)
    Node (C) harvests (a) from (B)
    Node (D) harvests from both (A), (B) and (C)
In this scenario, Node (D) will get the same metadata (a) from all 3 nodes (A), (B), (C). The metadata will flow to (D) following 3 different paths but thanks to its UUID only one copy will be stored. When (a) will be changed in (A), a new version will flow to (D) but, thanks to the change date, the copy in (D) will be updated with the most recent version.
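
The behaviour described for node (D) can be sketched as a merge keyed on UUID in which the most recent change date wins. This is not GeoNetwork's actual code, only an illustration of the idea; the record layout, the function name `merge_harvest`, the UUID value and the dates are all made up for the example.

```python
def merge_harvest(catalogue, incoming):
    """Merge records into `catalogue` (uuid -> record); newest change date wins."""
    for record in incoming:
        current = catalogue.get(record["uuid"])
        # Store the record if it is new, or newer than the copy we hold.
        if current is None or record["changed"] > current["changed"]:
            catalogue[record["uuid"]] = record
    return catalogue


# Node (D) harvests metadata (a) along three paths. (A) already holds the
# edited version, while (B) and (C) still hold the older one.
from_a = [{"uuid": "uuid-a", "changed": "2011-05-16", "via": "A"}]
from_b = [{"uuid": "uuid-a", "changed": "2011-05-10", "via": "B"}]
from_c = [{"uuid": "uuid-a", "changed": "2011-05-10", "via": "C"}]

node_d = {}
for batch in (from_b, from_a, from_c):
    merge_harvest(node_d, batch)
print(node_d)
```

Whatever the order in which the three batches arrive, (D) ends up with a single copy of (a), and it is the most recent one.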

[obviously, for this to work, a harvested record must never be editable by whoever harvested it]


Examples >

Implementations > ESRI Geoportal 9.3.1
Open Archives Initiative (OAI)
Required Fields
        URL/Host: URL of the server that hosts the metadata repository or clearinghouse
        OAI Set: name of the set or database from which you want to harvest
        OAI Meta Prefix: prefix of the metadata records stored in the OAI database that you want to harvest
Optional Fields
        Max Documents to Harvest: the maximum number of documents that will be harvested. If left blank, every document in the repository will be harvested, assuming no other criteria have been set
        From/Until Date: date range can be used to harvest metadata records that have been updated or created in a specified period. Specifying only the "from" date implies an "until" date of today
[apparently, specifying a set is mandatory in this implementation]
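
To see how these form fields line up with the protocol, here is a hedged sketch that maps them onto the arguments of a `ListRecords` request. It is not the Geoportal's API: the class `HarvestSettings`, its field names and the base URL are assumptions for illustration. Note that the document cap has no counterpart among the OAI-PMH request arguments, so in this sketch it is applied on the client side.

```python
from dataclasses import dataclass
from datetime import date
from itertools import islice
from typing import Optional
from urllib.parse import urlencode


@dataclass
class HarvestSettings:
    """Hypothetical mirror of the form fields listed above."""
    host_url: str                          # URL/Host
    oai_set: str                           # OAI Set (required by this form)
    meta_prefix: str                       # OAI Meta Prefix
    max_documents: Optional[int] = None    # Max Documents to Harvest
    from_date: Optional[date] = None       # From Date
    until_date: Optional[date] = None      # Until Date

    def list_records_url(self, today=None):
        """Build the ListRecords URL these settings would translate to."""
        params = {
            "verb": "ListRecords",
            "metadataPrefix": self.meta_prefix,
            "set": self.oai_set,
        }
        until = self.until_date
        if self.from_date is not None:
            params["from"] = self.from_date.isoformat()
            # Only a "from" date given: "until" defaults to today.
            until = until or today or date.today()
        if until is not None:
            params["until"] = until.isoformat()
        return f"{self.host_url}?{urlencode(params)}"

    def limit(self, records):
        """Apply the document cap client-side; no cap means everything."""
        if self.max_documents is None:
            return list(records)
        return list(islice(records, self.max_documents))


settings = HarvestSettings(
    "http://example.org/oai", "cs", "oai_dc",
    max_documents=2, from_date=date(2011, 5, 1),
)
print(settings.list_records_url(today=date(2011, 5, 16)))
```

In the protocol itself the `set` argument of `ListRecords` is optional, so making it a required field looks like a choice of this implementation rather than of OAI-PMH.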