Survey of resources on metadata harvesting
[of particular interest: selective harvesting of metadata records, so that the same record can be exposed by different aggregator services without duplication and/or inconsistency in its content]
Open Archives Initiative - Protocol for Metadata Harvesting
Data Providers are repositories that expose structured metadata via OAI-PMH.
Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six services (the verbs Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords and GetRecord) that are invoked over HTTP.
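As a minimal sketch (in Python, chosen here for illustration), the six verbs and the arguments the 2.0 specification allows for each can be kept in a table and used to build GET request URLs. The base URL `http://example.org/oai` and the helper `build_request` are hypothetical; the function only checks which arguments a verb accepts, not which ones it requires.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs and the arguments each one accepts.
VERBS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "ListRecords": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}

def build_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL, rejecting arguments the verb does not accept."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    # 'from' is a Python keyword, so callers pass it as from_ and we strip the underscore.
    params = {k.rstrip("_"): v for k, v in args.items()}
    unexpected = set(params) - VERBS[verb]
    if unexpected:
        raise ValueError(f"{verb} does not accept {sorted(unexpected)}")
    return base_url + "?" + urlencode({"verb": verb, **params})
```

For example, `build_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc", set="cs", from_="2002-01-01")` yields `http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=cs&from=2002-01-01`.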
Specification >
OAI-PMH Version 2.0 Specification
Resource → Item → Record(s)
<header>
  <identifier>oai:arXiv:cs/0112017</identifier>
  <datestamp>2002-02-28</datestamp>
  <setSpec>cs</setSpec>
  <setSpec>math</setSpec>
</header>
<metadata>
  ...
</metadata>
<about>
  <provenance
      xmlns="http://www.openarchives.org/OAI/2.0/provenance"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance
      http://www.openarchives.org/OAI/2.0/provenance.xsd">
    <originDescription harvestDate="2002-02-02T14:10:02Z" altered="true">
      <baseURL>http://the.oa.org</baseURL>
      <identifier>oai:r2:klik001</identifier>
      <datestamp>2002-01-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
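A sketch of how a harvester might read such a record with Python's `xml.etree.ElementTree`, assuming (as in a real response) that the fragment is wrapped in a `<record>` element in the OAI-PMH namespace; `parse_record` is a hypothetical helper. The provenance part is what lets an aggregator tell that a record was re-exposed from another repository rather than created locally; the `.//` search also picks up nested `originDescription` elements, if present, from earlier harvests.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses and by the provenance container.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "prov": "http://www.openarchives.org/OAI/2.0/provenance",
}

def parse_record(xml_text: str) -> dict:
    """Extract header fields and any provenance information from one <record>."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    info = {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "deleted": header.get("status") == "deleted",
        "origins": [],
    }
    # Each originDescription says where (and when) this record was harvested from.
    for origin in record.iterfind(".//prov:originDescription", NS):
        info["origins"].append({
            "baseURL": origin.findtext("prov:baseURL", namespaces=NS),
            "identifier": origin.findtext("prov:identifier", namespaces=NS),
            "harvestDate": origin.get("harvestDate"),
            "altered": origin.get("altered") == "true",
        })
    return info
```

For the record above, this gives identifier `oai:arXiv:cs/0112017`, sets `["cs", "math"]` and one origin with baseURL `http://the.oa.org` and `altered` set to true.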
[...] If a record is no longer available then it is said to be deleted [...].
Repositories must declare one of three levels of support for deleted records[...]:
no - the repository does not maintain information about deletions.
persistent - the repository maintains information about deletions with no time limit.
transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently.
If a repository does not keep track of deletions then such records will simply vanish from responses and there will be no way for a harvester to discover deletions through continued incremental harvesting.
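A minimal sketch of how an incremental harvester could act on deletions, assuming a local store that is just a hypothetical dict from identifier to datestamp (`apply_incremental` and `deleted_record_policy` are likewise hypothetical names). In a response, a deleted record is signalled by `status="deleted"` on its `<header>`; the repository's declared support level comes from the `<deletedRecord>` element of the Identify response.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def apply_incremental(store: dict, list_records_xml: str) -> None:
    """Update a local store (identifier -> datestamp) from one ListRecords response.

    Records whose header carries status="deleted" are removed; all others are upserted.
    """
    root = ET.fromstring(list_records_xml)
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            store.pop(identifier, None)
        else:
            store[identifier] = header.findtext(OAI + "datestamp")

def deleted_record_policy(identify_xml: str) -> str:
    """Return 'no', 'persistent' or 'transient' as declared in an Identify response."""
    root = ET.fromstring(identify_xml)
    return root.findtext(f".//{OAI}deletedRecord")
```

If the policy is `no` (or `transient`), incremental harvests cannot be trusted to report every deletion, so a periodic full harvest compared against the local identifiers is one possible workaround.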
[...]
A set is an optional construct for grouping items for the purpose of selective harvesting. Repositories may organize items into sets. Set organization may be flat, i.e. a simple list, or hierarchical. Multiple hierarchies with distinct, independent top-level nodes are allowed.[...]
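Hierarchical sets are expressed in the setSpec itself as a colon-separated path, so `math:NT` is a child of `math`. Assuming the reading that a request for a parent set also covers its descendant sets, a client-side membership check might look like this (`in_set` is a hypothetical helper). The `":"` in the prefix test keeps `mathematics` from being mistaken for a child of `math`.

```python
def in_set(record_set_specs: list[str], requested: str) -> bool:
    """True if a record belongs to `requested` or to any set below it in the hierarchy.

    setSpec values are colon-separated paths, e.g. "math:NT" is a child of "math".
    """
    for spec in record_set_specs:
        if spec == requested or spec.startswith(requested + ":"):
            return True
    return False
```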
Selective harvesting allows harvesters to limit harvest requests to portions of the metadata available from a repository.
The OAI-PMH supports selective harvesting with two types of harvesting criteria [...]: datestamps and set membership.
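The two criteria map onto the `from`, `until` and `set` arguments of ListRecords. The sketch below, with a hypothetical `fetch` callable standing in for the HTTP layer, also follows `resumptionToken` flow control: a repository may split a long list into pages, and the token is an exclusive argument, so follow-up requests carry only the verb and the token. An empty or missing token ends the list.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def list_records(fetch: Callable[[dict], str], metadata_prefix: str,
                 from_: Optional[str] = None, until: Optional[str] = None,
                 set_spec: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield every <record> of a selective ListRecords harvest, following resumptionTokens.

    `fetch` is a hypothetical callable that sends the query parameters to the
    repository and returns the response body as text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Selective-harvesting criteria: datestamp range and/or set membership.
    if from_:
        params["from"] = from_
    if until:
        params["until"] = until
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(params))
        error = root.find(OAI + "error")
        if error is not None:
            # noRecordsMatch just means the selection is empty.
            if error.get("code") == "noRecordsMatch":
                return
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        yield from root.iter(OAI + "record")
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:
            return
        # resumptionToken is exclusive: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

In practice `fetch` could wrap `urllib.request.urlopen(base_url + "?" + urllib.parse.urlencode(params)).read().decode("utf-8")`, with retries and error handling added.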
Tutorial >
XML Schemas and Support for Multiple Record Formats in OAI-PMH
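Since one item can be disseminated as several records, one per metadata format, a harvester first asks which formats a repository offers (ListMetadataFormats). A minimal sketch with hypothetical helper names: metadataPrefix values are chosen by each repository, so matching on the format's namespace is assumed here to be the more robust way to find a common format.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def metadata_formats(list_formats_xml: str) -> dict:
    """Map each metadataPrefix offered by a repository to its XML namespace."""
    root = ET.fromstring(list_formats_xml)
    return {
        fmt.findtext(OAI + "metadataPrefix"): fmt.findtext(OAI + "metadataNamespace")
        for fmt in root.iter(OAI + "metadataFormat")
    }

def pick_prefix(formats: dict, preferred: list) -> str:
    """Return the repository's prefix for the first preferred namespace it supports."""
    by_namespace = {ns: prefix for prefix, ns in formats.items()}
    for ns in preferred:
        if ns in by_namespace:
            return by_namespace[ns]
    raise LookupError("no common metadata format")
```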
Implementations >
GeoNetwork 2.6.3
Multiple harvesting and hierarchies
Catalogues that provide UUIDs for metadata (for example GeoNetwork and a CSW server) can be harvested several times without having to worry about metadata overlap.
This makes it possible to perform thematic searches: a metadata record belonging to multiple searches is harvested only once and is not duplicated.
This mechanism allows the GeoNetwork harvesting type to be combined with other GeoNetwork nodes to perform hierarchical harvesting.
This way, a metadata record can be harvested from several nodes. For example, consider this scenario:
Node (A) has created metadata (a)
Node (B) harvests (a) from (A)
Node (C) harvests (a) from (B)
Node (D) harvests from (A), (B) and (C)
In this scenario, Node (D) will get the same metadata (a) from all 3 nodes (A), (B), (C). The metadata will flow to (D) following 3 different paths, but thanks to its UUID only one copy will be stored. When (a) is changed in (A), a new version will flow to (D) and, thanks to the change date, the copy in (D) will be updated with the most recent version.
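The deduplication rule described above can be sketched as follows. This is not GeoNetwork's code, only an illustration assuming each harvested copy carries the originating UUID and a change date; the `Harvested` class and `merge` function are hypothetical. Ties keep the copy already stored, so the same version arriving by another path is ignored.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Harvested:
    uuid: str              # identifier assigned once by the originating catalogue
    change_date: datetime  # last modification date carried with the metadata
    source: str            # node this copy arrived from (informational only)

def merge(local: dict, incoming: list) -> dict:
    """Store at most one copy per UUID, replacing it only with a strictly newer one."""
    for rec in incoming:
        current = local.get(rec.uuid)
        if current is None or rec.change_date > current.change_date:
            local[rec.uuid] = rec
    return local
```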
[obviously, for this to work, a harvested record must never be editable by the node that harvested it]
Examples >
Implementations >
ESRI Geoportal 9.3.1
Open Archives Initiative (OAI)
Required Fields
URL/Host: URL of the server that hosts the metadata repository or clearinghouse
OAI Set: name of the set or database from which you want to harvest
OAI Meta Prefix: prefix of the metadata records stored in the OAI database that you want to harvest
Optional Fields
Max Documents to Harvest: the maximum number of documents that will be harvested. If left blank, every document in the repository will be harvested, assuming no other criteria have been set
From/Until Date: date range can be used to harvest metadata records that have been updated or created in a specified period. Specifying only the "from" date implies an "until" date of today
[apparently, specifying a set is mandatory in this implementation, whereas in OAI-PMH itself the set argument is optional]
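A sketch of how these form fields could map onto an OAI-PMH ListRecords request, assuming the behaviour the form describes; `HarvestSettings` is a hypothetical class, not part of ESRI Geoportal. Max Documents has no counterpart among the protocol's arguments, so it is applied on the client side.

```python
from dataclasses import dataclass
from datetime import date
from itertools import islice
from typing import Iterable, Iterator, Optional

@dataclass
class HarvestSettings:
    url: str                            # URL/Host (the repository's base URL)
    oai_set: str                        # OAI Set: required by this form, optional in OAI-PMH
    meta_prefix: str                    # OAI Meta Prefix
    max_documents: Optional[int] = None
    from_date: Optional[date] = None
    until_date: Optional[date] = None

    def query(self) -> dict:
        """OAI-PMH ListRecords arguments equivalent to these settings."""
        params = {"verb": "ListRecords", "metadataPrefix": self.meta_prefix, "set": self.oai_set}
        if self.from_date:
            params["from"] = self.from_date.isoformat()
            # Per the form's description, a "from" date alone implies "until" = today.
            params["until"] = (self.until_date or date.today()).isoformat()
        elif self.until_date:
            params["until"] = self.until_date.isoformat()
        return params

    def limit(self, records: Iterable) -> Iterator:
        """Apply Max Documents client-side; None means no limit."""
        return islice(records, self.max_documents)
```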