Daan Broeder, Max-Planck Institute for Psycholinguistics: Handle System in European Research Infrastructure Projects
Has already been involved in infrastructure work for several e-research projects.
Reliable references and citations of network-accessible resources, particularly language resources: audio-visual, lexica, concepts...
The number of resources can be large, especially when corpora are disaggregated. (Then again, a persistent identifier for each paragraph is overkill, and has a cost.)
Identifiers are cited, embedded, and stored in databases.
CLARIN project: making language resources more available. Aims to create a federation of language resources, mediated through persistent identifiers. Preparatory phase 2008-2011, construction up to 2020. Builds on the DAM-LR project: unified metadata catalogue, Shibboleth federation, Handle System.
For flexibility, DAM-LR minimised the amount of sharing required. Developed a mover (moves data and updates the identifier) and a way to restore the Handle DB from scratch: all data needs to be recoverable from the archives themselves. Found that federation is not for all organisations: it does impose an IT burden. Need centralised registration.
The Max-Planck Society wants a PID system throughout the Society, which will also support external German scientific organisations.
Requirements for CLARIN: political independence (a European GHR, i.e. Global Handle Registry, and no single point of failure); wide(r) acceptance of the PID scheme (W3C!); support for object part addressing (ISO TC37/SC4 CITER: citation of electronic language resources); secure management of resource copies.
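The mover and restore requirements amount to two invariants: moving an object updates its Handle record in the same step, and the whole Handle table can be rebuilt from the archives alone. A minimal sketch, assuming (hypothetically, not from the talk) that every archived object carries its own PID alongside it and that the Handle DB can be modelled as a plain dict from PID to URL:

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveObject:
    pid: str        # persistent identifier stored *with* the object (assumption)
    location: str   # current network location (URL) of the object


@dataclass
class Archive:
    base_url: str
    objects: dict = field(default_factory=dict)  # archive path -> ArchiveObject


def move(obj_path, src, dst, handle_db):
    """Move an object between archives and update its Handle record in one step."""
    obj = src.objects.pop(obj_path)
    obj.location = f"{dst.base_url}/{obj_path}"
    dst.objects[obj_path] = obj
    handle_db[obj.pid] = obj.location  # the identifier follows the data
    return obj


def restore_handle_db(archives):
    """Rebuild the PID -> URL table purely from what the archives themselves hold."""
    return {obj.pid: obj.location for a in archives for obj in a.objects.values()}
```

The hostnames and the handle `12345/rec1` used with this sketch are hypothetical. Because the PID travels with the object, a restore after any sequence of moves yields the same table as the incrementally updated one.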
CLARIN will do third party registration for small archives.
Ongoing static from W3C in ISO. Proposes URLified Handles, suggests the ARK model: http://hdl.handle.net/hdl:/1039/R5
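An ARK-style URLified Handle simply embeds the handle after a fixed `hdl:/` label on an HTTP resolver host, so it can be produced and parsed mechanically. A minimal sketch, assuming the `http://hdl.handle.net` host and `hdl:/` label from the example above; the function names and the escaping policy for the suffix are hypothetical choices, not part of any proposal:

```python
from urllib.parse import quote, unquote

PROXY = "http://hdl.handle.net"  # resolver host taken from the example URL
LABEL = "/hdl:/"                 # ARK-style label separating host and handle


def urlify(handle: str, proxy: str = PROXY) -> str:
    """Embed a prefix/suffix handle in an HTTP URL, ARK style."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    # Percent-escape the suffix (keeping '/') so arbitrary handles stay valid URLs.
    return f"{proxy}{LABEL}{prefix}/{quote(suffix, safe='/')}"


def handle_from_url(url: str) -> str:
    """Recover the handle from an ARK-style URLified Handle."""
    _, sep, tail = url.partition(LABEL)
    if not sep:
        raise ValueError(f"no {LABEL!r} label in {url!r}")
    return unquote(tail)
```

For example, `urlify("1039/R5")` gives back the URL quoted above, and `handle_from_url` inverts it.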
Part identifiers: just like fragments in URIs, A#z => objectA?part=z, with a standard syntax for "z" for the given data type, exploiting existing standards.
Replicas: federated edit access to the Handle record by old and new managers. Known issues: access by multiple parties, and trust. Could also have indirect Handles, i.e. aliasing; but not everything supports aliases well, and the status of the new alias for citation is doubtful.
Value-add services: document integrity; collection registration service (single PID for collection, with aggregation map à la ORE); citation information service (acknowledgements, preferred citation format to be included in citation); lost resource detective (trawl the logs, the web, etc. to find where the resource has ended up, including tracking the provenance history of who last deposited it).
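The A#z => objectA?part=z mapping turns a client-side fragment into a query parameter the resolver or repository can actually see. A minimal sketch of that rewrite, assuming the parameter is literally named `part` as in the note; the function name and the choice to append to any existing query are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def fragment_to_part(uri: str) -> str:
    """Rewrite A#z as A?part=z, keeping any query parameters A already has."""
    base, sep, fragment = uri.partition("#")
    if not sep:
        return uri  # no part reference, nothing to rewrite
    parts = urlsplit(base)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("part", fragment))  # "z" stays opaque: its syntax is per data type
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The part syntax is deliberately left opaque here; note that `urlencode` percent-escapes characters such as `=` and `,` inside it.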
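The two replica options can be contrasted in a few lines: one Handle record listing several replica URLs, versus an indirect Handle that redirects to another. A minimal sketch, assuming a hypothetical in-memory model in which each record is a list of (type, value) pairs; `HS_ALIAS` is the Handle System's alias type, but the resolution loop and hop limit here are illustrative only:

```python
def resolve(handle, records, max_hops=5):
    """Follow alias records to a final Handle and return all of its replica URLs."""
    seen = []
    current = handle
    for _ in range(max_hops + 1):
        if current in seen:
            raise ValueError(f"alias loop via {current!r}")
        seen.append(current)
        values = records[current]
        aliases = [v for t, v in values if t == "HS_ALIAS"]
        if not aliases:
            # Terminal record: every URL value is a replica of the same resource.
            return [v for t, v in values if t == "URL"]
        current = aliases[0]
    raise ValueError("too many alias hops")
```

The loop check shows one reason aliases are awkward: every client has to cope with chains and cycles, and a citation of the alias depends on the target record staying maintained.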
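Two of these services combine naturally: registering a collection under one PID with an aggregation map, and recording a checksum per member so integrity can be checked later. A minimal sketch, assuming SHA-256 checksums and a plain dict standing in for the aggregation map (a real ORE Resource Map is an RDF document; this only mimics its "aggregates" idea). All names and PIDs are hypothetical:

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def register_collection(collection_pid, members):
    """Build an aggregation map: one collection PID plus member PIDs and checksums.

    members maps member PID -> current content (bytes).
    """
    return {
        "id": collection_pid,
        "aggregates": [
            {"id": pid, "sha256": checksum(data)}
            for pid, data in sorted(members.items())
        ],
    }


def verify(aggregation, fetch):
    """Return member PIDs whose fetched content no longer matches the recorded checksum."""
    return [
        m["id"]
        for m in aggregation["aggregates"]
        if checksum(fetch(m["id"])) != m["sha256"]
    ]
```

Here `fetch` stands for whatever retrieves a member's bytes by PID; in practice that would go through the resolver.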
What have I learned from the Handle workshop?
- Handle type registry is coming. Complete with schema (which will need work) and policies (which look to be way too laissez-faire)
- The Nijmegen folks are gratifyingly coming to similar conclusions about things as us (e.g. REST resolver queries)
- That European digital library is going to be huge... if it can hold together.
- Selling the entire Max-Planck Gesellschaft on using persistent identifiers—that's huge too.
- Scholars blog. And want credit for blogging.
- There are ISO standards for disaggregating texts, among other media. (I can gets standardz?) And Nijmegen looks kindly on ORE.
- The ADL-R is being released in a genericised form: the DO Registry.
- Handle is being integrated into Grid services (but we already halfway knew that)
- OpenHandle is still a good thing
- XMP, for embedding metadata into digital objects, is now gaining currency, and can be used to brand objects with their identifiers (amongst other things) and update that metadata with online reference (as I identified in a use case last year, methinks)
- W3C continues to be all W3C-ish about non-HTTP URIs. People have not given up on registering the hdl: URI scheme.
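Branding an object with its identifier via XMP can be as small as one Dublin Core `dc:identifier` property inside the XMP RDF wrapper. A minimal sketch, assuming the identifier is stored as a resolvable Handle URL (a hypothetical choice); it builds and reads only the `x:xmpmeta` XML, leaving out the `<?xpacket?>` wrapper and the format-specific work of embedding the packet in a PDF, JPEG, etc.:

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # serialise with the conventional prefixes


def xmp_with_identifier(pid_url: str) -> str:
    """Return an XMP metadata block whose dc:identifier is the object's PID."""
    meta = ET.Element(f"{{{NS['x']}}}xmpmeta")
    rdf = ET.SubElement(meta, f"{{{NS['rdf']}}}RDF")
    desc = ET.SubElement(rdf, f"{{{NS['rdf']}}}Description", {f"{{{NS['rdf']}}}about": ""})
    ident = ET.SubElement(desc, f"{{{NS['dc']}}}identifier")
    ident.text = pid_url
    return ET.tostring(meta, encoding="unicode")


def identifier_from_xmp(xml_text: str):
    """Read the PID back out, e.g. to look up fresher metadata online."""
    node = ET.fromstring(xml_text).find(".//dc:identifier", NS)
    return node.text if node is not None else None
```

Reading the identifier back out is the hook for the "update that metadata with online reference" use case: the embedded PID says where to ask.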