2009-03-18

UKOLN International Repository Workshop: Identifier Interoperability

[EDIT: Maurice Vanderfeesten has a fuller summary of the outcomes.]

First Report:


  • Many resonances with what was said in other streams: support for the scholarly cycle, recognition of a range of solutions, disagreement on scope, and the need to work with more than traditional repositories.
  • Things to identify: objects (not just data), institutions, and people (in limited roles).
  • The work will model relations between identifiers; both implicit and explicit information models are involved.
  • Temporal change needs to be modelled, and this poses many challenges.
  • The aim is not to build the one identifier system, but to loosely couple identifier services with already extant identifier systems.
  • Start with small sets of functionality and then expand.
  • Identifiers are created for defined periods and purposes, based on distinguishing attributes of things.


Second Report:


  • We can't avoid the "more research needed" phase of work: we need to work out the workflows and use cases that the identifier services will support, even though the infrastructure will be invisible to some users.
  • Services need rapid prototyping, not a waterfall approach.
  • The mindmaps of parties involved in the repository space, provided by the workshop organisers [will be published soon], are useful, and need to be kept up to date through the lifetime of the project.
  • There may not be much to do internationally for object identification, since repositories are doing this already; but we likely need identifiers for repositories.
  • Author identifiers: repositories should not be acting as naming authorities, but import that authority from outside.
  • There are different levels of trust for naming authorities; assertions about authors change across time.
  • An interoperability service will allow authors to bind multiple identities together, and give them control to prevent their private identities being included with their public personas.



Third Report:


  • The group has been pragmatic in its reduction of scope.
  • There will be identifiers for: Organisations, repositories, people, objects.
  • Identifiers are not names: we are not building a name registry, and name registries have their own distinct authority.
  • Organisations:
    • Identifiers for these should be built on top of existing systems (which is a general principle for this work).
    • There could usefully be a collection of organisation identifiers, maintained as a federated system, and including temporal change in its model.
    • The organisation registry can be tackled by geographical region, starting from existing lists, e.g. DNS.

  • Repositories:
    • There shall be a registry for repositories, with rules and vetting (including sanity checks) for getting onto it. Here too there are temporal concerns to model: repositories come into and go out of existence.
    • The registry shall be a self-populating system, building on existing systems like OpenDOAR. It should also offer depopulation (a repository is pinged and found to be no longer live).
    • There is a many-to-many relation of repositories to institutions.
    • The registry shall not be restricted to open access repositories.

  • Objects:
    • We are not proposing to do a new identifier scheme.
    • We are avoiding detailed information models such as FRBR for now.
    • We propose to create an equivalence service at FRBR Manifestation level between two identifiers: e.g. a query on whether this ARK and this Handle point to the same bitstream of data, though possibly at different locations.
    • Later on, we could build a Same FRBR Expression service (do these two identifiers point to digital objects with the same content?).
    • The equivalence service would be identifier schema independent [and would likely be realised in RDF].

  • People:
    • A people identification service could be federated or central.
    • People have multiple identities: we would offer an equivalence service and a non-equivalence service between multiple identities.
    • The non-equivalence service is needed because this is not a closed world: the absence of an equivalence assertion does not mean two identities differ, so people may assert either that two identities are the same, or that they are not.
    • The service would rely on self-assertions by the user being identified.
    • The user would select identities, out of a possibly prepopulated list.
    • People may want to leave identities out of their assertions of equivalence (i.e. keep them private).
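The equivalence and non-equivalence services above were only proposals; the workshop specified no implementation. As a minimal sketch, assuming a hypothetical in-memory class `IdentityAssertions` (not any existing service or API) and made-up identifier strings, the Python below shows why an open world needs three answers rather than two: "same", "different", or "unknown" when nothing has been asserted. "Same" assertions are merged with union-find; "different" assertions are kept as pairs and checked against the merged classes; contradictory assertions are rejected. Identities a person leaves out are simply never asserted, so they stay "unknown".

```python
from __future__ import annotations

from enum import Enum


class Answer(Enum):
    SAME = "same"
    DIFFERENT = "different"
    UNKNOWN = "unknown"  # open world: nothing asserted either way


class IdentityAssertions:
    """Hypothetical store of self-asserted (non-)equivalences between identifiers."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}              # union-find links for SAME
        self._different: set[frozenset[str]] = set()   # raw DIFFERENT pairs

    def _find(self, ident: str) -> str:
        # Return the representative of ident's class, registering unseen ids.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            self._parent[ident] = self._parent[self._parent[ident]]  # path halving
            ident = self._parent[ident]
        return ident

    def _declared_different(self, a: str, b: str) -> bool:
        # True if some DIFFERENT pair links the classes containing a and b.
        ra, rb = self._find(a), self._find(b)
        for pair in self._different:
            x, y = tuple(pair)
            if {self._find(x), self._find(y)} == {ra, rb}:
                return True
        return False

    def assert_same(self, a: str, b: str) -> None:
        if self._declared_different(a, b):
            raise ValueError(f"{a} and {b} were asserted to be different")
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[ra] = rb

    def assert_different(self, a: str, b: str) -> None:
        if self._find(a) == self._find(b):
            raise ValueError(f"{a} and {b} were asserted to be the same")
        self._different.add(frozenset((a, b)))

    def query(self, a: str, b: str) -> Answer:
        if self._find(a) == self._find(b):
            return Answer.SAME
        if self._declared_different(a, b):
            return Answer.DIFFERENT
        return Answer.UNKNOWN


people = IdentityAssertions()
people.assert_same("inst-a:jdoe", "inst-b:j.doe")
people.assert_different("inst-b:j.doe", "inst-c:jdoe")
print(people.query("inst-a:jdoe", "inst-b:j.doe"))  # Answer.SAME
print(people.query("inst-a:jdoe", "inst-c:jdoe"))   # Answer.DIFFERENT (via inst-b)
print(people.query("inst-a:jdoe", "personal:jd"))   # Answer.UNKNOWN
```

The same pattern could in principle hold Manifestation-level equivalences between, say, an ARK and a Handle for objects, though the report suggests that service would likely be realised in RDF rather than in code like this.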
