2009-03-18

URN NBN Resolver Demonstration

Web sites:

Demonstrated by Maurice Vanderfeesten.

Actually his very cool Prezi presentation will be more cogent than my notes: URN NBN Resolver Presentation. [EDIT: Moreover, he included his own notes in his discussion of the identifier workshop session.]

A few notes to supplement this:


  • The system uses URNs based on National Bibliography Numbers (URN-NBN) as its persistent identifiers.
  • So it's a well-established bibliographic identification scheme, which can certainly be expanded to the research repository world. (The German National Library already covers research data.)
  • The pilot was coded at the start of 2009.
  • They are using John Kunze's Name-To-Thing resolver as their HTTP URI infrastructure for making their URNs resolvable.
  • Tim Berners-Lee might be surprised to see his Linked Data advocacy brought up in this presentation in the context of URNs. But as long as things can also be expressed as HTTP URIs, it does not matter.
    The blood on the blade of the W3C TAG URN finding is still fresh, I know.

  • Lots of EU countries are queuing up to use this as persistent identifier infrastructure.
  • The Firefox plugin works on resolving these URNs with predictable smoothness. :-)
  • They are working through what their granularity of referents will be, and what the long-term sustainability expectations are for their components (the persistence guarantees, in the terms of the PILIN project).
  • They would like to update RFC 2141 on URNs, and already have in place RFC 3188 on NBNs.
  • They now need to convince the community of the urgency and benefits of persistent identifiers and of this particular approach, and to get community buy-in.
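
To make the "URNs expressed as HTTP URIs" point above concrete, here is a minimal sketch of how a client might turn a URN:NBN into a resolvable HTTP URI. The resolver address `resolver.example.org` and the sample identifier are placeholders of mine, not the real service or a real NBN, and the pattern is deliberately looser than the full RFC 3188 syntax.

```python
import re
from urllib.parse import quote

# Hypothetical resolver base; the real service address is not given in these notes.
RESOLVER_BASE = "https://resolver.example.org/"

# Loose pattern: "urn:nbn:", a two-letter country code, then ":" or "-" and the rest.
# This is a simplification of RFC 3188, not a full validator.
URN_NBN = re.compile(r"^urn:nbn:([A-Za-z]{2})([:-])(.+)$", re.IGNORECASE)


def urn_nbn_to_http(urn: str, base: str = RESOLVER_BASE) -> str:
    """Return an HTTP URI that asks the resolver at `base` to resolve `urn`."""
    match = URN_NBN.match(urn.strip())
    if match is None:
        raise ValueError(f"not a URN:NBN: {urn!r}")
    country, separator, rest = match.groups()
    # Only the "urn:nbn:" prefix is lowercased; the rest is passed through as given.
    normalised = f"urn:nbn:{country}{separator}{rest}"
    # Keep ":" readable in the path; anything unsafe is percent-encoded.
    return base + quote(normalised, safe=":-")


print(urn_nbn_to_http("URN:NBN:nl:ui:99-example"))
```

The same identifier stays a URN for citation purposes; only the act of resolving it goes through HTTP.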

UKOLN International Repository Workshop: Identifier Interoperability

[EDIT: Maurice Vanderfeesten has a fuller summary of the outcomes.]

First Report:


  • Many resonances with what was already said in other streams: support for scholarly cycle, recognition of range of solutions, disagreement on scope, needing to work with more than traditional repositories.
  • Identifying: objects (not just data), institutions, and people in limited roles.
  • Will model relations between identifiers; there are both implicit and explicit information models involved.
  • Temporal change needs to be modelled; there are lots of challenges.
  • Not trying to build the one identifier system, but loose coupling of identifier services with already extant identifier systems.
  • Start with small sets of functionality and then expand.
  • Identifiers are created for defined periods and purposes, based on distinguishing attributes of things.


Second Report:


  • We can't avoid the "more research needed" phase of work: need to work out workflows and use cases to support the identifier services, though the infrastructure will be invisible to some users.
  • Need rapid prototyping of services, not waterfall.
  • The mindmaps provided by the workshop organisers of parties involved in the repository space [will be published soon] are useful, and need to be kept up to date through the lifetime of the project.
  • There may not be much to do internationally for object identification, since repositories are doing this already; but we likely need identifiers for repositories.
  • Author identifiers: repositories should not be acting as naming authorities, but import that authority from outside.
  • There are different levels of trust for naming authorities; assertions about authors change across time.
  • An interoperability service will allow authors to bind multiple identities together, and give authors the control to prevent their private identities being included with their public personas.
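
As a rough illustration of the last point, binding identities while keeping some private could look like the sketch below. The class, its method names, and the identifier strings are all invented for the example; the workshop did not specify a data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AuthorProfile:
    """Identities that one author has chosen to bind together (hypothetical model)."""

    # Maps each identifier to True (public) or False (kept private by the author).
    identities: Dict[str, bool] = field(default_factory=dict)

    def bind(self, identifier: str, public: bool = True) -> None:
        """Record that this identifier belongs to the author."""
        self.identities[identifier] = public

    def public_identities(self) -> List[str]:
        """Return only the identities the author allows others to see."""
        return sorted(i for i, public in self.identities.items() if public)


profile = AuthorProfile()
profile.bind("repo-a:jsmith")
profile.bind("staff:12345")
profile.bind("blog:anon42", public=False)  # bound, but never exposed
print(profile.public_identities())
```

The repository only consumes what `public_identities()` returns; the naming authority for each identifier stays outside the repository.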



Third Report:


  • The group has been pragmatic in its reduction of scope.
  • There will be identifiers for: Organisations, repositories, people, objects.
  • Identifiers are not names: we are not building a name registry, and name registries have their own distinct authority.
  • Organisations:
    • Identifiers for these should be built on top of existing systems (which is a general principle for this work).
    • There could usefully be a collection of organisation identifiers, maintained as a federated system, and including temporal change in its model.
    • The organisation registry can be tackled by geographical region, and start on existing lists, e.g. DNS.

  • Repositories:
    • There shall be a registry for repositories. There shall be rules and vetting for getting on the registry, sanity checks. Here too there are temporal concerns to model: repositories come into and out of existence.
    • The registry shall be a self-populating system, building on existing systems like OpenDOAR. It should also offer depopulation (a repository is pinged, and found no longer to be live.)
    • There is a many-to-many relation of repositories to institutions.
    • The registry shall not be restricted to open access repositories.

  • Objects:
    • We are not proposing to do a new identifier scheme.
    • We are avoiding detailed information models such as FRBR for now.
    • We propose to create an equivalence service at FRBR Manifestation level between two identifiers: e.g. a query on whether this ARK and this Handle are pointing to the same bitstream of data, though possibly at different locations.
    • Later on we could build a Same FRBR Expression service (do these two identifiers point to digital objects with the same content).
    • The equivalence service would be identifier schema independent [and would likely be realised in RDF].

  • People:
    • A people identification service could be federated or central.
    • People have multiple identities: we would offer an equivalence service and a non-equivalence service between multiple identities.
    • The non-equivalence service is needed because this is not a closed-world set: people may assert that two identities are the same, or are not the same.
    • The service would rely on self-assertions by the user being identified.
    • The user would select identities, out of a possibly prepopulated list.
    • People may want to leave identities out of their assertions of equivalence (i.e. keep them private).
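
The equivalence and non-equivalence services for people are easier to see in code. The sketch below is my own illustration under simple assumptions (pairwise assertions only, no transitivity, an explicit denial wins over a claim of sameness); the class and method names are hypothetical. The point it shows is the open-world behaviour: a pair nobody has asserted anything about is "unknown", not "different".

```python
from typing import FrozenSet, Optional, Set


class IdentityAssertions:
    """Stores self-asserted 'same' and 'not the same' claims between identifiers."""

    def __init__(self) -> None:
        self._same: Set[FrozenSet[str]] = set()
        self._different: Set[FrozenSet[str]] = set()

    def assert_same(self, a: str, b: str) -> None:
        self._same.add(frozenset((a, b)))

    def assert_different(self, a: str, b: str) -> None:
        self._different.add(frozenset((a, b)))

    def are_same(self, a: str, b: str) -> Optional[bool]:
        """True or False if asserted; None if nothing is known (open world)."""
        if a == b:
            return True
        pair = frozenset((a, b))
        if pair in self._different:
            # Assumption: an explicit denial takes priority over a sameness claim.
            return False
        if pair in self._same:
            return True
        return None


claims = IdentityAssertions()
claims.assert_same("repo-a:jsmith", "staff:12345")
claims.assert_different("repo-a:jsmith", "repo-b:jsmith")
print(claims.are_same("staff:12345", "repo-a:jsmith"))
print(claims.are_same("repo-a:jsmith", "repo-b:jsmith"))
print(claims.are_same("staff:12345", "repo-b:jsmith"))
```

Identities a person wants to keep private simply never appear in any assertion, so the service cannot link them.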

UKOLN International Repository Workshop: Repository Organisation

First Report:


  • Aim: to support repository concepts with a common purpose.
  • To support the professional peer group, with bottom-up demand.
  • To support interoperability, assuring data quality.
  • To formulate guidelines, supporting national cooperation, to help recruit new repositories, to enable international interoperability.
  • The activity can be compared to the international collaboration behind Dublin Core.
  • The confederation would have a strategic role, providing support outside national boundaries to repository development.
  • It would provide a locus for interaction with other communities: researchers, publishers.
  • It will be driven by improving the scholarly process, and not just by repositories as an aim in themselves.


Second Report:


  • The group needed to define the nature of the organisation to work towards: finding a common point of departure was difficult.
  • Need to articulate benefits to stakeholders:
    • a forum for information exchange,
    • promoting repository management as a profession,
    • reflecting community needs,
    • channelling demands for new software.

  • The relations underlying the confederation are in place already, but the types of relations will be worked out tomorrow. The group has to establish evidence of need for the confederation.
  • The roles of the organisation will be worked through tomorrow: they will involve service to repositories and to researchers.
  • The workshop discussants have split into an advisory group, an investigatory group, and a visionary group.


Third Report:


  • The organisation goal is to enhance the scholarly process through a federation of open access repositories.
  • They will approach funding agencies. The organisation must be independent, bottom-up, funded through membership.
  • Sustainability, political authority, visibility.
  • The organisation's core concepts will be formed around stakeholder needs and activities. These are varied; they need:
    • clarity of roles,
    • strong governance,
    • network of expertise,
    • carry through of interoperability issues;
    • help in setting up repositories and repository advocacy;
    • certification & quality assurance.

  • Groups identified the contributions they could bring: money, expertise, ambassadors, suitable workflows.
  • Deliverables & outcomes: e.g. hold meetings, sessions in conferences, make visible the repository manager profession; lobbying, websites, potentially helpdesk.
  • Governance model: organisational membership, partnership with software providers.
  • Timeframe: proof of concept to circulate April, formal model of confederation May, letter of request of participation June.

UKOLN International Repository Workshop: Repository Handshake

First Report:


  • An attempt to rationalise the service requirements: working on PUT, not GET or KEEP
  • The aim is to populate repositories; support authors & friends (funders or institutions) making their research material available through open access
  • Have ingest support services that repositories will use downstream.
  • Focus on research papers, although that may scope more widely.
  • Balance of priorities between improving existing workflows vs. recruiting content from new depositors.
  • What information to be collected at point of ingest? —question unresolved. The group is scoping potential conflicts.
  • Machine-to-machine interoperability vs. computer-assisted human-mediated deposit: these form a continuum.
  • Workflow agreed on as the target of the group's work; the reification of "workflow" took three directions: e-research workflow; e-publication workflow; repository management.


Second Report:


  • Over the past ten years people's expectations have not been realised.
  • People have had stabs at different services.
  • Need to identify the sweet spot between useful services for the community [lots of metadata on ingest] and not imposing difficult requirements on the author [little metadata on ingest].
  • [I lost track here I'm afraid.]


Third Report:


  • Deposit is the focus of this activity.
  • Handshake has two parts: PUT from the client, and BEG from the server. [i.e. recruit content].
  • Use cases: these are deposit opportunities, and range outside the boundary of the repository. Repositories communicating with each other is only one such use case.
  • Key words: more, better quality [of metadata], easier [remove obstacles to deposit], rewarding [for depositor]. Handshake must involve social contract of reward.
  • Plan, multiphase.

    • Phase 1: rapid engagement internationally. Some nations have national leverage, but not all do. An international framework is still needed.
    • Eight deposit opportunities have been identified; 2-3 to focus on in workplan Phase 1, over 6 months. For example:
      • Multi authored paper, several institutions and countries—what does deposit look like, and how does it become once-only? (Will not be rich but minimally sufficient)
      • Use institutionally motivated deposit;
      • Communication between institutional and discipline repositories;
      • Publisher of journal offers open access service to author.

    • Seek real life description of those focus use cases, and exemplars already in use on the ground.
    • Output of this focussed activity is descriptions of what practice is, not code or prototypes.
    • Then gap analysis.
    • Overall 2-3 year time horizon, but not planning out so far yet.

UKOLN International Repository Workshop: Citation Services

First Report:


  • Currently a small number of commercial service providers is dominant in this field. Are we evolving repository services [to accommodate the existing systems], or revolutionising them?

  • Since citations drive national funding, systems need to be trusted, auditable, and open.

  • Citations relate authors and ideas, and help connect concepts together; they provide literature ranking, and larger scale analytic services across literature.

  • International coordination: the existing infrastructure of loosely coupled repositories can be the foundation of a robust, scalable solution.


Second Report:


  • The group is producing no large plan and manifesto, but is going back to basics.

  • "Handshake" meant different things to different people; there are limitations to the metaphor.

  • There will be group activity, with two foci: business and technological.

  • Recruitment of content needs to happen outside the established repository space, including through desktop bibliographic tools such as Zotero.



Third Report:


  • There is a huge variety of presentations of citations, and there are partial solutions specific to communities.

  • Model how to deal with citations: Isolate references from papers, and then extract reference data, and interpret it, from varying citation schemes.

  • For a repository to be active in this without overconsuming resources, the repository shall be made responsible for handing on to external services the list of references extracted from its items (papers).

  • Plan of action:

    • Establish test bed of references, out of what repositories find interesting.
    • Create repository API, repository plugin, OAI PMH profile.
    • JISC developer competition to develop toolkits.
    • Then liaise with e.g. Crossref and establish collaboration: the commercial bodies already have such services.
    • Then create a reference item processor as an external service, decomposing references into constituent data.
    • Then build services like Citeseer and Google Scholar—or use those existing services, if they will collaborate.
    • Then build exemplar GUI end user services, e.g. trackbacks, visualisations.
    • Liaising with publishers important but not a dependency for remaining tasks.
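
To give a feel for the "reference item processor" step in the plan above, here is a toy sketch that decomposes one reference string into constituent data. It handles a single author-year style only, the pattern and field names are my own, and the sample reference is made up; a real service would need to cope with the huge variety of citation schemes noted in the report.

```python
import re
from typing import Dict, Optional

# Toy pattern for one style only: "Authors (Year). Title. Rest of reference"
REFERENCE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>[^.]+)\.\s*(?P<rest>.*)$"
)


def parse_reference(text: str) -> Optional[Dict[str, str]]:
    """Split a reference string into fields, or return None if it does not fit."""
    match = REFERENCE.match(text.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


# Invented reference, for illustration only.
sample = (
    "Smith, J. and Jones, A. (2008). Persistent identifiers in practice. "
    "Journal of Examples, 1(2), 3-4."
)
print(parse_reference(sample))
print(parse_reference("An unstructured footnote with no year"))
```

Returning `None` for anything that does not fit matters here: the repository only hands on the extracted reference list, and the external service decides what it can interpret.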

UKOLN International Repository Workshop: Introductory remarks

From Norbert Lossau of DRIVER


  • The Vision underlying the workshop is the Berlin 2003 declaration: free & unrestricted access to human knowledge.

  • Need infrastructure to complete the research cycle: discovery > reuse > storage and preservation, for data as well as papers, at an international access level. Establishment of online reputation for researchers is critical.

  • Researchers have their existing discovery procedures; these are to be harmonised, not supplanted.

  • We are already advanced in global harvesting, preservation of papers, and repository storage.

  • A global network of repository infrastructure hubs, rather than one centralised infrastructure.

UKOLN International Repository Workshop

Have just finished at the UKOLN International Repository Workshop, twittered at #repinf09. The workshop was a joint JISC/DRIVER event; it had international scope, but there were only a couple of East and South Asian participants, and Andrew Treloar and myself from Oceania.

The intention of the workshop was to formulate action plans which would make sense to fund for international infrastructure for repositories—in the first instance, research publication repositories. I took part in the identifier infrastructure workshop, and I have been cited publicly (though anonymously) as saying that it was "surprisingly pragmatic". The information superstructures that can be imposed over identifiers—and what they identify—can get quite open-ended and intellectually satisfying; but our business was to formulate something concrete, fundable, and realisable over the next year or so. What you put on top of it later is for another workshop.

There were four streams to the workshop: four different kinds of infrastructure that could be put in place. The four streams were:


  1. Repository Citation Services: Improving the ways in which citation data relating to open access research papers is shared. Citation data may be forwards or backwards citation. Includes the ability to recognise citations in repositories and the open web.
  2. Repository Handshake: Improving ways in which repositories can be populated with research papers, including authors, other repositories, publishers and research management systems. The "handshake" involves negotiation between a depositing agent and a repository, building on SWORD.
  3. Repository Interoperable Identification Infrastructure: Improve identifying entities in repositories and making connections across repositories, and provide useful services to do so.
  4. Repository Organisation: Provide international organisational support to enable research repositories to work together to meet the objectives of Open Access and eResearch through a confederation of repositories.


I'll post:


  • Summaries of what these streams reported back at the three summary get-togethers in the workshop: a couple of streams really changed direction through the workshop.
  • Then, some notes on the first session of the identifier stream (which were behind the first report-back). We did not change tack as drastically as some streams, so they will still help inform what the stream eventually came up with.
  • A summary of the SURF demonstration of their persistent identifier work and their enhanced document work.
  • And finally (if I get to be so bold), my own take on what the identifier stream came up with.