2010-04-14

ADLRR2010: Repository Initiatives

Dan Rehak

Registries and repositories.

Dan and others have been drawing pictures of what content discovery systems should look like.
So what? People don't understand what these diagrams communicate.
Underlying all this are models: user workflow models, business models, service models, data models, technical models. The models interact.
Try to constrain the vocabularies in each of the models.
Needs: provide discovery, access, delivery, management; support user expectations, diverse collections, policies, diverse tech, scaling.
Do we want the single Google of learning? Do we want portals? Do we want to search (and sift through), or more to discover relevant content? Social paradigm: pushes content out.
How to get there? People do things the web 2.0 way. (Iditarod illustration of embrace of web 2.0.)

Panel: Initiatives.

Larry Lannom, CNRI.

ADL did interoperability by coming up with SCORM. Registry to encourage reuse of content within DoD: content stays in place, persistent identification, searchable.
The ADL Registry works, although policy took a lot of negotiation. The tech has been taken up in other projects: GENI, M-FASR, and a commercial product currently embargoed.
Problems: limited adoption. No clear short-term gain; metadata is hard and expensive; reuse presupposes the right granularity of what to reuse.
Tech challenges: quality metadata (tools to map to required schemas; create metadata as close to content creation as possible). Federation across heterogeneous data sets, including vocabulary mapping -- intractable, as there are always different ways of thinking about the world, so need a balance between system interop and semantic incompatibility. Lots of tech, but still no coherence.
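To make the vocabulary mapping point concrete, here is a minimal sketch of a crosswalk from a local vocabulary to a shared one. The vocabularies, terms, and function name are all hypothetical and not from the talk; the point is only that some terms will have no counterpart and need to be surfaced rather than silently dropped.

```python
# Hypothetical crosswalk from a local vocabulary to a shared one.
# None of these terms come from the talk; they are illustrative only.
CROSSWALK = {
    "lecture": "presentation",
    "quiz": "assessment",
    "lab": "exercise",
}


def map_terms(terms, crosswalk):
    """Split terms into (mapped, unmapped) using a local-to-shared crosswalk."""
    mapped, unmapped = {}, []
    for term in terms:
        key = term.strip().lower()  # tolerate case and stray whitespace
        if key in crosswalk:
            mapped[term] = crosswalk[key]
        else:
            # No equivalent in the shared vocabulary: keep it visible
            # so a person can decide what to do with it.
            unmapped.append(term)
    return mapped, unmapped


mapped, unmapped = map_terms(["Lecture", "quiz", "field trip"], CROSSWALK)
print(mapped)
print(unmapped)
```

The unmapped list is where the semantic incompatibility shows up: the system still interoperates on what it can map, and the remainder is left for human judgement.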
Future: Need transparent middleware to ingest content. Need default repository service for those who don't have one. Gaming & virtual worlds registry. Internationalisation. Simple metadata for more general use. Need turnkey registry for easier deployment. Need to revisit CORDRA.
Difference between push and pull is implementation detail, should be transparent to user.


Frans van Assche, Ariadne Foundation.

GLOBE foundation: largest group of repositories in the world.
Ariadne: federation. Services: harvest, auto metadata generation, ranking. Six expert centres count as a success. Lots of providers in federation.
Problems: exchange between GLOBE partners (there are 15). n² connection matrix. Language problems. Need a central collection registry, rather than have everyone connect to everyone.
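A quick worked count of why the connection matrix argues for a central registry, assuming each partner would otherwise maintain its own link to every other partner. The function names are illustrative; only the figure of 15 partners comes from the talk.

```python
def point_to_point_links(n, directed=True):
    """Links needed when every partner connects to every other partner."""
    pairs = n * (n - 1) // 2  # unordered pairs of partners
    # If each side configures its own connection, count both directions.
    return 2 * pairs if directed else pairs


def registry_links(n):
    """Links needed when every partner connects only to a central registry."""
    return n


n = 15  # number of GLOBE partners mentioned in the talk
print(point_to_point_links(n))                  # 210
print(point_to_point_links(n, directed=False))  # 105
print(registry_links(n))                        # 15
```

The point-to-point count grows roughly with n², while the registry count grows linearly with n.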
Ariadne is a broker between providers; still need to engage end users.
Tech Challenges: scaling up across all of Globe. Ministries had been disclosing very small amounts of resources, now deluging them.
Need to serve users better, with performant discovery mechanisms, dealing with broken links and duplicates and ranking in a federation particularly. Alternative knowledge sources such as Slideshare and iTunes U: you can't get away from federated search.
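As an illustration of the duplicates problem in a federation, here is a rough sketch that merges result lists from several providers and keeps one record per resource. It assumes each record is a dict with hypothetical "url" and "score" fields, and that scores from different providers are comparable, which in a real federation is exactly the hard part.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Build a comparison key that ignores scheme, host case and trailing slash."""
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query  # keep the query: it often identifies the object
    return key


def merge_results(result_lists):
    """Merge provider result lists, keeping the best-scored copy of each resource."""
    best = {}
    for results in result_lists:
        for record in results:
            key = normalise(record["url"])
            if key not in best or record["score"] > best[key]["score"]:
                best[key] = record
    # Highest score first; assumes scores are comparable across providers.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


provider_a = [{"url": "http://Example.org/obj/1/", "score": 0.4}]
provider_b = [
    {"url": "https://example.org/obj/1", "score": 0.9},
    {"url": "https://example.org/obj/2", "score": 0.5},
]
merged = merge_results([provider_a, provider_b])
print([r["url"] for r in merged])
```

URL normalisation only catches the easy duplicates; the same resource hosted at two different addresses would need metadata comparison instead.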
Need social metadata, but will have to wait until basic infrastructure in place.
Ultimately want discovery uniquely tailored to user needs.
Multilingual issues pressing in Europe, need mappings between vocabularies: managing 23 languages is difficult.


Sarah Currier, consultancy.

UK Higher ed repositories, CETIS. Policy, and community analysis around repositories.
First time they reached the broad community, not just the early adopters: reflects what they needed, and their sense of community, got non-techie users from Web 1 to Web 2 mindset on how to use and reuse resources. None of the funding went into tech (which is good).
Their success is the end users; but often the repository content could not be exposed via NetVibes or widgets, which shocked her. Lots of work by a small group of people, so Tragedy of the Commons; hard to retain engagement with some users -- though tech this time was not the barrier.
"Fly under the radar": IP, metadata profiles, tech -- got quick outcomes because didn't have to bother with that; the cost is, no influence on repository policy to get them to play along.
Still need to start from users (wide range); what we currently have online in Web 2.0 is very user friendly. These tools are mostly interoperable and backuppable, so sustainability is not as much of an issue as it used to be. Lack of interop to Web 2.0 from repositories is still major trouble; until DuraSpace gives Web 2.0 feeds, can't build.
This is not creating own Facebook on top of Fedora: this is about using existing tools on top of Fedora.





Thornton Staples, DuraSpace

Durability goes hand in hand with distribution. DuraCloud is their move into the cloud space, providing trust there.
Fedora, DSpace, Mulgara triplestore.
Fedora is used around the world, now including govt agencies with open data. Now using Fedora in interesting ways, not just as archives, but as a graph of interrelated resources, relating also to external resources.
Fedora no longer grant funded, but open source self-standing project.
Problem: communication of what Fedora is and is intended to do, so people just expected their own shiny object out of it. Fedora is a complicated product. Fedora sits in between library timescale and IT timescale; should have put out a user-oriented app much earlier than the base infrastructure. This took much longer to happen (only in the past couple of years), and blocked adoption.
Tech Challenge: scaling. How many objects are in the repository affects access, discovery, etc. Size of objects also affects this. Data Conservancy is pushing the limits of Fedora: they are adding new kinds of data streams to deal with such data more effectively.




Jim Martino, Johns Hopkins Data Conservancy

NSF funded. Data curation as means to address challenges in science.
Came about from astronomers wanting to offload data curation onto library. Has broadened in coverage and use.
Driven by the complex needs of science and disparate data sets. Will do analysis on how data is used, including when not to preserve data.
Data is getting more sizeable.
