2008-06-24

REPOMMAN

REPOMMAN:

Two goals:

1. Facilitate workflow interaction with the repository, to encourage personal engagement with it. Repository takeup has been low partly because people are only asked for input at the end of the creative process, so deposit looks to them like an imposition, an extra task. REPOMMAN aims to let people use the repository from the start of the creative process and so capitalise on its benefits: e.g. a first draft deposited in the repository is secure and backed up. REPOMMAN uses FEDORA, so it has versioning, allowing users to go back to earlier versions and revert. It is a web-accessible tool, so users can work with their own files from anywhere on the internet, more flexibly than they would with a network drive. There are many, many types of digital content, and they didn't want to restrict the tool to any one genre: hence FEDORA. Not focusing on open access (Hull is not research-intensive) or e-prints, but on enabling structured management whatever the content. There has been much organisational change at Hull around learning materials; it is now settling down, and will decide takeup in that sphere. Pursuing e-thesis content as well.

Pragmatically, there is too much variation to serve all needs, so REPOMMAN could not go down the ICE path of bolting workflow on to content. Instead they treat it as a network drive, competing with SharePoint, to get user engagement. The interface mimics FTP.

2. Automated generation of metadata. Metadata entry is another perceived barrier to takeup of repositories, esp. for self-archiving. There is still no perfect solution, but some things can be done for descriptive metadata. To capture metadata: aspects of the depositing user's profile are captured, since deposit happens through a portal. Technical metadata can be captured with JHOVE. For descriptive metadata, they went hunting for tools; the best one was IVEA (backend) within the Data Fountains project (frontend), from UC Riverside. Available as a download as well as an online demo; runs on Linux (Debian & Red Hat). Easy install.
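As a rough illustration of this capture strategy, the following Python sketch pre-populates a deposit record from three sources: the depositing user's portal profile, basic technical properties of the file, and a pluggable keyword extractor. All names here (UserProfile, DepositRecord, prepopulate_record) are hypothetical, not REPOMMAN's code. The extension-and-size check is only a crude stand-in for JHOVE, which actually identifies and validates formats, and the extract_keywords argument stands in for IVEA. Whatever cannot be filled automatically is reported by missing_fields() for the depositor to complete.

```python
import mimetypes
import os
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class UserProfile:
    # Hypothetical portal profile fields; the notes don't describe the real schema.
    name: str
    department: str


@dataclass
class DepositRecord:
    # Minimal illustrative record; None or [] means "left for the depositor to fill in".
    creator: Optional[str] = None
    department: Optional[str] = None
    format: Optional[str] = None
    size_bytes: Optional[int] = None
    title: Optional[str] = None
    subject: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Fields the depositor still has to complete on the metadata screen."""
        return [name for name, value in vars(self).items()
                if value is None or value == []]


def prepopulate_record(path: str, profile: UserProfile,
                       extract_keywords: Callable[[str], List[str]]) -> DepositRecord:
    """Fill in whatever can be derived automatically before the deposit screen is shown."""
    record = DepositRecord()
    # 1. From the depositing user's portal profile.
    record.creator = profile.name
    record.department = profile.department
    # 2. Technical metadata. Crude stand-in for JHOVE: guesses the MIME type from the
    #    file extension and records the size; it does not identify or validate the format.
    record.format, _encoding = mimetypes.guess_type(path)
    record.size_bytes = os.path.getsize(path)
    # 3. Descriptive metadata, delegated to a keyword extractor (IVEA's role in REPOMMAN).
    with open(path, encoding="utf-8", errors="replace") as f:
        record.subject = extract_keywords(f.read())
    return record
```

The point of the design is the one made in the notes: the record does not have to be complete, only partially populated, so the user is editing a draft rather than facing a blank form.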

Most extractors match texts against standard vocabularies/schemas to identify key terms; e.g. National Library of NZ; KEA, used for agricultural collections; metadata extraction for preservation metadata. But such solutions need an established vocabulary, and work best with single-subject repositories; they are not practical for institutional repositories. IVEA does not require standard vocabularies. It has been trained to deal with a wide range of data; not infallible, but good enough. It deals with anything with words in it. So long as the metadata screen is partially populated, it is easier for the user to complete it.
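The difference between vocabulary-based and vocabulary-free extraction can be sketched in a few lines of Python. This is not how IVEA or KEA work internally (both rely on trained models); match_vocabulary and frequent_terms are hypothetical helpers, and the stopword list is a toy. The first only finds terms already in a fixed list, which suits a single-subject collection; the second uses plain term frequency with no vocabulary at all, so it returns something for any text, at the cost of noisier results.

```python
import re
from collections import Counter

# Toy stopword list for illustration; a real extractor would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on",
             "with", "as", "by", "it", "that", "this", "are", "be"}


def tokenize(text):
    """Lower-case word tokens (ASCII letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def match_vocabulary(text, vocabulary):
    """Vocabulary-based extraction: keep only terms from a fixed, single-word list.

    Works when the collection's subject matches the vocabulary; anything
    outside the list is invisible to it.
    """
    words = set(tokenize(text))
    return sorted(term for term in vocabulary if term in words)


def frequent_terms(text, k=5):
    """Vocabulary-free extraction: the k most frequent non-stopword terms.

    A deliberately crude stand-in for a trained, vocabulary-free extractor.
    """
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS and len(w) > 2)
    return [term for term, _count in counts.most_common(k)]
```

Run against a small agricultural vocabulary, match_vocabulary does well on an agronomy abstract but returns an empty list for a history thesis, while frequent_terms still returns candidate terms for both: the institutional-repository problem in miniature.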

Joan Grey is setting up a Data Fountains account for ARROW. She proposed doing parallel QA tests with Hull. Must follow up; must hook ARCHER up as well.

User requirements gathering was done at the start: no surprises. Interviews with researchers, administrators, teachers. Not released into the wild yet: there is a sustainability issue (gradually being worked towards), and people need to cross the curation boundary from private to public repository, so the additional publish step needs to be scoped as well. REMAP is the follow-up project dealing with that, underway this year. It focuses REPOMMAN on records management & preservation. These are library processes that repositories should be supporting; the repository can help get records into the right shape for subsequent processes. REMAP sets flags & notifications for when tasks should happen, e.g. annual review, flagging as obsolete, archiving (cf. PRONOM, the National Archives, AONS). Proactive records management. Notifications go to humans. They use BPEL, and BPEL for People. REPOMMAN identified the need for the repository to support admin processes; these were the low-hanging fruit.
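The flag-and-notify idea can be sketched roughly as below. REMAP itself uses BPEL and BPEL for People; this plain Python version only illustrates the scheduling decision ("which records need a human to do what, today?"), not the workflow engine or the notification delivery. The RepositoryRecord fields, the one-year review interval and the at_risk_formats set are all assumptions; in practice the at-risk list might be informed by a format registry such as PRONOM or a service such as AONS.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REVIEW_INTERVAL = timedelta(days=365)  # assumed "review annually" policy


@dataclass
class RepositoryRecord:
    # Hypothetical fields; REMAP's actual data model is not described in these notes.
    identifier: str
    owner_email: str
    last_reviewed: date
    file_format: str
    archive_after: Optional[date] = None


def due_notifications(records, today, at_risk_formats):
    """Return (owner, identifier, task) tuples for tasks a human should act on."""
    notices = []
    for rec in records:
        # Periodic review: flag once a year has passed since the last review.
        if today - rec.last_reviewed >= REVIEW_INTERVAL:
            notices.append((rec.owner_email, rec.identifier, "review"))
        # Obsolescence: flag formats on the (externally maintained) at-risk list.
        if rec.file_format in at_risk_formats:
            notices.append((rec.owner_email, rec.identifier, "format obsolescence"))
        # Archiving: flag records whose retention date has been reached.
        if rec.archive_after is not None and today >= rec.archive_after:
            notices.append((rec.owner_email, rec.identifier, "archive"))
    return notices
```

Keeping the decision logic separate from delivery mirrors the note that notifications go to humans: the repository only raises the flag, and a person decides what to do.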

There is scope for more testing, which they hope to do in parallel with ARROW. The need is to generate "good enough" metadata, not perfect metadata; that already appears to be achieved. Working on institutional takeup; e-theses are the most promising avenue.

Although REPOMMAN is a personal space, they would like to get into collaborative space as well. I noted that defining the curation boundary allows the distinction between collaborative space and public space to be formulated.
