2008-06-28

Version Identification Framework

David Puplett.

Two streams: VERSIONS project, then VIF. Much overlap. VERSIONS: e-prints in economics. Main output: toolkit, mainly for authors of e-prints, to understand the types of versions they would be encountering in the Open Access landscape (preprint, postprint, draft, etc.).

Beforehand RIVER project and NISO / ALPSP project had come up with terminologies for journal article versions; controversial, because focussed on "versions of record", which privileged postprint as publisher version.

VERSIONS did interviews on what academics' practice was, when they made things public, to whom (e.g. scholarly networks, departments, repositories); then started own version of terminology. Also what behaviours to realistically expect of authors when contributing content: what engagement to realistically expect in differentiating versions on their own, and how aware they were of issues. (Lots of academics deleted as they went, left only printed versions behind.) Reports on surveys on VERSIONS website: lots of anecdotal material from author point of view. Output toolkit to disseminate to repositories: how to describe different versions, and how to make them useful in the repository context. Terms used: Draft, Submitted, Accepted, Published, Updated.

Since specific to domain and e-prints, could go on with more recommendations; e.g. embedding versioning into cover sheet of e-print (with disclaimers). At the moment, added manually. Cover sheet embedding ensures that someone who finds the file by Googling still gets the metadata. Inspired by arXiv's use of watermarking in the margins of the PDF.
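The five version terms and the cover-sheet idea in the notes above can be sketched together. This is a minimal Python sketch, not the VERSIONS toolkit's own format: the names `VersionStage`, `Deposit` and `cover_sheet_line`, the record fields, and the wording of the statement are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class VersionStage(Enum):
    # The five terms from the notes, in the order listed there.
    DRAFT = 1
    SUBMITTED = 2
    ACCEPTED = 3
    PUBLISHED = 4
    UPDATED = 5


@dataclass
class Deposit:
    # Hypothetical record of one deposited file.
    author: str
    title: str
    stage: VersionStage
    date: str  # ISO date string, e.g. "2008-01-15"


def cover_sheet_line(deposit: Deposit) -> str:
    """Build a one-line version statement to print on a cover sheet."""
    stage = deposit.stage.name.capitalize()
    return f"{deposit.author}, {deposit.title}. {stage} version, {deposit.date}."


example = Deposit("A. Author", "Example paper", VersionStage.ACCEPTED, "2008-01-15")
print(cover_sheet_line(example))
```

Because the statement travels inside the file itself, it survives when the file is found outside the repository's own record page.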

Open Access has been driver. VIF was about all objects in repositories, so not just items in publication cycle; e.g. also research data, videos. So not involved in Open Access debate. VERSIONS tried to be agnostic towards Open Access, but does support it through encouraging content depositing.

VERSIONS was mostly scoping. Need identified for broader solution to version identification problem; e.g. organising content in repository, versioning of metadata, cross-repository discovery (deduplication).

VIF applicable to any object in a repository. 10 months long, with concrete deliverables at the end. (Will be a short followup in September, surveying takeup and further publicity.) Started with its own survey, of academics and repository managers: how they discover content, when they find multiple versions, and how they found the version they wanted (or near enough). Bad news: very few people found it easy: terminology is confusing, or no metadata about versions presented at all. For self-deposited accepted copies (esp. early on), minimal metadata was gathered. People constantly going back to Google, because no metadata embedded in what they'd retrieved. Led to the framework work.

Education component: raise awareness, help contributors reflect on what versions are: there hasn't been enough repository outreach on educating, since effort has been just on gathering content. Needed to do doom-mongering: versioning is essential to establishing authority for research outputs. Audiences:

  1. repository managers (key audience);
  2. content creators (difficult group to engage with: if they were already engaged, they knew the bare basics, only cared about the bare basics, and didn't care about repository mechanics. So minimised advice burden: no overkill, rely on the toolkit to get people started.) (Toolkit feeds into advocacy by repository managers towards content providers.)
  3. Software community: developers of the major repository packages as well as local systems teams customising repositories.

Made progress with EPrints, who were engaged and have integrated version tagging à la VIF into their development; DSpace much further behind: have scoped on their own that versioning needs to happen, but have not prioritised it for development. Fedora does versioning, through datastreams, but not very flexibly: assumes a linear sequence of versions, which VIF was not restricted to. Recommendations were not to individual software packages but generic. Version support differs across the three repository packages.
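The difference between a linear sequence of versions and the looser relationships VIF allowed can be illustrated with a small check. This is a sketch under stated assumptions only: it does not reproduce Fedora's datastream model or any VIF specification, and the `is_linear` function and the `parent_of` mapping are hypothetical names.

```python
from collections import Counter
from typing import Dict, Optional


def is_linear(parent_of: Dict[str, Optional[str]]) -> bool:
    """Return True if the versions form a single unbranched chain.

    parent_of maps each version label to the version it was derived
    from, or None for a starting point. Assumes the mapping has no cycles.
    """
    roots = [v for v, p in parent_of.items() if p is None]
    successors = Counter(p for p in parent_of.values() if p is not None)
    return len(roots) == 1 and all(n == 1 for n in successors.values())


# A chain: each version replaces the one before it.
chain = {"v1": None, "v2": "v1", "v3": "v2"}

# A branch: one draft leads to both an accepted paper and a set of slides.
branch = {"draft": None, "accepted": "draft", "slides": "draft"}

print(is_linear(chain), is_linear(branch))
```

A system that stores only "previous version" and "next version" can represent the first case but not the second.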

VIF counts as versions both FRBR expressions and manifestations: there is disparity in what different people count as a version. The Work (author-title) brings both kinds of object together. Not high awareness at the time of FRBR in the repository community.
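A minimal sketch of how an author-title Work key could bring together records that differ at the expression level (e.g. accepted vs published) or the manifestation level (e.g. PDF vs Word). The record fields and the functions `work_key` and `group_by_work` are hypothetical, and the normalisation is deliberately naive; real cross-repository matching would need much more than this.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def work_key(author: str, title: str) -> Tuple[str, str]:
    """Normalise case and whitespace so trivially different records match."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return (norm(author), norm(title))


def group_by_work(
    records: List[Dict[str, str]],
) -> Dict[Tuple[str, str], List[Dict[str, str]]]:
    """Group records under the Work they belong to, whatever kind of version they are."""
    groups = defaultdict(list)
    for record in records:
        groups[work_key(record["author"], record["title"])].append(record)
    return dict(groups)


records = [
    {"author": "A. Author", "title": "Example Paper", "expression": "accepted", "format": "pdf"},
    {"author": "A. Author", "title": "Example Paper", "expression": "accepted", "format": "doc"},
    {"author": "a. author", "title": "Example  paper", "expression": "published", "format": "pdf"},
    {"author": "B. Writer", "title": "Another Paper", "expression": "draft", "format": "pdf"},
]
groups = group_by_work(records)
for key, members in groups.items():
    print(key, [(m["expression"], m["format"]) for m in members])
```

Here the first three records land under one Work even though two of them differ only in file format and the third is a different expression, which is the sense in which both kinds of difference count as "versions".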

e-prints application profile for scholarly metadata. Several app profiles coming through, ongoing development: images, geospatial data. e-prints have straightforward FRBR structure; images much more problematic: e.g. what is the subject matter unifying the objects into a single Work. JISC wants the app profile groups to work together more for consistent outputs. Given disparity in resourcing of repositories, challenge of getting coherent policy nationally. To that end, cover sheet is much more doable than abstractions of FRBR and app profiles: have had to be pragmatic; this was not a technical project. Certainly not doing the big philosophical questions of what is a version, but tangible solutions and easy wins. (Unusual for JISC projects, which are typically more experimental.)

Blog updating existing subscribers on developments after project conclusion. Are getting generic questions about metadata which have version issues involved. Have written articles to maintain awareness. Have not gotten into learning objects: versioning had not been an issue in UK, because only latest version is maintained. Also, large-scale repository issues with mapping astronomical data into different versions: way too complex to be in scope.

Some limited vocab work, but not focus of project; reflecting existing best practice.
