2008-10-20
This is the conclusion of the blog summaries of presentations associated with eResearch Australasia 2008.
Presentation on Centre for e-Research, KCL: Tobias Blanke, Mark Hedges
Centre for e-Research, King's College London. Presentation given at VERSI (Victorian E-Research Strategic Initiative), 2008-10-06
CeRch was formed out of the Arts & Humanities Data Service; once the AHDS was discontinued, KCL set up the centre to keep the work going. (The current obligation for data preservation in the UK is now to hand data over to an institution committed to maintaining it for "at least 3 years".)
The size of collections had been skyrocketing because of the introduction of video resources (45 TB at the time of axing the AHDS). The resources got split up: KCL does performing arts, history went to the Social Sciences data service, archaeology remained independent, and language and literature (starting with the Oxford Text Archive) went to the Oxford Research Archive.
CeRch has been going for a year, and is designated to support the entire research lifecycle at KCL, including planning and proposals. They will be teaching a Masters on Digital Asset Management from next year, in collaboration with the Centre for Computing in the Humanities. They research e-research.
Grids & Innovation Lab: build on Access Grid as creative space for teaching, strong link to Theatre department. Setting up Campus Grid: KCL still not on board with national grid. Currently piloting with early adopters.
ICTGuides: database of projects, methods and tools in Arts & Humanities. www.Arts-humanities.net: a collaborative environment for e-Humanities. DARIAH. CLARIN. Are starting to move beyond arts to medicine; seeking to work with industry & business as well, as a business service (starting with museums and libraries).
Arts & Humanities e-Science Support Centre
UK infrastructure has been built around national centres.
Push to get more users using services, to recoup costs in e-research; arts & humanities have a lot of users. Plus, network effect because users are more familiar with what is available. Humanities are more about creating sustainable artefacts than science is.
User interface design is more important, because end users are difficult to train up.
Linking Structured Humanities Data (LaQuAT): diverse, non-standard and isolated data resources; the aim is to integrate them into useful resources. OGSA-DAI as linking infrastructure allows researchers to retain local ownership. Ref. www.ahessc.ac.uk for projects.
Antonio Calanducci et al.: Digital libraries on the Grid to preserve cultural heritage
Project description
Author digitised: De Roberto.
High resolution scans into multipage works; 2 TB, 8000 pp. Embedded metadata: physical features and some semantics, added in Adobe XMP.
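For readers unfamiliar with XMP, the sketch below shows roughly what such an embedded packet looks like. It is a generic illustration using Dublin Core fields, not the project's actual metadata schema, and the values are placeholders.

```python
# Minimal sketch of an XMP packet (RDF/XML wrapped in xpacket markers), such as
# might be embedded in a scanned page. The Dublin Core fields and values below
# are illustrative only, not the De Roberto project's actual schema.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

def build_xmp_packet(title: str, creator: str, description: str) -> str:
    """Return an XMP packet as a string, ready to be embedded in an image."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(desc, f"{{{DC_NS}}}description").text = description
    body = ET.tostring(rdf, encoding="unicode")
    return (
        '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n' + body + '\n</x:xmpmeta>\n'
        '<?xpacket end="w"?>'
    )

if __name__ == "__main__":
    print(build_xmp_packet("Sample page title",
                           "Digitisation team",
                           "600 dpi scan; physical features noted here"))
```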
Intended easy navigation, constant availability, long-term preservation (LOCKSS)
Storage element is pool of hard disks; file catalogue is virtual file system across storage systems. Metadata organised by collection.
gLibrary project allows digital library to be stored and accessed through Grid. Filtering browsing, like iTunes.
Andreas Aschenbrenner & Jens Mittelbach: TextGrid: Towards a national e-infrastructure for the arts and humanities in Germany
Project site
TEI community. Funded to €2 million. Virtual Research Environment: data grid as a virtual archive for data curation; service grid for collab work, including existing TEI tools; and a collaborative platform for scientific text data processing.
TextGridLab is the grid client. Globus-based infrastructure. The physicists throw away their Grid data because individual pieces of data are of relatively low value; TextGrid want to archive data as they go.
Usage by: end users (accessing scholarly texts); editors (philologists); tool developers; institutional content providers.
Tobias Blanke, Mark Hedges & Stuart Dunn: Grassroots research in Arts & Humanities e-Science in the UK
Use networks to connect resources. e-science agenda was driven by the Grid, to bridge across administrative domains. In e-humanities, books need to talk to each other.
Challenges: ongoing growth of corpora; digital recording of current human developments; computational methods to deal with inconsistent data; reluctance by humanities scholars to collaborate in research.
Engage researchers by giving them money and letting them do something at grass-roots level, though with some coordination.
- Virtual workbench. Virtual Vellum project, Sheffield, on top of digitisation of Froissart chronicles: generic collaboration tool over images.
- Arts & music: AMUC motion capture, musicSpace, Purcell Plus. Music has advantage in that music is already digitised.
- Mediaeval Warfare on the Grid (Agent-Based simulation of the Battle of Manzikert);
- ArchaeoTools (data mining from sources, incl. text).
DARIAH: Digital Research Infrastructure for the Arts & Humanities, European project to facilitate long-term access and use of cultural heritage digital information --- connecting the national data centres. Fedora demonstrator, flat texts across Grid; Arena demonstrator: database integration with web services.
AHRC ICT Methods Network.
Peter Higgs: Business data commons: addressing the information needs of creative businesses
Project investigating creative industries, to support better funding. An 8% survey response rate means data quality from the survey is not good. Resorting to secondary data, like censuses. No data is available about what is happening in business life cycles, or how they change across time; poor granularity. Surveys don't work in general; can't aggregate readily.
Alternative? Build trust, manage identity & confidentiality; crucially, provide benefits to the business for the service, and to the business gateways. Collection must be with open consent; upsell data back to the survey subjects (can provide them more data, and articulate benefits). Ended up escalating information gathering.
Need pseudo-anonymity; can't have full anonymity, since they are feeding the data back to the survey subjects (delta data gathering). Use OpenID to do pseudo-anonymity. Do not re-ask questions already gathered. The benefit to the business is a personalised benchmark report, customised to each recipient.
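A minimal sketch of how pseudonymous keying for delta data gathering might work (my illustration, not the project's implementation): derive a stable pseudonym from the respondent's OpenID identifier with a keyed hash, so answers can be linked across survey rounds and fed back to the respondent without storing the identity next to the data.

```python
# Sketch: pseudonymous keys for delta data gathering (illustrative only).
import hmac
import hashlib

SECRET_KEY = b"survey-service-secret"  # held by the survey service, never published

def pseudonym(openid_url: str) -> str:
    """Derive a stable pseudonym from an OpenID identifier."""
    return hmac.new(SECRET_KEY, openid_url.encode("utf-8"), hashlib.sha256).hexdigest()

# Questions already answered are keyed by pseudonym, so they are never re-asked.
answered: dict[str, set[str]] = {}

def questions_to_ask(openid_url: str, questionnaire: list[str]) -> list[str]:
    pid = pseudonym(openid_url)
    return [q for q in questionnaire if q not in answered.get(pid, set())]

def record_answers(openid_url: str, answers: dict[str, str]) -> None:
    pid = pseudonym(openid_url)
    answered.setdefault(pid, set()).update(answers)

if __name__ == "__main__":
    record_answers("https://openid.example/alice", {"turnover_band": "A"})
    print(questions_to_ask("https://openid.example/alice",
                           ["turnover_band", "employee_count"]))
    # -> ['employee_count']: only the delta is asked on the next round.
```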
This appears to be a good strategy for collecting embarrassing data like hospital performance; emphasise that this is benchmarking, so can improve your own performance over time, given the data you have made available to the survey.
Kylie Pappalardo: Publishing and Open Access in an e-Research Environment
OAKLAW: Open Access to Knowledge Law Project
Survey of academic authors on publishing agreements and open access: 509 participants. Generally they support open access. They are OK with end user free access, and reuse. Many authors don't deposit into repositories because they don't know there is a repository -- or what the legal status of their publication was. Most did not know whether they had licensed or assigned copyright (i.e. whether they retained rights). General lack of understanding of copyright issues; authors are not asking the publishers the right questions.
Arts & Humanities scholars are more clued in about copyright issues than Science scholars.
OAKLIST, tracking publisher positions on open access and repositories. OAKLAW have also produced guide to open access for researchers; sample publishing agreement (forthcoming): exclusive license to publish, non-exclusive license for other rights.
Peter Sefton: Priming digital humanities support services
Visual ethnography project: recording messages from schools to the community. They use NVivo. Analyses are locked up in a proprietary tool.
Recommends using image tags instead of NVivo to embed qualitative evaluation. Data-aware publications, with the ICE authoring tool and microformats.
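To make the microformat idea concrete, here is a small sketch that pulls qualitative codes out of an HTML document with the standard library. The class name "qual-code" is hypothetical, not ICE's actual vocabulary.

```python
# Sketch: harvesting qualitative codes embedded as (hypothetical) microformat
# classes in an HTML publication. Simplified: no support for nested markup.
from html.parser import HTMLParser

class CodeHarvester(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_code = False
        self.codes: list[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "qual-code" in classes:
            self._in_code = True

    def handle_data(self, data):
        if self._in_code:
            self.codes.append(data.strip())

    def handle_endtag(self, tag):
        self._in_code = False

if __name__ == "__main__":
    doc = '<p>The student <span class="qual-code">expresses belonging</span> here.</p>'
    h = CodeHarvester()
    h.feed(doc)
    print(h.codes)  # ['expresses belonging']
```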
Given how scholarly publishing is run, data-aware publications are currently unused -- publishers just take Word documents and put them to paper. Need at the least a standards group for semantic documents; and ARCS will have some tools to bring to bear.
Elzbieta Majocha: "So many tools"
Humanities scholars are loners writing monographs; funding agencies, however, like collaborative interdisciplinary projects. Resource production in the humanities is labour intensive, and reuse is desired (though not practiced).
Example attempt: collaborative wiki for the Early Mediaeval History Network. This was imposed onto an existing research community. Takeup was only after training, demo, and leading by example. And then --- it died again. So was it a success?
Shared resource library, semantic web underpinnings, on PLONE. "We built it, why aren't they coming?" (But there are no images in the library, no metadata, and no updates!)
Setting up collaboration: "What, not on email? Not another portal?" No activity, because the collaborationware was bolted on to the group -- who had no time to spend on the project anyway.
So how to make the culture change? Mandate works when there is support, but evaporates when there is not. Build And They Will Come? No: Hope is Not a Plan. Those who have to use it will use it? But it is not clear that they will.
To get takeup of collaborative infrastructure, make it part of the researcher's daily workflow -- e.g. daily workdiary.
Katie Cavanagh: How Humanities e-Researchers can come to love infrastructure
Flinders Humanities Research Centre for Cultural Heritage and Cultural Exchange
Do e-research structures form the questions, or do the questions define the structures? There should be a feedback loop connecting the two; but there is a worry that the tools are not actually helping ask the right scholarly questions.
Archiving influences the construct of the archive itself.
Institutional repositories are driven by capture and preservation, not retrieval and interpretation. Institutional repository content is not googleable: the archive is orphaned from its context, so it is no longer retrievable into a sensible context. If institutional repositories are for research, where is the middleware to provide access to the repositories? Must all projects be bespoke, and can unique solutions interact? What can you currently do with institutional repositories, other than print out PDFs?
The humanities query memory and cultural heritage, not just data sets; so context matters. Important to curate collections, not just archive them. And doing so is no quicker than with paper collections; nor is it immediately obvious to researchers that it's more useful to do so digitally.
Metadata and preservation are not the problems to be solved any more; making the content usable and discoverable is.
Multi-pronged approach: build a community around modest tools; create tools to underscore current research practice (e.g. OCR); user centered design.
Also, track researchers who are already doing good practice and have the IT skills. Create forums, ICT guides, etc.
Make the infrastructure indispensable.
Jo Evans: Designing for Diversity
e-Scholarship Research Centre, University of Melbourne.
"The system won't let you do that?" No, the designer didn't. Technologists need to know about real requirements; scholars need to be able to articulate requirements.
The humanities involve heterogeneous data with long-tail usage, and limited resources are available to humanities projects. Systems for e-research in the humanities must be designed with those constraints in mind, allowing for tailoring: what and when to standardise, what to customise, and what to customise in standardised ways (once the technologies allow it, e.g. CSS/XML).
Design philosophy: standardise the back-end, customise the front-end. The back-end must be of archival quality, and scholarly.
They use OHRM (Online Heritage Resource Manager) as a basis for their systems: OHRM describes entities and contexts separately, and allows custom ontology reflecting community standards. Extensible types. Front end has exhibition functionality. Templates to add pages to presentation.
The tailored exhibition is a new research narrative. Can have service oriented approach to link to other information. The centre encourages OHRM as a tool for active research, not just for research outputs. Build incrementally, not in response to imagined needs.
"The system won't let you do that?" No, the designer didn't. Technologists need to know about real requirements; scholars need to be able to articulate requirements.
The humanities involve heterogeneous data, used with a long tail, and with limited availability of resources for humanities. Systems for e-research in the humanities must be designed with those constraints in mind, allowing for tailoring: What and when to standardise, what to customise, and what to customise in standardised ways (once the technologies allow it, e.g. CSS/XML).
Design philosophy: standardise the back-end, customise the front-end. The back-end must be of archival quality, and scholarly.
They use OHRM (Online Heritage Resource Manager) as a basis for their systems: OHRM describes entities and contexts separately, and allows custom ontology reflecting community standards. Extensible types. Front end has exhibition functionality. Templates to add pages to presentation.
The tailored exhibition is a new research narrative. Can have service oriented approach to link to other information. The centre encourages OHRM as a tool for active research, not just for research outputs. Build incrementally, not in response to imagined needs.
Paul Turnbull & Mark Fallu. Making cross-cultural history in networked digital media
Detailed paper
Too often, solutions in e-humanities complicate rather than support work. The point of e-research is to enable work.
Building on existing web project on the history of the South Seas with NLA. Political, cultural, and technical problems have been encountered. They have made a point to use techniques and technical standards for web-based scholarship. Historians have disputed that they used the web instead of computers to communicate -- but they do anyway.
Will now use TEI P5 markup, which they couldn't use in 2000. Want web-based collaborative editing. Images with Persistent identifiers; can scrape metadata off Picture Australia, and otherwise capitalise on other existing online resources.
Collaboration with AUSTeHC allowed solid grounding of knowledge management.
By 2000 the future of digital history was distributed/collaborative editorship. Visual appraisal of historical knowledge is important (Yet Another Google Maps Mashup; also timelines) (CIDOC CRM ontology entries) --- this is now feasible, not really back then.
Complex and contested knowledge: a nodal architecture, using ontologies and based on PLONE (CIDOC-CRM, with Finnish History Ontology (HISTO) and ABC Harmony). Lots of tools now available, should be able to integrate rather than redesign.
Information will not stay static or be represented in a single way; need to create connections between information on the fly. TEI P4 markup is sound foundation, stores the original textual record; TEI P5 introduces model of relation between content and real world (semantics). They have built on that with microformats --- not embedded in the original XML, but in annotation.
Migrating content online: build conditions of trust. Without a solid architecture, cannot trust presentation of knowledge enough to build scholarly debate on it.
Must be careful of language use in persuading academics to adopt technology. e.g. you are representing Knowledge, not mere Content.
e-history is a partnership between historian and IT: it is not just the historian's intellectual achievement. Yet the problem in the past has been foregrounding IT, rather than what IT can do for scholarship.
Steve Hays & Ian Johnson. Building integrated databases for the web
Archaeological Computing Laboratory, University of Sydney
Novel data modelling approach. Real world relationships aren't as simple as what is modelled by entity relationship diagrams; there can be multiple contingent relations changing over time, and entities can split into complex types as knowledge grows.
Heurist knowledge management model: start with table of record types, then table of detail types (= fields), and requirements table, binding details to records and how they behave. Summary data is stored in a record, detail information are stored as name/value pairs. Relationships are modelled as a first order record. (Reifying the relationship allows it to have attributes.)
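A minimal sketch of that style of model (my reconstruction from the description above, not Heurist's actual schema): record types and detail (field) types live in their own tables, values are name/value rows, and a relationship is just another record pointing at two others, so it can carry attributes of its own.

```python
# Sketch of an entity-attribute-value model with reified relationships,
# loosely after the Heurist description (not the real Heurist schema).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE record_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE detail_type (id INTEGER PRIMARY KEY, name TEXT);
-- which detail types a record type uses, and how they behave
CREATE TABLE requirement (record_type INTEGER, detail_type INTEGER, required INTEGER);
CREATE TABLE record (id INTEGER PRIMARY KEY, record_type INTEGER, title TEXT);
CREATE TABLE detail (record INTEGER, detail_type INTEGER, value TEXT);
""")

# Record types: a Person, a Site, and a Relationship (itself a record type,
# so relationships can carry their own details such as a date).
db.executemany("INSERT INTO record_type VALUES (?, ?)",
               [(1, "Person"), (2, "Site"), (3, "Relationship")])
db.executemany("INSERT INTO detail_type VALUES (?, ?)",
               [(1, "source"), (2, "target"), (3, "relation"), (4, "date")])

db.executemany("INSERT INTO record VALUES (?, ?, ?)",
               [(10, 1, "J. Smith"), (11, 2, "Trench A"), (12, 3, "excavated")])
db.executemany("INSERT INTO detail VALUES (?, ?, ?)",
               [(12, 1, "10"), (12, 2, "11"), (12, 3, "excavated"), (12, 4, "1998")])

# Who excavated Trench A, and when?
row = db.execute("""
SELECT p.title, d_date.value
FROM record rel
JOIN detail d_src  ON d_src.record = rel.id AND d_src.detail_type = 1
JOIN detail d_tgt  ON d_tgt.record = rel.id AND d_tgt.detail_type = 2
JOIN detail d_date ON d_date.record = rel.id AND d_date.detail_type = 4
JOIN record p ON p.id = CAST(d_src.value AS INTEGER)
WHERE rel.record_type = 3 AND CAST(d_tgt.value AS INTEGER) = 11
""").fetchone()
print(row)  # ('J. Smith', '1998')
```

Even this toy query needs several self-joins, which illustrates the performance and legibility costs noted next.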
Raw querying performance is poor; can't use complex SQL queries; obscure to explain. But performs acceptably with 100k records; export to RDF triple store with SPARQL to improve performance. Increase in flexibility will outweigh drawbacks.
Point is to create a meta-database, linking info to info across archives. (Would like to use persistent identifiers to do so.)
Presentations from the workshop on e-Research in the Arts, Humanities and Cultural Heritage
Abstract
The following posts are summaries of presentations in the workshop on e-Research in the Arts, Humanities and Cultural Heritage, held after the eResearch Australasia 2008 conference.
eResearch Australasia Workshop: ANDS: Developing eResearch Capabilities
Abstract
Core ANDS team has 3 EFT dedicated to developing capabilities: community engagement, activity coordination, needs analysis, materials preparation & editing, event logistics, knowledge transfer/ train the trainer, surveys, reviews of progress.
$150K for materials & curriculum development; $50K in training and events logistics. Course delivery will be undertaken by the institutions, not ANDS.
Assumptions: ANDS is not funded to train everybody. It will partner with organisations to develop content to train from. A structured, nationally coordinated set of training materials will make a difference. ANDS will do some train-the-trainer activities; it will partner with strategic communities to deliver capability building. Training will be complemented by ad hoc workshops.
Outcomes: a structured set of modules; partners to develop & maintain them; partners to deliver them; a certification framework.
Cultural collections sector in scope for ANDS. However, the cultural collections sector is not prioritised for training, because they are not ready; they need different approach to bring them up to speed. (The Atlas of Living Australia project involves cultural collections, so that will accelerate engagement with the cultural sector.)
Tracey Hind, CSIRO: Enterprise perspective of Data Management
CSIRO has 5 areas and 16 research divisions. 9 Flagship programs to produce research across the streams. 100 themes and many hundred projects and partnerships. There are localised solutions to data management, but CSIRO needs an enterprise solution. Not all divisions actually use the enterprise data storage solution already in place.
CSIRO needs to discover and partner data with other researchers; maximise value of their investment in infrastructure; open access to data. Will enable the flagship projects to move forwards.
CSIRO does not yet recognise data management as an issue: the e-Science Information Management (eSIM) strategy is still unfunded. Scientists are not working well across disciplines --- they don't know what they don't know. Data management is not a technology issue, but a human problem. Easy discovery of data is key, but divisions do not understand the potential of unlocking their own data. Data management is a hard sell: there are no showpieces like machine rooms; researchers don't understand the benefits immediately. Need exemplar projects to demonstrate the benefits of data management, to get buy-in.
eSIM model to build capabilities: people, processes, technology, governance. People challenges include, e.g., incentives for people to deposit data into repositories. (May even need changes in job descriptions.) Governance includes proper enterprise funding and data management plan requirements.
Exemplar projects: AuScope, Atlas of Living Australia, Corporate Communications (managed data repository, enterprise workflow and process, corporate reporting). Exemplars will drive changing behaviours.
Researchers will be easier to convince of the benefits of integrated data management than the lawyers will be.
eResearch Australasia Workshop: ANDS: Seeding the Australian Data Commons
Abstract
ANDS: Australian National Data Service
Goal: greater access to research data assets, in forms that support easier and more effective data use and reuse. ANDS will be a "voice for data".
Not all data shared will be open; ANDS will rely on institutionally supported storage solutions (there is not enough funding for it to do its own storage); ANDS will only start to build the Data Commons.
Largely the Commons will be virtual access: no centralised point of storage.
ANDS delivery: developing frameworks, providing utilities, seeding the commons, building capabilities. Mediated through NeAT (National e-Research Architecture Taskforce).
One way of seeding Data Commons is through discovery service. Make more things available for harvesting, and link them with persistent identification. Discovery service will be underpinned by ISO 2146 (Registry Services for Libraries and Related Organisations). Will need to collect ISO 2146 data to describe entities for discovery service.
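The presentation did not name a harvesting protocol, but in the repository world harvesting usually means OAI-PMH; as a hedged sketch, assuming an OAI-PMH endpoint (the URL below is a placeholder), pulling Dublin Core records for a discovery service looks roughly like this:

```python
# Sketch: harvesting Dublin Core records over OAI-PMH for a discovery service.
# The endpoint is a placeholder; ANDS's actual collection mechanism may differ.
# Resumption tokens (paging) are ignored for brevity.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_identifiers_and_titles(base_url: str):
    params = urllib.parse.urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urllib.request.urlopen(f"{base_url}?{params}") as resp:
        tree = ET.parse(resp)
    for rec in tree.iter(f"{OAI}record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")
        yield ident, title

if __name__ == "__main__":
    for ident, title in harvest_identifiers_and_titles("https://repository.example.edu/oai"):
        print(ident, "->", title)
```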
Seeding the commons will involve opportunistic content recruitment in the first year, and targeted areas in years 2 and 3, to improve data management, content, and capture. Are working with repository managers to identify candidates for content recruitment.
Content systems enhancement:
- convene a tech forum (ANU responsibility), map the landscape;
- model a reference data repository software stack, available for easy deployment;
- repository interface toolkit for easier submission --- working with SWORD deposit protocol;
- relationships with equivalent operations overseas.
Building Capabilities: train-the-trainer model; initial targets: early career researchers, research support staff. Build a community around data management.
Establishment project has met its deliverable; DIISR has signed contract. First business plan available online, runs to June 2009.
ANDS and ARCS are closely related. ARCS are tying state-based storage fabric into national fabric. ANDS are agnostic as to what storage you use.
2008-10-08
James Dalziel, Macquarie.U: Deployment Strategies for Joining the AAF Shibboleth Federation
Abstract
Trust Federations have emerged as alternatives to services running their own accounts. Identity provider, Service provider, trust federation connecting them -- with policy and technical framework.
A trust federation requires identity providers to establish who their users are and how they know about them; the identity provider joins the trust federation; a service provider joins the trust federation, and uses user attributes to determine access.
The MAMS testbed federation has 27 ID providers (900k identities), 28 service providers -- from repositories to wikis to forums. The core infrastructure, including WAYF (Where Are You From service), is already production quality. MAMS working on software to help deployment.
"AAF" (Australian Access Federation) is the legal framework; "Shibboleth Federation" is the technical framework: the fabric to realise the legal federation.
A Shibboleth federation does not require Shibboleth software: it only commits to Shibboleth data standards. Shibboleth software is the reference implementation, but there are others.
Principles for federation have been formulated and are available
Deployment models: partial deployment is allowed. Requirement AAF Req14 has core and optional attributes, and identity providers can initially limit deployment to staff who are known to use the federation, rather than blanketing all staff. Could have a separate directory for just those staff, with just their core attributes. Alternatives: staff-only deployment, or full staff identity records and partial student records.
Also facility for the federation to map between native and AAF attributes.
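In practice that mapping amounts to projecting a native directory record onto federation attribute names. The sketch below uses common eduPerson/LDAP names for illustration only; the actual AAF core set should be checked against Req14.

```python
# Sketch: mapping a native identity record onto federation attributes.
# Attribute names are illustrative; consult the AAF attribute specification.
NATIVE_TO_AAF = {
    "staff_id":     "auEduPersonSharedToken",  # placeholder mapping
    "display_name": "displayName",
    "email":        "mail",
    "role":         "eduPersonAffiliation",
}

def to_aaf(native_record: dict) -> dict:
    """Project a native identity record onto the federation attribute names."""
    return {aaf: native_record[native]
            for native, aaf in NATIVE_TO_AAF.items()
            if native in native_record}

if __name__ == "__main__":
    print(to_aaf({"staff_id": "e12345", "display_name": "A. Researcher",
                  "email": "a.researcher@uni.example.edu", "role": "staff"}))
```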
Shibboleth version 1 is safest to use; Shibboleth version 2 can be used with due caution as to interoperability. OpenID supports weaker trust than a proper education federation; it can be added to a Shibboleth Identity Provider as a plugin.
Service provider: need to connect the Shibboleth Service Provider software (or equivalent) to your software; then determine the required attributes for access. Could specify authentication protocol as attribute; service description contains one or more service offerings. Can use "People Picker" to nominate individuals rather than entire federation. Could specify other policies on top, e.g. fees; that's a non-technical arrangement.
Ron Chernich, U.Queensland: A Generic Schema-Driven Metadata Editor for the eResearch Community
Abstract
Schema-based metadata editor (MDE): to ensure highest quality data through conformance. Lightweight client, Web 2.0. Builds on ten years of previous editor experience. It has emerged that users wanted editor with conformance, usable in their browser (no installation).
Generic metadata editor: cross-browser, generic to schemata. Schema driven, where the schema includes validation constraints. Help as floating messages. Cannot enforce a persistence mechanism, because that is app-specific. Nesting of elements. Live validation.
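To make "schema-driven" concrete, here is a toy sketch (the schema format is invented for illustration; the MDE uses its own, MSS): the editor is generated from a schema object that carries the fields, their constraints, and their help text, so both the UI and live validation come from the one description.

```python
# Sketch: a schema that drives both form generation and validation.
# The schema format below is invented for illustration, not the MDE's MSS format.
import re

SCHEMA = {
    "title":   {"required": True,  "help": "Full title of the resource"},
    "date":    {"required": True,  "pattern": r"^\d{4}(-\d{2}){0,2}$",
                "help": "ISO 8601 date, e.g. 2008-10-20"},
    "creator": {"required": False, "help": "Personal or corporate name"},
}

def validate(record: dict) -> list[str]:
    """Return a list of human-readable validation messages (empty = valid)."""
    problems = []
    for field, rules in SCHEMA.items():
        value = record.get(field, "").strip()
        if rules.get("required") and not value:
            problems.append(f"{field}: required ({rules['help']})")
        elif value and "pattern" in rules and not re.match(rules["pattern"], value):
            problems.append(f"{field}: does not match expected form ({rules['help']})")
    return problems

if __name__ == "__main__":
    print(validate({"title": "Field notes", "date": "20 Oct 2008"}))
    # ['date: does not match expected form (ISO 8601 date, e.g. 2008-10-20)']
```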
The web server hosts the embedding application, and the application calls the MDE. The MDE talks to a Service Provider Interface via a broker, to fetch the metadata record given the record identifier (from the application); and the MDE talks to the metadata schema repository to get the schema.
Problems: cross-browser portability; the EXT JavaScript library has given them good portability. Security is delegated to the Service Provider.
Schemas are typically in XSD, which is not normalised and can contain embedded schemata. Normalisation and flattening require preprocessing, so they use the MSS format as a type of flattened XSD. Reuse is encouraged with documentation and a reference implementation.
Why not XForms? Not very user friendly.
Available at metadata.net. To do: add element refinements; implement encoding scheme support; provide ontology and thesaurus tie-in.
Chris Myers, VERSI: Virtual Beamline eResearch Environment at the Australian Synchrotron
Abstract
Optimise use of expensive synchrotrons: remote usage.
User friendly, safe, reliable, fast, modularly designed.
- Web interface to synchrotron for monitoring.
- WML interface to phones.
- Educational Virtual BeamLine.
- Online Induction System (slides + video).
- Beamline Operating Scheduling System (scheduling).
- Instant Messaging. Transfer portal into e.g. SRB.
Alan Holmes, Latrobe: Virtuosity: techniques, procedures & skills for effective virtual communications
Abstract
The technically non-savvy have lots of misunderstandings about how to use the Internet. Much is taken for granted in face-to-face -- visual cues. Absent those cues, different strategies are needed to make communication effective.
Survey, 26 interviews, of people spending more than 60 hrs/wk online.
- Initiation,
- Experimentation (to establish, in a feedback loop,
  - Efficiency,
  - Identity, and
  - Networking),
- Integration.
These steps apply to any new tool.
- Initiation:
  - intrinsic factors (technophobic?), extrinsic factors (drivers, availability of assistance)
- Experimentation
- Efficiency;
- Identity:
  - how present myself to the world;
- Networking:
  - socialisation, developing group norms.
- Efficiency;
- Integration:
  - enhance life (improve what you do, as driver);
  - limiting own usage (avoiding isolation, stress and burnout);
  - technology usage (media manipulation, convergence)
Typology of high end users:
- New Frontiersmen (early adopters, male, utopians; have self-limited their usage, disillusioned with how others have used internet; are no longer big socialisers);
- Pragmatic Entrepreneurs (small business, mainly women, net is mechanism for business, have multiple businesses, little exploring and socialising, have relative or friend to provide tech assistance, are very protective of computer and physical environment)
- Technicians (like gadgets, mainly male, net is huge library, work in IT, disparage most people's use of net; distrust net-based non-verifiable info; like strategy games but not to socialise; into anime & scifi tv; aware of addiction and actively self-limit);
- Virtual Workers (the virtual environment is just a workspace; even gender split; separate virtual persona from real life; use net for info and trust it; good at manipulating virtual environment; focus on speed & efficiency)
- Entertainers (the Net is a carnival; even gender split; lots of socialising games incl. poker; dating, download, contact, socialising; love the newness of the Net, early adopters of SecondLife and social networking sites, and move on quickly to next thing; unaware of addiction, and tend to social isolation in real life);
- Social Networkers (the Net is all about me; mainly women under 25; the Net allows them to stay in contact with friends; not open to communicating outside narrow circle; not fussed about privacy; outgoing, share info, gossip, photos; use the Net as proof and way to brag; love the tech convergence which helps them document their lives; don't like downloading stuff -- takes too long)
2008-10-06
Ashley Wright, ARCS: ARCS Collaboration Services
Abstract
ARCS: Australian Research Collaboration Services
ARCS provides long-term e-research support, esp. collaboration. Modalities: Video; Web Based; Custom.
Video service is Desktop-based; allows short-burst communications, instead of physical attendance or extensive timespans (e.g. Access Grid). Needs to scale to large numbers, allow encryption. Obviate need for special room booking.
EVO: Enabling Virtual Organisations; Access Grid. ARCS provide advice on deployment.
EVO is licensed by Caltech. Can create user communities on request. 222 registered users under the ARCS-AARNET community; other Australian communities: AuScope, TRIN. More than 100 communities worldwide, mostly high-energy physics. Soon: phone dial-in to meetings, interaction with AARNET audio/video conferencing; Australian portal for registration & support.
Access Grid: advice on equipment & installation; quality assurance.
Web Collaboration: CMS, collaborative environments, shared apps (like Google Apps). Wiki, forums, file sharing, annotation. Full control over user permissions & visibility. Hosting by ARCS, or can help set up locally. Sakai, Drupal, Plone, Wikis.
Customisation. They have staff on hand to help: advice & options
Are open to adopting New Tools.
Nicki Henningham & Joanne Evans, U. Melbourne: Australian Women's Archives
Australian Women's Archives: Next generation infrastructure for women's studies
Abstract
Fragmented record keeping in the past, because organisations were not institutional; best addressed by personal papers, which are much more susceptible to loss. Project initiated from awareness of the impermanence of the data. Encourages women's organisations to protect their records and deposit them. Maintains a register of data.
Based on OHRM. All-in-one biographical dictionary, with annotated bibliography.
A working model of federated info architecture for sustainable humanities computing. Enhanced capabilities, improved sustainability.
Both content development, acting as aggregator and annotator; and technological development, to support creation, capture, and reuse of data. Feeds into People Australia through harvest; will be populating into registry by harvesting from researchers (and vice versa). Need lightweight solutions because of diversity of platforms.
Margaret Birtley, Collections Council of Australia: Integrating systems to deliver digital heritage collections
Abstract
2500 collecting organisations in Australia; uneven staffing and resourcing. Still a long way to go with digitising collections.
Digital heritage is a subset of made or born digital information, prioritised for significance. Digital heritage is organised and structured, managed for access and use.
Collecting organisations innovate with Web 3.0, data selection & curation (metadata protocols).
Researcher issues: collection visibility, accessibility, availability, interoperability.
Issues for collecting organisations: funding, lack of coordination, standards.
Working towards national framework; action plan 2007, being broken up into advocacy plans and development plans.
Mark Hedges, King's College London: ICTGuides
ICTGuides: advancing computational methods in the digital humanities
Abstract
ICTGuides: Taxonomy of methods in humanities to allow reuse; facilitate communities of practice around common computational methods.
Listing of projects includes metadata and service standards. Includes tutorials, available tools.
Mark Birkin, U. Leeds: An architecture for urban simulation enabled by e-research
Abstract
Model impact of demographic change for service provision in cities.
Construct a synthetic population, with fully enumerated households; Demographic projections: dynamically model individual state transitions; look at particular application domains.
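As a toy illustration of dynamic microsimulation with individual state transitions (my sketch, not MoSeS itself, and the transition probabilities are made up): each person in the synthetic population carries a state, and each simulated year they transition between states with given probabilities.

```python
# Toy dynamic microsimulation: individuals change state year by year.
# Probabilities are invented for illustration; MoSeS uses real demographic projections.
import random

TRANSITIONS = {
    # state: [(next_state, probability), ...]; residual probability = stay put
    "student":  [("employed", 0.25)],
    "employed": [("retired", 0.03)],
    "retired":  [],
}

def step(person: dict) -> None:
    person["age"] += 1
    r = random.random()
    cumulative = 0.0
    for next_state, p in TRANSITIONS[person["state"]]:
        cumulative += p
        if r < cumulative:
            person["state"] = next_state
            break

def simulate(population: list[dict], years: int) -> list[dict]:
    for _ in range(years):
        for person in population:
            step(person)
    return population

if __name__ == "__main__":
    random.seed(2008)
    pop = [{"age": 20, "state": "student"} for _ in range(1000)]
    simulate(pop, 10)
    employed = sum(p["state"] == "employed" for p in pop)
    print(f"Employed after 10 years: {employed}/1000")
```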
Components, MoSeS (Modelling and Simulation for e-Social Science): Data : Analysis : Computation : Visualisation : Collaboration. Aim to use secondary as well as primary data. User functionality through JSR-168 portlets. Moving to loose coupling with web services; allows workflow enactors like Taverna.
e-infrastructure by sharing resources through the Grid.
Complication that IT has to catch up, so still need to develop both IT and scholarship at the same time.
Kerry Kilner & Anna Gerber, UQ: Austlit
Transforming the Study of Australian Literature through a collaborative eResearch environment
Abstract
Austlit aims to be the central virtual research resource for Australian literature research & teaching. Has provided extensive biographical & bibliographical records, and specialist data set creation. Started by converting paper projects to the web (Bibliography of Australian Literature); moving to a process rather than product view of scholarship.
Supports research community building. Upcoming: Aus-e-Lit, deeper engagement with new forms of scholarly communication & publication.
- Federated search, visual reports (graphs, maps: New Empiricism). Allow intelligent metadata queries.
- Tagging & Annotation: collaborative (Scholarly editions; simple tagging)
- Compound Object Authoring tools (OAI ORE), for publishing as aggregates
- Data model: Literature Object Reuse & Exchange (based on FRBR)
Michael Fulford, University of Reading: From excavation to publication
From excavation to publication: the integration of developing digital technologies with a long-running archaeological project
Plenary session; Abstract
Archaeological project on Silchester has been running for 12 years. Complete town. Project involves management of a large number of researchers, including undergrads; a non-trivial logistical exercise. Stratigraphic complexity.
Integrated Archaeological Database (IADB): used since the start in 1997; has evolved with the project. Contains most records gathered on site, incl. field records and records of context sheets. Aimed to provide integrated access to excavation records, in a virtual research environment. Scope has broadened to include archival functions, project management, and now web publication.
Digital research essential: no pre-digital archaeological town studies have ever been properly published.
VERA: Virtual Environment for Research in Archaeology. JISC funded, has contributed to current excavations. Aims to enhance how data is documented; web portal, develop novel generic tools, and test them with archaeologists.
Have piloted a digital pen for context notes, and are now using it throughout: 50% of notes. Speeds up post-excavation work by removing transcription costs. Had also experimented with iPAQs, tablet PCs (problem with sunlight), and the DigiMemo pad (not robust).
Capturing 2D plans. Have started trialling GPS.
Stratigraphy based on gathered data in IADB. Collaborative authoring. LEAP Project: linking electronic resources and publications, so can inspect data holdings supporting papers. Has not been easy in archaeology until now.
There is no one answer on how to use the tech: it must be driven by the research. Resourcing constrains what can be done.
Summaries from e-research Australasia
The next several posts capture sessions I attended during the e-research Australasia 2008 conference. I attended sessions Monday and Wednesday, and workshops Thursday and Friday. The notes are short jottings, and will probably not go any further than the inevitable Powerpoints when they are published.