Data Management and Sharing Case Studies

Australia's Virtual Herbarium

Australia’s Virtual Herbarium (AVH) federates 6 million plant specimen records held in major herbaria across the country and makes them available over the internet to researchers and the public.

Researchers can select specimens by family, genus, species, the herbarium where they are stored, and when, where and by whom they were collected.

Information is provided about each specimen, including its biological name, associated species and ecological preferences. The specimen information can be displayed on the web page as a table or as a map showing the location of the specimens, or it can be downloaded as a file in one of a selection of formats.

To facilitate getting the data into the AVH database, administrators can upload data in a variety of industry standard formats.

The data deluge is a major challenge for researchers: how can data from different studies, from different research groups, and in different formats be integrated and shared with collaborators and other researchers?

eResearch SA has been working with special interest groups to develop solutions to the data problem. Using advanced information communication technologies, we’re building web-based data repositories for controlled data sharing and public access.

The AVH is a pioneering effort with particular value in displaying information about the geographic distribution of species, enhanced by images, descriptive text and identification tools. This is transforming the use of data that stretches as far back as the earliest days of European settlement.

AusStage Visual Mapping

Jonathan Bollen, Flinders University
Nathan Lambert, eResearch SA Summer Scholarship Recipient
Bradley Williams, eResearch SA Summer Scholarship Recipient

eResearch SA offers summer scholarships to academic high performers with an interest in eResearch. In 2008, eResearch SA awarded scholarships to Nathan Lambert and Bradley Williams to visualise networks of artistic collaboration using AusStage, supervised by Jonathan Bollen, Flinders University.

AusStage is a relational database of performing arts events. It records the relationships between events, venues, organisations, people, and resources. Researchers know, anecdotally, that social networks operate in the field of performing arts. They also know that interactions between artists, as they train, rehearse and work together, have implications for the kinds of artists they become and the kinds of performances they make. AusStage records the history of these networks of contacts and collaborations. Mapping performing arts events lets us see where artists perform and how productions travel. Network visualisation lets us see how performing artists interact.

The scholarship students set out to analyse these networks by visualising data from AusStage in network graphs and geographic maps. Nathan developed new approaches to analysing data in AusStage by using network visualisation to address the question, ‘who works with whom?’ His project explored software applications, data migration, and techniques for visualising AusStage network data relating to the performing arts in South Australia.

Bradley developed methods for integrating AusStage and venue data with geo-coded data from sources such as the Australian Bureau of Statistics, so as to display the results on interactive maps using Google Maps. His tasks were to research how AusStage data and Google Earth could be used together to provide a visualised map of the data, and how this data could be used in an informative way.

Nathan was successful in creating network graphs that displayed how contributors linked other contributors together (in most cases the linking contributors were the directors, playwrights, or designers). His graphs also provided a visual display of the Australian Dance Theatre’s toured work, showing the company’s wide coverage both within Australia and internationally, links between contributors who had participated in an Adelaide Festival or Fringe event, and a graph of all contributors to a South Australian event between 1970 and 1980.
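The "who works with whom?" question can be illustrated with a minimal sketch. It assumes the data can be reduced to (event, contributor) pairs and links any two contributors who share an event. The names, the data and the functions `build_collaboration_graph` and `most_connected` are hypothetical and are not taken from AusStage or from Nathan's project.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (event, contributor) rows, invented for illustration only.
CREDITS = [
    ("Event 1", "Director A"),
    ("Event 1", "Designer B"),
    ("Event 1", "Performer C"),
    ("Event 2", "Director A"),
    ("Event 2", "Performer D"),
    ("Event 3", "Performer C"),
    ("Event 3", "Performer D"),
]


def build_collaboration_graph(credits):
    """Return {contributor: {collaborator: number_of_shared_events}}."""
    # Group contributors by the event they worked on.
    by_event = defaultdict(set)
    for event, person in credits:
        by_event[event].add(person)

    # Link every pair of contributors who share an event.
    graph = defaultdict(lambda: defaultdict(int))
    for people in by_event.values():
        for a, b in combinations(sorted(people), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def most_connected(graph):
    """Return the contributor with the most distinct collaborators.

    Ties are broken alphabetically so the result is deterministic.
    """
    return max(sorted(graph), key=lambda person: len(graph[person]))


graph = build_collaboration_graph(CREDITS)
for person in sorted(graph):
    print(person, "->", sorted(graph[person]))
print("Most connected:", most_connected(graph))
```

A structure like this could be passed to a graph drawing tool to produce a network diagram; the choice of tool is left open here because the source does not name one.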

Bradley found that Google Earth was limited in that it displayed historical events on a contemporary map. He was able to overlay historical maps on Google Earth so that venue data from the late 1800s could be displayed on a map from the same time.

The results of both of these scholarships will feed into future developments in AusStage. Perhaps the most exciting thing about these scholarships is that the approaches to data visualisation the students developed were central in attracting $500,000 of National eResearch Architecture Taskforce (NeAT) funding to develop production services based on these prototypes.

Grass microarray database project

Joy Raison, eResearch SA
Dr. Ute Baumann, Australian Centre for Plant Functional Genomics (ACPFG)

Abiotic stresses, such as temperature, waterlogging, drought, salinity and mineral deficiencies or toxicities, are a major cause of yield and quality loss in cereal crops.

To develop varieties with resistance or increased tolerance to such stresses, scientists at the Australian Centre for Plant Functional Genomics (ACPFG) are working to determine which genes are activated or repressed in different varieties under different stress conditions, and the function of these genes in the metabolism of the plants.

Microarray technology has been developed to simultaneously detect the expression levels of thousands of genes in a sample of biological material. Microarray experiments are performed to determine differential gene expression in multiple biological samples.

The biological material could come from any organism, at any developmental stage, be from any tissue of the organism, and have undergone any treatment or stress regime. The design of each experiment will determine the biological material to be used and any treatments that will be applied to it.

To support genomic research, the results of microarray experiments can now be lodged at different web sites and made available to the public.

To take advantage of the information that these publicly available microarray datasets may have in relation to cereals under different abiotic stresses, eResearch SA, in conjunction with the ACPFG, developed a database to store microarray experiment related data on any species in the Poaceae (grass) family.

The aim of the Grass Microarray Database project is to collect, manage, and provide access to ACPFG and public microarray experiment data and related information for grass species in a single format.

Biologists, bioinformaticians and others are expected to access the data. Interfaces will be developed for each of these user groups to access the information they require.

Statistical style analysis of motion pictures

Statistical style analysis of motion pictures offers a systematic and objective means to analyse a film’s shot structure and staging. It is research into the way films are put together, rather than how they are perceived or comprehended.

This research activity has been largely ignored until now because of the time-consuming and labour-intensive process involved. With the advent of appropriate digital technology it is now a practical mode of research, with considerable capacity to enhance stylistic analysis of cinema.

In 2009 eResearch SA offered summer scholarships to Mei Ling Yang and Arif Rezwani. They developed an automated data capture process for the statistical style analysis of motion pictures and a data set that includes descriptive information on the content of each shot, including dialogue, musical and sound features.

In developing the automated data capture process, the students created the Auto Cut Detector (ACD) software. The ACD automatically detects shots and captures information for the statistical style analysis of motion pictures.

It reads a video file (e.g. from a DVD) and then detects shots automatically. It grabs key frames of each shot from the video, gathers relevant data, and sends the frames and related information to a database.
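The source does not describe how the ACD detects cuts. As an illustration only, one common approach is to compare consecutive frames and start a new shot when they differ sharply. The sketch below assumes frames are already decoded into flat lists of greyscale pixel values; the function names, the threshold and the frame rate are all assumptions, not details of the ACD.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_shots(frames, fps=25.0, threshold=50.0):
    """Split a frame sequence into shots at large frame-to-frame changes.

    Returns one record per shot with its number, the index of its first
    frame (used here as the key frame) and its duration in seconds.
    The threshold is an assumed value and would need tuning on real video.
    """
    if not frames:
        return []

    # A shot starts at frame 0 and wherever the difference exceeds the threshold.
    starts = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            starts.append(i)

    shots = []
    for number, start in enumerate(starts, start=1):
        # The shot ends where the next one starts, or at the end of the video.
        end = starts[number] if number < len(starts) else len(frames)
        shots.append({
            "shot_number": number,
            "key_frame": start,
            "duration_s": (end - start) / fps,
        })
    return shots


# Synthetic "video": three flat grey levels standing in for three shots.
frames = [[10] * 4] * 3 + [[200] * 4] * 2 + [[90] * 4] * 5
for shot in detect_shots(frames, fps=1.0):
    print(shot)
```

A hard threshold like this only catches abrupt cuts; gradual transitions such as fades and dissolves would need a different test.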

The dataset that they built included:

  • Film ID (a number uniquely identifying each film).
  • Film title (captured automatically from the film file).
  • Film duration (total length of the film).
  • Shot ID (a number uniquely identifying each shot).
  • Shot number (the total number of shots in the film).
  • Shot duration (the total time of each shot).
  • Shot image (a screen capture from the beginning of each shot).

Once the database has been populated automatically by the software, the researcher can rapidly add more descriptive information on each shot.

Analysis of this enriched database could then determine, for example, the frequency and relative total time spent in each type of shot, the correlation between shot scale and shot duration, and the sequential pattern of shots of different scales.

For the first time, this enables us to answer questions such as: what percentage of the total running time is occupied by close-ups of the female star?
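The kind of query described above can be sketched as follows, assuming each shot record has been annotated by the researcher with a shot scale. The field names `scale` and `duration_s`, the sample values and the function `summarise_by_scale` are hypothetical; the sketch covers only frequency and share of running time, not correlation or sequential patterns.

```python
from collections import defaultdict

# Hypothetical annotated shots; values are invented for illustration.
SHOTS = [
    {"scale": "close-up", "duration_s": 4.0},
    {"scale": "long", "duration_s": 12.0},
    {"scale": "medium", "duration_s": 6.0},
    {"scale": "close-up", "duration_s": 2.0},
    {"scale": "long", "duration_s": 16.0},
]


def summarise_by_scale(shots):
    """Return {scale: {"count": n, "share_pct": percent of running time}}."""
    total = sum(shot["duration_s"] for shot in shots)
    counts = defaultdict(int)
    seconds = defaultdict(float)
    for shot in shots:
        counts[shot["scale"]] += 1
        seconds[shot["scale"]] += shot["duration_s"]
    return {
        scale: {"count": counts[scale], "share_pct": 100 * seconds[scale] / total}
        for scale in counts
    }


summary = summarise_by_scale(SHOTS)
for scale, stats in sorted(summary.items()):
    print(f"{scale}: {stats['count']} shots, {stats['share_pct']:.1f}% of running time")
```

Answering the close-up question for a particular performer would need one more annotation per shot (who appears in it), which the researcher would add in the same way.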

Unlocking the secrets of evolution

Professor David Adelson, Chair of Bioinformatics and Computational Genetics at the University of Adelaide’s School of Molecular & Biomedical Science, is part of a significant international project mapping cow and horse genomes to understand and describe the entire complement of their genetic material.

Cow and horse are the first mammalian livestock animals in the world to be sequenced, and Dave is one of more than 300 researchers across 25 countries who have worked for six years to understand their genomes.

eResearch SA staff member Joy Raison collaborates with Dave, and assists him with the analysis of these genomes. Their work has significant applications for agricultural livestock production, and also yields insight into mammalian genome evolution.

Dave and Joy are specifically interested in what many scientists think of as ‘junk DNA,’ those sections of the genome that are not genes, and which have been thought to have no function. They have made some exciting discoveries in these sections of the genome that challenge this common pre-conception.

Dave and Joy are sent an animal’s whole genome. They then analyse the junk DNA looking for patterns that repeat. They analyse sequences varying between 1,000 and 150,000,000 base pairs in length.
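The source does not say which methods or tools are used to find repeats. Purely to illustrate the idea of looking for patterns that repeat, the toy sketch below records every position at which each fixed-length substring (a k-mer) occurs and keeps those seen more than once. The function `find_repeats` and the example sequence are invented; real genome-scale repeat analysis relies on specialised software and far more sophisticated models.

```python
from collections import defaultdict


def find_repeats(sequence, k, min_count=2):
    """Return {k-mer: [start positions]} for k-mers seen at least min_count times.

    Occurrences may overlap; positions are 0-based.
    """
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count}


# Invented toy sequence, 16 bases long.
sequence = "ACGTTTACGTGGACGT"
for kmer, pos in find_repeats(sequence, k=4).items():
    print(kmer, pos)
```

Exact matching of this kind misses repeats that have diverged through mutation, which is one reason it is only a starting point for thinking about the problem.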

Once patterns have been identified within the sequences they are classified. These classified patterns can then be searched for and compared across species. These sequencing projects have revealed some exciting findings about the characteristics of conserved portions of ancient DNA patterns in the genome.

Dave and Joy’s analysis has shown that junk DNA is probably a misnomer, and that the conservation of these sequences of ancient DNA across all the animals studied is a key discovery in understanding the evolution of these animals. They have uncovered novel, functionally important aspects of genome structure.

This groundbreaking research has been published in Science, the world’s leading journal of original scientific research.