
          SI7: Data pre-processing system for the secondary storage

          SI7 Report [PDF 1,379Kb]

          [Figure: schematic]

          Objectives

          The aim of this workpackage is to establish a cost-effective data pre-processing system that refines, integrates and stores synchronous and asynchronous data streams from instruments and sensors in secondary storage for later use in research. The workpackage focuses primarily on the large-scale datasets of the protein crystallography and climate modelling research groups, which require computationally intensive pre-processing. It provides the following services:

          • Data processing service
          • Data security service
          • Data transfer service
          • Data archiving service
          • Data compression service
          • Data replication service
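
          As an illustration of how the compression, archiving and replication services listed above might fit together, here is a minimal sketch. It is not the workpackage's actual implementation: the function names (`compress`, `replicate`, `preprocess`), the directory layout and the use of gzip and SHA-256 are all assumptions made for the example.

```python
from __future__ import annotations

import gzip
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compress(src: Path, archive_dir: Path) -> Path:
    """Hypothetical compression/archiving step: gzip src into archive_dir."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    out = archive_dir / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(out, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return out


def replicate(src: Path, replica_dirs: list[Path]) -> list[Path]:
    """Hypothetical replication step: copy src to each directory and verify it."""
    expected = sha256_of(src)
    copies = []
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(src, directory / src.name))
        # A checksum comparison guards against silent corruption during the copy.
        if sha256_of(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies


def preprocess(src: Path, archive_dir: Path, replica_dirs: list[Path]) -> dict:
    """Run the assumed pipeline: compress one file, then replicate the archive."""
    archived = compress(src, archive_dir)
    copies = replicate(archived, replica_dirs)
    return {
        "source": src.name,
        "archive": archived,
        "sha256": sha256_of(archived),
        "replicas": copies,
    }
```

          A call such as `preprocess(Path("frame_0001.img"), Path("archive"), [Path("replica_a"), Path("replica_b")])` would return the archive path, its checksum and the replica paths. The file and directory names here are placeholders.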

          Descriptions

          The project will first develop a pre-processing service for asynchronous data (data sourced from CD/DVD media and MonashSunGrid). If time permits, the service will later be extended to synchronous data (data sourced from real-time instruments and sensors).

          Basic Flowchart

          [Figure: basic flowchart]

          Case 1: Protein Crystallography Data Processing

          The primary aim of this research group is to understand the role of proteins in biology and disease by determining their atomic structures using X-ray crystallography, which requires massive computational power. Some sample outputs from processed protein crystallography data are shown below (Source: "The Critical Role of Computer Power in Structural Biology" presentation by Ashley Buckle):
          [Figure: protein workflow]

          Case 2: Regional Climate Modelling Data Processing

          The primary aims of this research group are to:
          • Simulate a complete climate model for consistency and efficiency, which requires massive computational power
          • Simulate the period 2000 to 2005, which produces about 250 GB of data per experiment
          • Control various scenario experiments, including real-world setups, which require about 1.5 TB of storage
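
          As a rough sanity check on the storage figures above, the short sketch below relates the two numbers. It assumes decimal units (1 TB = 1000 GB) and assumes that the 1.5 TB total is made up of runs of about 250 GB each. The source does not state either assumption, and the helper name `experiments_covered` is hypothetical.

```python
GB = 10**9   # decimal gigabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def experiments_covered(total_bytes: int, per_experiment_bytes: int) -> float:
    """Return how many experiments of a given size fit into the total storage."""
    return total_bytes / per_experiment_bytes


per_experiment = 250 * GB   # about 250 GB per experiment (from the text)
total = 1500 * GB           # about 1.5 TB of storage (from the text)

print(experiments_covered(total, per_experiment))  # prints 6.0
```

          Under these assumptions, the quoted 1.5 TB corresponds to roughly six experiments of the quoted size.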

          Some sample outputs from processed climate model data are shown below (Source: "The impact of abrupt land cover changes by savanna fire on contemporary north Australian climate" presentation by K. Görgen, A. Lynch, C. Enticott, J. Beringer, D. Abramson, P. Uotila, A. Marshall, N. Tapper):
          [Figure: climate radar]

          [Figure: radar image]