The official website of the HyLoC lossy compression framework project.
Project Title: CDS&E: HyLoC: Objective-driven Adaptive Hybrid Lossy Compression Framework for Extreme-Scale Scientific Applications
Principal Investigators: Dingwen Tao, Sheng Di
Award Numbers: 2042084, 2003709
Project Duration: 8/1/2020 - 7/31/2023
Today’s extreme-scale scientific simulations and instruments are producing huge amounts of data that cannot be transmitted or stored effectively. Lossy compression, a data compression approach that introduces a certain amount of data distortion, is considered a promising solution because it can significantly reduce the data size while maintaining high data fidelity. However, existing lossy compression methods may not always work effectively on all datasets used in specific applications because of their distinct and diverse characteristics. Moreover, user objectives for compression quality and performance may vary with applications, datasets, or circumstances. This project aims to develop a hybrid lossy compression framework to automatically construct the best-fit compression for diverse user objectives in data-intensive scientific research. Educational and engagement activities are provided to develop new curriculum related to scientific data compression and promote research collaborations with national laboratories.
Designing an efficient, adaptive, hybrid framework that can always choose the best-fit compression strategy is nontrivial, since existing state-of-the-art lossy compression methods are developed with distinct principles. The project has a three-stage research plan. First, the project decouples the state-of-the-art error-bounded lossy compression approaches into multiple stages and effectively models the working efficiency (e.g., compression ratio, error, speed) of particular approaches in each stage. Second, the project develops a loosely-coupled framework to aggregate the decoupled compression stages together and also explores as many compression pipelines composed of different stages as possible, to optimize the classic compression efficiency, including compression quality and performance. Third, the project optimizes the synthetic data-movement performance regarding the external devices and resources, such as I/O performance. The team evaluates the proposed framework on multiple extreme-scale scientific applications, including cosmological simulations, light source instrument data analytics, quantum circuit simulations, and climate simulations. The project may create technologies that can increase the storage availability and improve the performance for extreme-scale scientific applications, opening opportunities for new discoveries.
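The idea of assembling compression pipelines from decoupled stages and picking the best fit for an objective can be illustrated with a small sketch. This is not the HyLoC implementation: the stage names, the `TRANSFORMS` and `ENCODERS` tables, and the `run_pipeline` and `best_pipeline` helpers are all hypothetical, and the objective is assumed to be the compression ratio under an absolute error bound. Every candidate pipeline shares one quantization stage, so each respects the same error bound, and the candidates differ only in their lossless transform and encoder stages.

```python
import itertools
import math
import struct
import zlib


def quantize(values, error_bound):
    """Map each float to an integer code; reconstruction error is at most error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    step = 2.0 * error_bound
    return [c * step for c in codes]


def delta_forward(codes):
    """Keep the first code, then store differences between neighbours."""
    if not codes:
        return []
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


def delta_inverse(deltas):
    return list(itertools.accumulate(deltas))


def pack(codes):
    # Serialize integer codes as little-endian signed 64-bit values.
    return struct.pack(f"<{len(codes)}q", *codes)


def unpack(blob):
    return list(struct.unpack(f"<{len(blob) // 8}q", blob))


# Hypothetical stage tables: each entry is a (forward, inverse) pair.
TRANSFORMS = {
    "identity": (list, list),
    "delta": (delta_forward, delta_inverse),
}
ENCODERS = {
    "raw": (bytes, bytes),
    "zlib": (zlib.compress, zlib.decompress),
}


def run_pipeline(values, error_bound, transform, encoder):
    """Compress and decompress with one stage combination and measure it."""
    t_fwd, t_inv = TRANSFORMS[transform]
    e_fwd, e_inv = ENCODERS[encoder]
    blob = e_fwd(pack(t_fwd(quantize(values, error_bound))))
    restored = dequantize(t_inv(unpack(e_inv(blob))), error_bound)
    return {
        "pipeline": (transform, encoder),
        "ratio": (8 * len(values)) / len(blob),  # input counted as 8-byte doubles
        "max_error": max(abs(a - b) for a, b in zip(values, restored)),
    }


def best_pipeline(values, error_bound):
    """Try every stage combination and keep the highest compression ratio."""
    candidates = [
        run_pipeline(values, error_bound, t, e)
        for t, e in itertools.product(TRANSFORMS, ENCODERS)
    ]
    return max(candidates, key=lambda r: r["ratio"])


# Smooth synthetic data standing in for a scientific field.
data = [math.sin(i / 50.0) for i in range(2000)]
best = best_pipeline(data, 1e-3)
print(best["pipeline"], round(best["ratio"], 2), best["max_error"])
```

A different objective, such as throughput, could be handled by timing each candidate and changing the key passed to `max`. The exhaustive search here is only practical because the stage tables are tiny. The project description instead mentions modeling the efficiency of each stage, which would avoid running every candidate.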
Dingwen Tao (Lead PI), Washington State University
Dingwen Tao is an associate professor in the Luddy School of Informatics, Computing, and Engineering at Indiana University Bloomington. He has published in top-tier HPC and big data conferences and journals, including SC, ICS, HPDC, PPoPP, PACT, IPDPS, CLUSTER, DAC, BigData, ICPP, MSST, TPDS, TC, JPDC, and IJHPCA. He is a recipient of the R&D 100 Award (2021), the IEEE Computer Society TCHPC Early Career Researchers Award for Excellence in High Performance Computing (2020), the NSF CISE Research Initiation Initiative (CRII) Award (2020), the IEEE CLUSTER Best Paper Award (2018), and the UCR Dissertation Year Program (DYP) Award (2017).
Sheng Di (Site PI), University of Chicago and Argonne National Laboratory
Sheng Di is a computer scientist at Argonne National Laboratory, USA. He is an IEEE senior member, a Scientist at Large through the Consortium for Advanced Science and Engineering (CASE) at the University of Chicago, and an institute fellow of the Northwestern-Argonne Institute of Science and Engineering (NAISE). He has published 100+ refereed journal and conference papers, including in TPDS, TC, TCC, TKDE, JPDC, IJHPCA, PPoPP, SC, IPDPS, HPDC, PACT, MSST, DSN, ICPP, CLUSTER, BigData, IWQoS, CCGrid, HiPC, Grid, CLOUD, UCC, EuroPar, and ICPADS. He is the recipient of the DOE 2021 Early Career Research Program Award, the 2018 IEEE-Chicago Distinguished Mentoring Award, and the 2019 IEEE-Chicago Distinguished R&D Award.
Ian Foster, University of Chicago
Online Data Analysis and Reduction
Kyle Chard, University of Chicago
Online Data Analysis and Reduction
Fred Chong, University of Chicago
Quantum Computing Simulations
Junjing Deng, Argonne National Laboratory
X-ray Imaging for Crystallography Analyses
Zarija Lukic, Lawrence Berkeley National Laboratory
Cosmological Simulations
Sian Jin, Washington State University
Daoce Wang, Washington State University
Yuanjian Liu, University of Chicago
2022 NSF CSSI PI Meeting for “Towards a Sustainable Data and Software Cyberinfrastructure” in Alexandria, VA.
The 2nd International Workshop on Big Data Reduction, held with the 2021 IEEE Big Data conference.
The 1st International Workshop on Big Data Reduction, held with the 2020 IEEE Big Data conference.
This material is based upon work supported by the National Science Foundation under Grants No. 2042084 and 2003709. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.