Objective-driven Adaptive Hybrid Lossy Compression Framework

The official website of the HyLoC lossy compression framework project.

Project Title: CDS&E: Collaborative Research: HyLoC: Objective-driven Adaptive Hybrid Lossy Compression Framework for Extreme-Scale Scientific Applications

Principal Investigators: Dingwen Tao, Sheng Di

Award Numbers: 2042084, 2003709

Project Duration: 8/1/2020 - 7/31/2023

Abstract

Today’s extreme-scale scientific simulations and instruments produce huge amounts of data that cannot be transmitted or stored effectively. Lossy compression, a data compression approach that introduces a controlled amount of data distortion, is considered a promising solution because it can significantly reduce data size while maintaining high data fidelity. However, existing lossy compression methods do not always work effectively on every dataset used in a given application, because datasets have distinct and diverse characteristics. Moreover, user objectives for compression quality and performance may vary with the application, the dataset, or the circumstances. This project aims to develop a hybrid lossy compression framework that automatically constructs the best-fit compression solution for diverse user objectives in data-intensive scientific research. The project also includes educational and engagement activities that develop new curricula on scientific data compression and promote research collaborations with national laboratories.

Designing an efficient, adaptive, hybrid framework that can always choose the best-fit compression strategy is nontrivial, because existing state-of-the-art lossy compression methods are built on distinct design principles. The project has a three-stage research plan. First, the project decouples state-of-the-art error-bounded lossy compression approaches into multiple stages and models the efficiency (e.g., compression ratio, error, and speed) of the individual approaches used in each stage. Second, the project develops a loosely coupled framework that aggregates the decoupled stages and explores as many compression pipelines built from different stage combinations as possible, in order to optimize classic compression efficiency, including compression quality and performance. Third, the project optimizes overall data-movement performance with respect to external devices and resources, such as I/O performance. The team evaluates the proposed framework on multiple extreme-scale scientific applications, including cosmological simulations, light source instrument data analytics, quantum circuit simulations, and climate simulations. The project may yield technologies that increase available storage capacity and improve the performance of extreme-scale scientific applications, opening opportunities for new discoveries.
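To make the idea of decoupled stages and pipeline exploration more concrete, the following is a minimal illustrative sketch, not HyLoC's actual design or code. It assumes a simplified three-stage structure (prediction, error-bounded linear quantization, lossless encoding), uses Python's standard zlib and lzma modules as stand-ins for real encoders, and every name (PREDICTORS, ENCODERS, run_pipeline, best_fit) is hypothetical. The sketch tries every predictor/encoder combination on a 1D array and keeps the one with the highest compression ratio whose maximum point-wise error stays within the user's error bound.

```python
import array
import lzma
import math
import zlib
from itertools import product

# Hypothetical stage 1: predictors. Each predicts the next value from the
# previously *reconstructed* value, so the error bound still holds after
# decompression (the decompressor sees the same reconstructed values).
PREDICTORS = {
    "zero": lambda prev: 0.0,
    "previous": lambda prev: prev,  # simple 1D Lorenzo-style predictor
}

# Hypothetical stage 3: lossless encoders applied to the quantization codes.
ENCODERS = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def quantize(data, predict, eb):
    """Stage 2: linear-scale quantization of prediction errors to within +/- eb."""
    codes, prev = [], 0.0
    for x in data:
        pred = predict(prev)
        q = round((x - pred) / (2 * eb))
        codes.append(q)
        prev = pred + 2 * eb * q  # reconstructed value, fed back to the predictor
    return codes


def dequantize(codes, predict, eb):
    """Inverse of quantize: rebuild values from codes using the same predictor."""
    out, prev = [], 0.0
    for q in codes:
        prev = predict(prev) + 2 * eb * q
        out.append(prev)
    return out


def run_pipeline(data, pred_name, enc_name, eb):
    """Compress and decompress with one stage combination; return (ratio, max error)."""
    predict = PREDICTORS[pred_name]
    encode, decode = ENCODERS[enc_name]
    payload = encode(array.array("q", quantize(data, predict, eb)).tobytes())
    codes = array.array("q")
    codes.frombytes(decode(payload))
    recon = dequantize(codes, predict, eb)
    ratio = 8 * len(data) / len(payload)  # float64 input bytes / compressed bytes
    max_err = max(abs(a - b) for a, b in zip(data, recon))
    return ratio, max_err


def best_fit(data, eb):
    """Explore every pipeline; keep the highest ratio that respects the error bound."""
    best = None
    for pred_name, enc_name in product(PREDICTORS, ENCODERS):
        ratio, max_err = run_pipeline(data, pred_name, enc_name, eb)
        # Small relative tolerance absorbs floating-point rounding.
        if max_err <= eb * (1 + 1e-9) and (best is None or ratio > best[0]):
            best = (ratio, pred_name, enc_name)
    return best


if __name__ == "__main__":
    field = [math.sin(i / 50) for i in range(5000)]  # smooth synthetic 1D field
    print(best_fit(field, 1e-3))
```

Exhaustive search is used here only because the toy search space has four pipelines; with many stage choices, a framework would more plausibly rely on the per-stage efficiency models described above to prune the candidates before running them.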

Team

Principal Investigators

Dingwen Tao (Lead PI), Washington State University

Dingwen Tao is an assistant professor in the School of Electrical Engineering & Computer Science at Washington State University. He is also an adjunct professor in the Department of Computer Science at the University of Alabama. He has published in top-tier HPC and big data conferences and journals, including IEEE/ACM SC, ACM ICS, ACM HPDC, ACM PPoPP, ACM PACT, IEEE IPDPS, IEEE Cluster, IEEE/ACM DAC, IEEE BigData, ICPP, and IEEE TPDS. He is the recipient of the 2020 IEEE Computer Society TCHPC Early Career Researchers Award for Excellence in High Performance Computing, the 2020 NSF CRII Award, and the 2017 UCR Dissertation Year Program Award.

Sheng Di, University of Chicago and Argonne National Laboratory

Sheng Di is a computer scientist at Argonne National Laboratory, USA. He is an IEEE Senior Member. He is a Scientist at Large through the Consortium for Advanced Science and Engineering (CASE) at the University of Chicago and an Institute Fellow of the Northwestern-Argonne Institute of Science and Engineering (NAISE). He has published 100+ refereed journal and conference papers in venues including TPDS, TC, TCC, TKDE, JPDC, IJHPCA, PPoPP, SC, IPDPS, HPDC, PACT, MSST, DSN, ICPP, CLUSTER, BigData, IWQoS, CCGrid, HiPC, Grid, CLOUD, UCC, EuroPar, and ICPADS. He is the recipient of the 2018 IEEE Chicago Section Distinguished Mentoring Award and the 2019 IEEE Chicago Section Distinguished Research and Development Award.

Collaborators

Ian Foster, University of Chicago

Online Data Analysis and Reduction

Kyle Chard, University of Chicago

Online Data Analysis and Reduction

Fred Chong, University of Chicago

Quantum Computing Simulations

Junjing Deng, Argonne National Laboratory

X-ray Imaging for Crystallography Analyses

Zarija Lukic, Lawrence Berkeley National Laboratory

Cosmological Simulations

Students

Daoce Wang, Washington State University

Yuanjian Liu, University of Chicago

Publications

Under Construction

Software

Under Construction

Outreach

The 1st International Workshop on Big Data Reduction, held in conjunction with the 2020 IEEE Big Data conference

Acknowledgement & Disclaimer

This material is based upon work supported by the National Science Foundation under Grant Nos. 2042084 and 2003709. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.