Award Abstract # 1928288
EarthCube Data Capabilities: Collaborative Research: Integration of Reproducibility into Community Cyberinfrastructure

NSF Org: RISE
Div of Res, Innovation, Synergies, & Edu
Recipient: DEPAUL UNIVERSITY
Initial Amendment Date: August 26, 2019
Latest Amendment Date: August 26, 2019
Award Number: 1928288
Award Instrument: Standard Grant
Program Manager: Eva Zanzerkia
ezanzerk@nsf.gov
 (703)292-4734
RISE
 Div of Res, Innovation, Synergies, & Edu
GEO
 Directorate For Geosciences
Start Date: September 1, 2019
End Date: August 31, 2024 (Estimated)
Total Intended Award Amount: $331,932.00
Total Awarded Amount to Date: $331,932.00
Funds Obligated to Date: FY 2019 = $331,932.00
History of Investigator:
  • Tanu Malik (Principal Investigator)
    tanu@cdm.depaul.edu
Recipient Sponsored Research Office: DePaul University
1 E JACKSON BLVD
CHICAGO
IL  US  60604-2287
(312)362-7388
Sponsor Congressional District: 07
Primary Place of Performance: DePaul University
243 S. Wabash Ave.
Chicago
IL  US  60604-2301
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): MNZ8KMRWTDB6
Parent UEI:
NSF Program(s): EarthCube
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433
Program Element Code(s): 807400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.050

ABSTRACT

For science to reliably support new discoveries, its results must be reproducible. This has proven to be a challenge in many fields, most notably those that rely on computational studies to support new discoveries. Reproducibility in these studies is particularly difficult because it requires open sharing of data and models and careful control by the original researcher to ensure that research products can be run on later generations of hardware and software and still produce consistent results. This project will develop software that supports computational reproducibility and makes it easier and more efficient for geoscientists to preserve, share, repeat, and replicate scientific computations. The Broader Impacts of this project include a collaboration between computer scientists, hydrologists, and the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) for the hydrology research community. With over 3,500 users and more than 8,000 model and data resources, CUAHSI's cyberinfrastructure will allow this collaboration to bring improved tools and best practices to a broad and diverse community of geoscientists. Beyond hydrology, the methods and tools developed in this project have the potential to be extended to the solid Earth and space science domains. They also have the potential to inform the reproducibility evaluation process currently undertaken by journals and publishers. The project will also conduct workshops to train researchers, and its tools will be used in the classroom at Utah State University, DePaul University, and the University of Virginia.

Emphasis on the importance of research reproducibility is steadily rising, yet many studies remain irreproducible. Reproducibility in computational studies is particularly difficult because of the challenges involved in completely documenting the data, models, and procedures used, together with the underlying hardware and software dependencies. The reproducibility workbench software (ReproBench) developed in this project will address these reproducibility questions by establishing a container-based reproducible workflow that makes it easy and efficient for geoscientists to verify scientific results. Automation and documentation are two key methods for improving verification and, more generally, the conduct of reproducible science. This project will build from two past investments: (I) automated containerization methods, through the Sciunit project, and (II) well-documented, community-adopted interfaces, through HydroShare, and bring these investments together to establish a novel, robust, and reproducible workflow. By applying this workflow to water-related science use cases, the project will demonstrate how to preserve, share, repeat, and replicate scientific results. The interfaces can become an exemplar for other community cyberinfrastructure that, like hydrology's, aims to share data and models at large scale. In establishing this workflow, the ReproBench project team combines expertise in cyberinfrastructure, domain science, and reproducible computational data science. By leveraging Sciunit, ReproBench brings formal methods for the conduct of reproducible computational science into the geosciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Nakamura, Yuta and Malik, Tanu "Efficient Provenance Alignment in Reproduced Executions" USENIX Theory and Practice of Provenance, 2020
Nakamura, Yuta and Malik, Tanu and Kanj, Iyad and Gehani, Ashish "Provenance-based Workflow Diagnostics Using Program Specification" 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2022 https://doi.org/10.1109/HiPC56025.2022.00046
Choi, Young-Don and Roy, Binata and Nguyen, Jared and Ahmad, Raza and Maghami, Iman and Nassar, Ayman and Li, Zhiyu and Castronova, Anthony M. and Malik, Tanu and Wang, Shaowen and Goodall, Jonathan L. "Comparing containerization-based approaches for reproducible computational modeling of environmental systems" Environmental Modelling & Software, v.167, 2023 https://doi.org/10.1016/j.envsoft.2023.105760
Manne, Naga Nithin and Satpati, Shilvi "CHEX: Multiversion Replay with Ordered Checkpoints" Proceedings of the VLDB Endowment, v.15, 2022 https://doi.org/10.14778/3514061.3514075
Malik, Tanu "Artifact Description/Artifact Evaluation: A Reproducibility Bane or a Boon" 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, 2021 https://doi.org/10.1145/3456287.3465479
Ahmad, Raza and Manne, Naga Nithin and Malik, Tanu "Reproducible Notebook Containers using Application Virtualization" IEEE 18th International Conference on e-Science (e-Science), 2022 https://doi.org/10.1109/eScience55777.2022.00015
Essawy, Bakinam T. and Goodall, Jonathan L. and Voce, Daniel and Morsy, Mohamed M. and Sadler, Jeffrey M. and Choi, Young Don and Tarboton, David G. and Malik, Tanu "A taxonomy for reproducible and replicable research in environmental modelling" Environmental Modelling & Software, 2020 https://doi.org/10.1016/j.envsoft.2020.104753
