Award Abstract # 1148698
THE OPEN SCIENCE GRID The Next Five Years: Distributed High Throughput Computing for the Nation's Scientists, Researchers, Educators, and Students

NSF Org: PHY
Division Of Physics
Recipient: UNIVERSITY OF WISCONSIN SYSTEM
Initial Amendment Date: May 16, 2012
Latest Amendment Date: April 20, 2021
Award Number: 1148698
Award Instrument: Cooperative Agreement
Program Manager: Bogdan Mihaila
bmihaila@nsf.gov
 (703)292-8235
PHY
 Division Of Physics
MPS
 Directorate for Mathematical & Physical Sciences
Start Date: June 1, 2012
End Date: May 31, 2022 (Estimated)
Total Intended Award Amount: $18,750,000.00
Total Awarded Amount to Date: $24,378,518.00
Funds Obligated to Date: FY 2012 = $3,750,000.00
FY 2013 = $3,750,000.00
FY 2014 = $3,750,000.00
FY 2015 = $4,750,000.00
FY 2016 = $2,750,000.00
FY 2017 = $3,649,980.00
FY 2018 = $1,000,000.00
FY 2019 = $978,538.00
History of Investigator:
  • Miron Livny (Principal Investigator)
    miron@cs.wisc.edu
  • Frank Wuerthwein (Co-Principal Investigator)
  • Ruth Pordes (Former Co-Principal Investigator)
  • Michael Ernst (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Wisconsin-Madison
21 N PARK ST STE 6301
MADISON
WI  US  53715-1218
(608)262-3822
Sponsor Congressional District: 02
Primary Place of Performance: University of Wisconsin-Madison
1210 West Dayton Street
Madison
WI  US  53706-1613
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): LCLSJAGTNZQ7
Parent UEI:
NSF Program(s): CYBERINFRASTRUCTURE,
COMPUTATIONAL PHYSICS,
PHYSICS GRID COMPUTING,
XD-Extreme Digital,
PHYSICS AT THE INFO FRONTIER,
Cybersecurity Innovation,
Campus Cyberinfrastructure
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVITIES
01001314DB NSF RESEARCH & RELATED ACTIVITIES
01001415DB NSF RESEARCH & RELATED ACTIVITIES
01001516DB NSF RESEARCH & RELATED ACTIVITIES
01001617DB NSF RESEARCH & RELATED ACTIVITIES
01001718DB NSF RESEARCH & RELATED ACTIVITIES
01001819DB NSF RESEARCH & RELATED ACTIVITIES
01001920DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7433, 7569, 8084
Program Element Code(s): 723100, 724400, 724500, 747600, 755300, 802700, 808000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

The basic idea of Grid Computing is to utilize the available CPU cycles and storage of many computer systems across a worldwide network so that they function as a flexible, pervasive, and inexpensively accessible pool that can be harnessed by an individual, accredited user, much as power companies and their customers share the electrical grid. Grid computing can be viewed as a service for sharing computer power and data storage capacity over the Internet, simply and transparently, without the user having to consider where the computational facilities are located.

Experiments at major centralized experimental facilities such as the Large Hadron Collider (LHC) require large amounts of computation and storage and involve hundreds of experimenters using computational facilities all over the world; these characteristics are well suited to grid computing, which developed in parallel with the LHC experiments. The Open Science Grid (OSG) is the major facilitator of Grid Computing in the U.S. Researchers have developed these ideas in many other directions as well, producing, in addition to the OSG, large-scale federated systems (TeraGrid, EGEE, the Earth System Grid) that provide not just computing power but also data and software on demand. Standards organizations then developed the relevant standards that made interoperability of grids possible. Grids define and provide a set of standard protocols, middleware, toolkits, and services built on top of those protocols. Interoperability and security are the primary concerns for grid infrastructure, since resources may come from different administrative domains that have both global and local usage policies, different hardware and software configurations and platforms, and varying availability and capacity.

The Open Science Grid is a distributed computing infrastructure for large-scale scientific research. The OSG contributes to the Worldwide LHC Computing Grid as the shared distributed computing facility used by the US ATLAS and US CMS experiments. The OSG is built and operated by a consortium of 90 U.S. universities, national laboratories, scientific collaborations and software developers. It is supported by the National Science Foundation and the US Department of Energy Office of Science. The OSG supports not only physics experiments but also researchers from other fields, including astrophysics, bioinformatics and computer science. Currently the OSG has more than 60 sites in the US and five sites in Brazil, Taiwan and Mexico, supported by the host countries.

All LHC computing and storage sites in the US are members of the OSG and allow other scientific collaborations using the OSG to opportunistically use available resources. The OSG collaborates with the Enabling Grids for E-sciencE project in Europe to provide interoperating federated infrastructures which can be used transparently by the LHC experiments' software. The Large Hadron Collider, located 330 feet below the border of Switzerland and France, is the world's most powerful particle accelerator. Its very-high-energy particle collisions may yield extraordinary discoveries about the nature of the physical universe. Beyond revealing a new world of unknown particles, the LHC experiments could explain why those particles exist and behave as they do. The LHC experiments could uncover the origins of mass, shed light on dark matter, expose hidden symmetries of the universe, and possibly find extra dimensions of space.

The LHC accelerates hair-thin beams of particles to a whisker below the speed of light. Thousands of powerful superconducting magnets steer the beams around the LHC's 16.5-mile-long ring. At four points the particles collide in the hearts of the main experiments, known by their acronyms: ALICE, ATLAS, CMS and LHCb. In the data from these high-energy collisions scientists search for the tracks of particles whose existence could transform our understanding of the universe. More than 10,000 scientists, engineers and students from almost 60 nations on six continents contribute to the LHC, which is headquartered at the CERN laboratory in Geneva, Switzerland. About 1,700 come from universities and laboratories in the United States. Federal funding for US contributions to the LHC is provided by the US Department of Energy's Office of Science and the National Science Foundation.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Fajardo, Edgar and Tadel, Matevz and Balcas, Justas and Tadel, Alja and Würthwein, Frank and Davila, Diego and Guiang, Jonathan and Sfiligoi, Igor. "Moving the California distributed CMS XCache from bare metal into containers using Kubernetes." EPJ Web of Conferences, v.245, 2020. https://doi.org/10.1051/epjconf/202024504042
Fajardo, Edgar and Arora, Aashay and Davila, Diego and Gao, Richard and Würthwein, Frank and Bockelman, Brian. "Systematic benchmarking of HTTPS third party copy on 100Gbps links using XRootD." EPJ Web of Conferences, v.251, 2021. https://doi.org/10.1051/epjconf/202125102001
Fajardo, Edgar and Weitzel, Derek and Rynge, Mats and Zvada, Marian and Hicks, John and Selmeci, Mat and Lin, Brian and Paschos, Pascal and Bockelman, Brian and Hanushevsky, Andrew and Würthwein, Frank and Sfiligoi, Igor. "Creating a content delivery network for general science on the internet backbone using XCaches." EPJ Web of Conferences, v.245, 2020. https://doi.org/10.1051/epjconf/202024504041
Sfiligoi, Igor and Schultz, David and Riedel, Benedikt and Wuerthwein, Frank and Barnet, Steve and Brik, Vladimir. "Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing: Producing a fp32 ExaFLOP hour worth of IceCube simulation data in a single workday." PEARC '20: Practice and Experience in Advanced Research Computing, 2020. https://doi.org/10.1145/3311790.3396625
Copps, Elizabeth and Zhang, Huiyi and Sim, Alex and Wu, Kesheng and Monga, Inder and Guok, Chin and Würthwein, Frank and Davila, Diego and Fajardo, Edgar. "Analyzing Scientific Data Sharing Patterns for In-network Data Caching." SNTA '21: Proceedings of the 2021 on Systems and Network Telemetry and Analytics, 2021. https://doi.org/10.1145/3452411.3464441
Sfiligoi, Igor. "Demonstrating 100 Gbps in and out of the public Clouds." Practice and Experience in Advanced Research Computing (PEARC20), 2020. https://doi.org/10.1145/3311790.3399612
Sfiligoi, Igor and Schultz, David and Wurthwein, Frank and Riedel, Benedikt and Deelman, Ewa. "Pushing the Cloud Limits in Support of IceCube Science." IEEE Internet Computing, v.25, 2021. https://doi.org/10.1109/MIC.2020.3045209

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The "OSG-N5Y" project advanced the state of the art of distributed High Throughput Computing through the design, deployment, and operation of a national fabric of services. These services, operated by the OSG Consortium, allowed researchers with High Throughput workloads to effectively harness computing capacity located at US universities, federally funded laboratories, and international organizations.

High Throughput Computing (HTC) is an approach to empowering researchers with computing capacity by running large numbers of independent batch jobs. Each "job" is typically a single computer program in a particular configuration, provided with input data and producing output data. The philosophy behind HTC is, given a collection of computers, to maximize the number of outputs generated over time. Distributed HTC (dHTC), the specialty of the OSG-N5Y project, applies these principles across a large number of independent sources of computing capacity. By advancing the state of the art of dHTC, the OSG-N5Y project enabled researchers from a broad set of science domains to advance their science; by leveraging the OSG services, researchers executed billions of jobs that consumed over 7 billion hours of computing capacity provided by organizations around the nation.
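
As an illustration, the following minimal sketch uses the HTCondor Python bindings (HTCondor is the workload management system underpinning OSG services) to queue a large ensemble of independent jobs. The program name and file names are hypothetical, and the snippet assumes HTCondor version 9 or later.

    import htcondor

    # Describe one independent job; $(Process) expands to 0..N-1 at submit
    # time, so every job reads its own input and writes its own output.
    submit = htcondor.Submit({
        "executable": "analyze",                    # hypothetical program
        "arguments": "input_$(Process).dat",
        "transfer_input_files": "input_$(Process).dat",
        "output": "job_$(Process).out",
        "error": "job_$(Process).err",
        "log": "workload.log",
        "request_cpus": "1",
        "request_memory": "1GB",
    })

    # Queue 1,000 independent jobs with the local scheduler; maximizing
    # the number of completed outputs over time is then the pool's job.
    schedd = htcondor.Schedd()
    schedd.submit(submit, count=1000)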

The OSG-N5Y project provided effort to the OSG Consortium, which is governed by a Council, managed by an Executive Director, and guided on long-term technology evolution by a Technical Director. The Council is responsible for governance and for setting the direction of the Consortium. OSG-N5Y's contributions drove the Consortium for nearly a decade (in the early years providing the majority of the staffing, as well as support for the Technical Director) and put it on a more sustainable footing; by the end of the OSG-N5Y project, multiple collaborating projects had joined forces to contribute effort to sustain the Consortium's activities.

The OSG-N5Y project provided services that can broadly be grouped into three categories:

  • Network-facing services: These are persistent processes, operated by OSG-N5Y staff, that are exposed to the Internet and provide functionality to authorized entities. A simple analogy is a web server that serves a website; a typical OSG-N5Y network-facing service is the "Compute Entrypoint (CE) Collector", which acts as a phonebook or directory service for the CEs contributing capacity to the OSG fabric of services (a query sketch follows this list).

  • Software services: The OSG-N5Y project curated software and produced the OSG Software Stack, an integrated, tested, documented, and supported set of externally produced software applications that the organizations forming the OSG Consortium use to operate their own network-facing services.

  • Intellectual engagement: The OSG facilitation team is devoted to empowering users to harness the potential of dHTC to advance scientific discovery. This dedicated effort has produced a better-trained software and engineering community in the US, ready to leverage distributed resources for its research challenges. The Technology Investigations team collaborated with external software providers to mature and evolve technologies to meet the functionality and dependability requirements of the OSG Software Stack.
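
To make the CE Collector's "phonebook" role concrete, here is a minimal sketch using the HTCondor Python bindings; HTCondor-CEs advertise themselves to a central collector as Schedd-type ads. The collector hostname is illustrative and may differ from the current production endpoint.

    import htcondor

    # HTCondor-CEs publish Schedd-type ads to the CE collector; listing
    # those ads enumerates the entrypoints known to the federation.
    collector = htcondor.Collector("collector.opensciencegrid.org:9619")
    for ad in collector.query(htcondor.AdTypes.Schedd, projection=["Name"]):
        print(ad.get("Name"))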

Each of these service types has produced a long-lasting outcome: the network-facing services with which the OSG Consortium continues to serve the broad science and engineering community, the OSG Software Stack used by science communities worldwide, and the improvements to users' and organizations' computational prowess that grew out of the engagements with researchers.

Specific services developed by the OSG-N5Y project and remaining in use as outcomes today include:

  • Open Science Compute Federation (OSCF): A set of network-facing services which enable independent sites - typically universities or laboratories - to join their local computers to large "resource pools". These resource pools could then execute researchers' computing workloads at a global scale.

  • Open Science Data Federation (OSDF): A set of services that export scientific data to the Internet, together with a distribution layer, located at strategic network points around the US, that helps scale the delivery of files to resources in the OSCF (an access sketch follows this list).

  • Open Science Pool (OSPool): A resource pool, composed of capacity donated by campuses in the US, open to use by any federally funded researcher and their collaborators.

  • OSG-Connect: A service providing researchers with a location to place their high throughput workloads. OSG-Connect manages the workloads as ensembles of batch jobs which are executed on the OSPool.
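
To illustrate the OSDF delivery model described above, the sketch below reads one object through a cache over HTTPS. Both the cache endpoint and the namespace path are placeholders, not real federation addresses.

    import urllib.request

    # Placeholder cache endpoint and namespace path; real OSDF caches sit
    # at strategic network points and serve federated namespaces.
    cache = "https://cache.example.org:8443"
    path = "/ospool/example/dataset.tar.gz"

    with urllib.request.urlopen(cache + path) as response:
        data = response.read()
    print("fetched %d bytes through the cache" % len(data))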

Finally, in terms of broader impact, the project offered the annual OSG User School at the University of Wisconsin-Madison. The OSG User School is a week-long training event that covers the basic concepts of HTC and dHTC; typically, 40-60 students (a mixture of undergraduates, graduates, and cyberinfrastructure staff) attended each event. As a result, over the decade of the project, hundreds of students received in-depth training on advanced computing topics, contributing to the development of the nation's workforce.

Through its sustained services to researchers engaged in Open Science, its advancement of the OSG Consortium, and its facilitation of new compute-intensive science, the OSG-N5Y project has had a long-lasting impact on the nation's advanced cyberinfrastructure and thus on the national research capability.

Last Modified: 09/30/2022
Modified by: Miron Livny
