Unleashing the most powerful supercomputer in the state


Introduction

CIF21 DIBBs: EI: Virtual Data Collaboratory

A Regional Cyberinfrastructure for Collaborative Data Intensive Science

The Virtual Data Collaboratory (VDC) is a federated data cyberinfrastructure that is designed to drive data-intensive, interdisciplinary and collaborative research, and enable data-driven science and engineering discoveries. VDC accomplishes this by providing seamless access to data and tools to researchers, educators, and entrepreneurs across a broad range of disciplines and scientific domains as well as institutional and geographic boundaries. In addition to enabling researchers to advance research frontiers across multiple disciplines, VDC also focuses on (1) training the next generation of scientists with deep disciplinary expertise and a high degree of competence in leveraging data, cyberinfrastructure, and tools to address research problems and (2) helping data scientists and engineers develop and apply advanced federated data management and analysis tools for high-impact scientific applications. To meet this mission, VDC extends beyond its collaborating institutions and leverages NSF investments to provide cyberinfrastructure typically not available to community colleges, state-associated colleges and universities, and regional liberal arts colleges and universities, and to stimulate intense user engagement and adoption by scientists across domains and institutions.

 

VDC represents state-of-the-art data-intensive computing, storage, and networking solutions, integrated with an innovative data services layer. VDC is federated and coordinated across three geographically distributed Rutgers University campuses in New Jersey and multiple campuses in Pennsylvania and New York by a high-speed network, with the potential to incorporate academic/research institutions across the Mid-Atlantic and the nation. VDC builds on and integrates existing national/international and regional data repositories, including NSF-funded repositories, and leverages local/regional/national ACI investments. Central to the VDC vision are three infrastructural innovations: a regional science data DMZ network that provides services to enable efficient and transparent access to data and computing capabilities; an expandable and scalable architecture for data-centric infrastructure federation; and a data services layer to support research workflows that utilize cutting-edge semantic web technologies, support interdisciplinary research, expand access, and increase the impact of data science worldwide.


Overarching Goals

  • Provide seamless access to data and tools to researchers, educators, and entrepreneurs across a broad range of disciplines and scientific domains as well as institutional and geographic boundaries.
  • Enable data scientists and engineers to develop and apply advanced federated data management and analysis tools for high-impact scientific applications.
  • Train the next generation of scientists with deep disciplinary expertise and a high degree of competence in leveraging data, cyberinfrastructure, and tools to address research problems.


Proposed VDC Architecture

  • Regional science data DMZ network
  • Scalable data-centric infrastructure federation
  • Data services to support research workflows


Driving Applications

  • Deciphering Sequence and Structural Correlates of Protein Nucleic Acid Interactions (H. Berman & V. Honavar)
  • High-Volume City Data Sharing and Processing for Smart, Resilient, and Sustainable Cities (J. Gong, RU; Z. Zhu, CUNY; X. Liang, University of Pittsburgh; M. Balduccini, Drexel University)
  • Ocean Observatories Initiative (I. Rodero, M. Parashar)


Education and Outreach

  • Incorporate VDC into research-based and general data science/analytics classes so that students can perform large, applied projects
    - Analytics/data science programs at RU, PSU, Drexel, and CUNY
  • Create a set of easy-to-use modules/online material that could be used for all courses
    - Data Management, Stewardship, Reproducibility, and Curation
  • Leverage NJBDA to impact analytics and data science courses across NJ
  • Foster learning communities using the online modules to enable peer-to-peer graduate learning through standard meet-up and chat software


List of Personnel

Rutgers University:

  • Manish Parashar (Co-PI)  
  • Grace Agnew (Data Services Lead)  
  • Helen Berman  
  • Forough Ghahramani (Education Co-Lead)  
  • Charles Hedrick
  • Thu Nguyen (Education Co-Lead)  
  • Ivan Rodero (Systems Lead)

Penn State:

  • Vasant Honavar (Co-PI and Use Cases Lead)
  • Chuck Gilbert  
  • Wayne Figurelle  
  • Karen Estlund

NJEdge:

  • Edward Chapel (Policy and Governance Lead)

KINBER:

  • Wendy Huntoon (Network Lead)