2014 Events

RDI2/DCS Seminar: Disaggregated Systems

Speaker:  Hubertus Franke, IBM Research Division, Thomas J. Watson Research Center

Date:  Tuesday, December 2nd, 2014
Time:  11:00am
Location:  CoRE 301


Information-processing applications and domains require a rethinking of how IT is to be delivered as solutions and services, meeting a variety of requirements that include performance, resilience, security, and agility. Cloud-computing platforms are becoming one of the preferred deployment platforms for both traditional applications and a new generation of mobile, social, and analytics applications. Changes are evident at all layers of the platform stack. At the infrastructure level, the network, compute, and storage are becoming more software defined, i.e., the control and operational layers are separating, allowing the control to often be executed on standard servers and thus enabling a larger variety of offerings. For instance, software defined networking and software defined storage are currently undergoing this transition. As the underlying environment becomes more software defined and programmable, applications will become progressively easier to develop, deploy, optimize, and manage. This, in turn, will foster innovation in the solutions, tooling, and runtime domains, leading to entirely new ecosystems that capture workload topologies and characteristics and compile them onto an infrastructure assembled from software defined components for optimal outcomes.

The emergence of flat networks as the backbone of the data center allows this rethinking of how systems will be composed. Rather than assembling racks of servers, the infrastructure will be composed of resource pools (storage, accelerators, memory nodes) that are interconnected through high-speed, low-latency flat networks. Network latencies and bandwidth approach bus-attached I/O performance and therefore allow for resource disaggregation and subsequent re-aggregation based on the specific resource requirements of individual workloads. The programmable nature of this process enables a highly agile, flexible, and consequently robust environment. In this talk, we will discuss the key ingredients of this disruptive trend of software defined environments and disaggregated systems and illustrate their potential benefits.
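
To make the composition step concrete, here is a toy Python sketch of disaggregation and re-aggregation: network-attached resource pools from which a workload-specific logical server is assembled on demand and later dissolved. All names are illustrative and do not correspond to any actual software defined infrastructure API.

    # Toy model: a logical server is composed on demand from shared,
    # network-attached resource pools and decomposed when the workload ends.
    from dataclasses import dataclass, field

    @dataclass
    class ResourcePool:
        kind: str        # e.g., "memory", "storage", "accelerator"
        capacity: int    # units still available in the pool

        def allocate(self, units: int) -> int:
            if units > self.capacity:
                raise RuntimeError(f"{self.kind} pool exhausted")
            self.capacity -= units
            return units

        def release(self, units: int) -> None:
            self.capacity += units

    @dataclass
    class LogicalServer:
        """A workload-specific aggregation of disaggregated resources."""
        grants: dict = field(default_factory=dict)

    def compose(pools: dict, demand: dict) -> LogicalServer:
        """Re-aggregate resources from the pools to match a workload's needs."""
        server = LogicalServer()
        for kind, units in demand.items():
            server.grants[kind] = pools[kind].allocate(units)
        return server

    def decompose(pools: dict, server: LogicalServer) -> None:
        """Return a finished workload's resources to the shared pools."""
        for kind, units in server.grants.items():
            pools[kind].release(units)
        server.grants.clear()

    pools = {
        "memory": ResourcePool("memory", 4096),         # GB
        "storage": ResourcePool("storage", 100_000),    # GB
        "accelerator": ResourcePool("accelerator", 64), # devices
    }
    analytics_job = compose(pools, {"memory": 512, "accelerator": 8})
    decompose(pools, analytics_job)

Because the composition is driven entirely by the workload's demand description, the same pools can back very different logical machines over time, which is the agility the abstract describes.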

About Dr. Hubertus Franke

Hubertus Franke, IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598 (frankeh@us.ibm.com). Dr. Franke is a Distinguished Research Staff Member and Senior Manager for Software Defined Infrastructures at the Thomas J. Watson Research Center. He received a Diplom degree in Computer Science from the Technical University of Karlsruhe, Germany, in 1987, and M.S. and Ph.D. degrees in Electrical Engineering from Vanderbilt University in 1989 and 1992, respectively. He subsequently joined IBM at the Thomas J. Watson Research Center, where he worked on IBM SP1/2 messaging, scalable operating systems, multi-core architectures, scalable applications, and cloud platforms. He received several IBM Outstanding Innovation Awards for his work. He is an author or coauthor of more than 35 patents and over 110 technical papers. He is an IBM Master Inventor, a member of the IBM Academy of Technology, and an Adjunct Professor of Computer Science at NYU, where he teaches Operating Systems, and he has served on the Rutgers ECE Advisory Board since 2011.


Computing at a Million Mobiles Per Second

Speaker:  Dr. Scott B. Baden - Department of Computer Science and Engineering, University of California, San Diego

Date:  Tuesday, October 21st, 2014
Time:  11:00am
Location:  CoRE 301


Generally speaking, performance programming benefits from prior knowledge of the application. The more we know about the problem we are solving, the more effectively we can modify the application source to dramatically improve some aspect of performance. Due to tight system design constraints, this process is especially important at the exascale.

I will describe current work in my research group aimed at solving two performance programming problems. Our approach is to build custom, domain-specific source-to-source translators that incorporate the knowledge of a performance programming expert. The translators perform semantic-level optimizations, which are unavailable to a traditional compiler working with conventional language constructs.

The first translator, Bamboo, transforms annotated MPI source into a data-driven form that tolerates communication delays automatically. Running on up to 96K processors of a Cray XE-6, our translator meets or exceeds the performance of hand-coded overlap variants.
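
As background, the following sketch shows, in Python with the mpi4py package (chosen purely for brevity; Bamboo itself operates on MPI C source), the hand-coded overlap pattern that Bamboo derives automatically from annotated blocking-MPI programs: post non-blocking halo exchanges, compute the interior while messages are in flight, then finish the boundary.

    # Run with, e.g., mpiexec -n 4 python overlap.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    u = np.random.rand(1024)                # this rank's slice of the domain
    halo_l, halo_r = np.empty(1), np.empty(1)

    # 1. Post non-blocking halo exchanges with both neighbors.
    reqs = [comm.Irecv(halo_l, source=left,  tag=0),
            comm.Irecv(halo_r, source=right, tag=1),
            comm.Isend(u[0:1],  dest=left,   tag=1),
            comm.Isend(u[-1:],  dest=right,  tag=0)]

    # 2. Compute the interior while the messages are in flight (the overlap).
    new_u = np.empty_like(u)
    new_u[1:-1] = 0.5 * (u[:-2] + u[2:])

    # 3. Wait for the halos, then finish the two boundary points.
    MPI.Request.Waitall(reqs)
    new_u[0]  = 0.5 * (halo_l[0] + u[1])
    new_u[-1] = 0.5 * (u[-2] + halo_r[0])

The appeal of Bamboo is that the programmer writes the far simpler blocking version and annotates it; the translator produces a data-driven schedule with this overlapping behavior.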

The second translator, Mint, transforms annotated C++ stencil codes into highly optimized CUDA that achieves roughly 80% of the performance of carefully hand-coded CUDA.
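
For readers unfamiliar with stencil codes, the sketch below shows the kind of computation involved: a single 2D Jacobi relaxation sweep. It is written in Python/NumPy purely as a language-neutral illustration; Mint's actual input is an annotated C++ loop nest performing the same neighbor-averaging update, which the translator maps onto CUDA threads and tiles for GPU memory.

    import numpy as np

    def jacobi_sweep(u: np.ndarray) -> np.ndarray:
        """Replace each interior point with the average of its 4 neighbors."""
        v = u.copy()
        v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
        return v

    u = np.zeros((256, 256))
    u[0, :] = 1.0                    # a fixed boundary condition
    for _ in range(100):             # relax toward the steady state
        u = jacobi_sweep(u)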

Domain-specific translation is an effective means of managing development costs. Both translators enable the domain scientist to remain focused on the domain science, while realizing performance usually attributed to expert coders.

About Dr. Scott B. Baden

Dr. Scott B. Baden received his M.S. and Ph.D. in Computer Science from UC Berkeley in 1982 and 1987, respectively. He joined UCSD in 1990. He is a founding member of UCSD’s Computational Science, Mathematics, and Engineering Program (CSME). His research expertise includes high-performance and scientific computation: run-time support, domain-specific translation, irregular problems, and data discovery.


"Enabling Scalable Data Analysis of Large Computational Structural Biology Datasets on Distributed Memory Systems"

Speaker:  Dr. Michela Taufer, David L. and Beverly J.C. Mills Chair of Computer and Information Sciences and Associate Professor at the University of Delaware
Date:  Monday April 21, 2014
Time:  2pm
Location:  CoRE 701


Today, petascale distributed memory systems perform large-scale simulations and generate massive amounts of data in a distributed fashion at unprecedented rates. This massive amount of data presents new challenges for the scientists who must extract scientific meaning from it. In the case of clustering this data, traditional analysis methods may require comparing single records with each other in an iterative process, and therefore involve moving data across nodes of the system. As both the data and the number of nodes increase, clustering methods put growing pressure on the storage and the bandwidth of the system; the methods become inefficient and do not scale. New methodologies are needed to analyze data when it is distributed across the nodes of large distributed memory systems.

In general, when analyzing structural biology datasets, we focus on specific properties of the data records, such as the molecular geometry or the location of a molecule in a docking pocket. Based on this observation, in this talk we propose a methodology that enables scalable analysis of large datasets composed of millions of individual structural biology records, in a distributed manner, on large distributed memory systems. The methodology is based on two general steps. The first step extracts concise properties or features of each data record, in parallel, and represents them as metadata. The second step performs the clustering on the extracted properties using machine-learning techniques.
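
The following minimal Python sketch illustrates the two-step pattern; the feature choices and names are hypothetical stand-ins, not Dr. Taufer's actual pipeline. Step one reduces each record to a compact feature vector in parallel, so only small metadata rather than raw records needs to move; step two clusters the features with a standard machine-learning method.

    from multiprocessing import Pool
    import numpy as np
    from sklearn.cluster import KMeans

    def extract_features(record: dict) -> list:
        """Step 1: reduce one record (e.g., a ligand conformation) to a few
        concise geometric properties."""
        coords = np.asarray(record["coords"])
        centroid = coords.mean(axis=0)
        rgyr = np.sqrt(((coords - centroid) ** 2).sum(axis=1).mean())
        return [rgyr, *centroid]

    if __name__ == "__main__":
        # Stand-in for millions of records spread across a distributed system.
        records = [{"coords": np.random.rand(50, 3)} for _ in range(1000)]

        with Pool() as pool:          # parallel feature extraction
            features = np.array(pool.map(extract_features, records))

        # Step 2: cluster the compact metadata, not the raw records.
        labels = KMeans(n_clusters=4, n_init=10).fit_predict(features)

On a real distributed memory system, the extraction step would run where the data already lives (e.g., one task per node), and only the feature vectors would be gathered for clustering.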

We apply the methodology to two different computational structural biology datasets to identify geometrical features that can be used to (1) predict class memberships for structural biology datasets containing ligand conformations from protein-ligand docking simulations and (2) find recurrent folding patterns within and across multiple trajectories (i.e., intra- and inter-trajectory, respectively) sampled from folding simulations. Our results show that our approach enables scalable clustering analyses for large-scale computational structural biology datasets on large distributed memory systems. In addition, our method achieves better accuracy than traditional analysis approaches.

About Dr. Michela Taufer

Michela Taufer is the David L. and Beverly J.C. Mills Chair of Computer and Information Sciences and an associate professor in the same department at the University of Delaware. She earned her master’s degree in Computer Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science from the Swiss Federal Institute of Technology (Switzerland). From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry. From 2005 to 2007, she was an Assistant Professor in the Computer Science Department of the University of Texas at El Paso (UTEP). She joined the University of Delaware in 2007 as an Assistant Professor and was promoted to Associate Professor with tenure in 2012.

Taufer's research interests include scientific applications and their advanced programmability in heterogeneous computing (i.e., multi-core and many-core platforms, GPUs); performance analysis, modeling, and optimization of multi-scale applications on heterogeneous computing, cloud computing, and volunteer computing; numerical reproducibility and stability of large-scale simulations on multi-core platforms; big data analytics and MapReduce.


RDI2 Distinguished Seminar “An Unexpected Driver of Computational Complexity”

Speaker:  Dr. Paul Walker
Date:  Tuesday March 25, 2014
Time:  2pm
Location:  CoRE Auditorium
*Refreshments will be served*


As a leading investment bank, Goldman faces many challenging computational problems around latency, big data, risk management and infrastructure. There is another driver of computational complexity at the firm, though, and it is both surprising and pressing. In this talk, Paul will discuss some of the ways Goldman is addressing this factor, and will offer views on how our universities and research centers can prepare employees and leaders to deal with it.

About Dr. Paul Walker

Paul is co-head of the Technology Division at Goldman Sachs. He is a member of the Partnership Committee and the Firmwide Technology Risk Committee. Previously, Paul served as the global head of risk and strategy for Prime Services in the Securities Division and before that he was global head of the Core Strats team. Paul joined Goldman Sachs as a vice president on the FICC Strats team in 2001 and was named managing director in 2004 and partner in 2008.

Prior to joining the firm, Paul worked as a physics researcher at the Max-Planck-Institut für Gravitationsphysik and the National Center for Supercomputing Applications. He also worked in the Technology and eBusiness units of J.P. Morgan.

Paul is a member of the Board of Directors of the Depository Trust and Clearing Corporation (DTCC) and the Board of Governors of the New York Academy of Sciences.

Paul earned a PhD in Physics and an MSc in Physics from the University of Illinois at Urbana-Champaign in 1998 and 1996, respectively, and a BA in Physics from Cornell University in 1993.


RDI2 Distinguished Seminar "Anton: A Special-Purpose Machine that Achieves a Hundred-Fold Speedup in Biomolecular Simulations"

Speaker:  Dr. David E. Shaw
Date:  Tuesday February 25, 2014
Time:  2pm
Location:  CoRE Auditorium
*Refreshments will be served*


Molecular dynamics (MD) simulation has long been recognized as a potentially transformative tool for understanding the behavior of proteins and other biological macromolecules, and for developing a new generation of precisely targeted drugs.  Many biologically important phenomena, however, occur over timescales that have previously fallen far outside the reach of MD technology. 

We have constructed a specialized, massively parallel machine, called Anton, that is capable of performing atomic-level simulations of proteins at a speed roughly two orders of magnitude beyond that of the previous state of the art.  The machine has now simulated the behavior of a number of proteins for periods as long as two milliseconds -- approximately 200 times the length of the longest such simulation previously published -- revealing aspects of protein dynamics that were previously inaccessible to both computational and experimental study. 

The speed at which Anton performs these simulations is in large part the result of a tightly coupled codesign process in which the machine architecture was developed in concert with novel algorithms, including an asymptotically optimal parallel algorithm (with highly attractive constant factors) for the range-limited N-body problem.
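
As background, the sketch below illustrates the range-limited N-body problem itself: pairwise interactions are evaluated only within a cutoff radius, and a uniform cell grid limits the candidate pairs so the work grows linearly with the number of particles. This serial Python toy only shows the problem structure; Anton's algorithm and its parallel decomposition are, of course, far more sophisticated.

    import itertools
    import numpy as np

    def range_limited_pairs(pos: np.ndarray, cutoff: float, box: float):
        """Yield all index pairs (i, j), i < j, with |pos[i] - pos[j]| < cutoff."""
        ncell = max(1, int(box / cutoff))    # cells at least `cutoff` wide
        cells = {}
        for i, p in enumerate(pos):
            key = tuple(int(c) for c in np.minimum(p / box * ncell, ncell - 1))
            cells.setdefault(key, []).append(i)
        for (cx, cy, cz), members in cells.items():
            # Only this cell and its 26 neighbors can hold in-range partners.
            for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
                for i in members:
                    for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                        if i < j and np.linalg.norm(pos[i] - pos[j]) < cutoff:
                            yield i, j

    pos = np.random.rand(500, 3) * 10.0      # 500 particles in a 10x10x10 box
    pairs = list(range_limited_pairs(pos, cutoff=2.5, box=10.0))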

About Dr. David E. Shaw 

David E. Shaw serves as chief scientist of D. E. Shaw Research and as a senior research fellow at the Center for Computational Biology and Bioinformatics at Columbia University.  He received his Ph.D. from Stanford University in 1980, served on the faculty of the Computer Science Department at Columbia until 1986, and founded the D. E. Shaw group in 1988.  Since 2001, Dr. Shaw has been involved in hands-on research in the field of computational biochemistry.  His lab is currently involved in the development of new algorithms and machine architectures for high-speed molecular dynamics simulations of biological macromolecules, and in the application of such simulations to basic scientific research and computer-assisted drug design. 

Dr. Shaw was appointed to the President’s Council of Advisors on Science and Technology by President Clinton in 1994, and again by President Obama in 2009.  He is a member of the National Academy of Engineering, and is a fellow of the American Academy of Arts and Sciences and of the American Association for the Advancement of Science.