The tools that give us the power to achieve
Research infrastructure at RDI² includes a 32-node Dell M610 blade cluster, consisting of two Dell M1000E Modular Blade Enclosures, the necessary interconnect/management infrastructure, and a supervisory node. Each enclosure is maximally configured with sixteen blades, each blade having two Intel Xeon E5504 Nehalem-family quad-core processors at 2.0 GHz, forming an eight-core node. Each node has 24 GB RAM and 73 GB of local disk storage (10,000 RPM); twelve nodes have an additional 1 TB of local storage.
The network infrastructure comprises an integrated 16-port Mellanox InfiniBand switch within each blade chassis, with each switch linked to the switch in the other chassis. All blades have Mellanox Quad-Data-Rate (QDR) InfiniBand interface cards. There is also an integrated (redundant) 1 Gigabit Ethernet network within each chassis, with two pairs of 10 Gigabit uplinks in each chassis. In aggregate, the cluster consists of 32 nodes, 256 cores, 768 GB memory and ~14.5 TB disk capacity, with a 20 Gigabit InfiniBand network and two 1 Gigabit Ethernet networks.
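The aggregate figures follow from the per-node numbers quoted above. The short sketch below redoes that arithmetic; the constant and function names are illustrative only, and it assumes 1 TB is counted as 1000 GB. Under that assumption the itemised local disks sum to roughly 14.3 TB, close to the quoted ~14.5 TB; the text does not say where the remainder comes from.

```python
# Per-node figures taken from the description of the blade cluster.
NODES = 32
CORES_PER_NODE = 2 * 4            # two quad-core Xeon E5504 processors per blade
RAM_GB_PER_NODE = 24
LOCAL_DISK_GB_PER_NODE = 73
NODES_WITH_EXTRA_DISK = 12
EXTRA_DISK_GB = 1000              # assumption: 1 TB counted as 1000 GB


def cluster_totals():
    """Return aggregate cores, memory (GB) and local disk (TB) for the cluster."""
    total_disk_gb = (NODES * LOCAL_DISK_GB_PER_NODE
                     + NODES_WITH_EXTRA_DISK * EXTRA_DISK_GB)
    return {
        "cores": NODES * CORES_PER_NODE,
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "disk_tb": total_disk_gb / 1000,
    }


if __name__ == "__main__":
    print(cluster_totals())
```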
CAPER (Computational and Data Platform for Energy efficiency Research) is a unique and flexible instrument funded by the National Science Foundation that combines high-performance Intel Xeon processors with a complete deep memory hierarchy, latest-generation coprocessors, a high-performance network interconnect and powerful system power instrumentation.
It is currently an eight-node cluster in which a single node can concurrently house up to eight general-purpose graphics processing units (GPGPU) or eight Intel many-integrated-core (MIC) coprocessors, or any eight-card combination of the two, along with up to 48 hard disk drives (HDD) or solid-state drives (SSD). A single-node configuration features a theoretical peak performance of 20 TF/s (single precision) or 10 TF/s (double precision) and supports up to 32 TB of storage and 24 TB of flash-based non-volatile memory. A separate node serves as a login and storage server.
This hardware configuration is unprecedented in its flexibility and adaptability, as it can combine multiple components into a smaller set of nodes to reproduce specific configurations. The platform also mirrors key architectural characteristics of high-end systems, such as XSEDE's Stampede system at TACC, and provides several unique features to support critical research goals such as software/hardware co-design. CAPER provides a platform to validate models and investigate important aspects of data-centric and energy efficiency research.
Enabled Research Areas
Application-Aware Cross-Layer Power Management
Language extensions and supporting runtimes that can facilitate adaptation and optimization of power/performance. Energy/power-efficiency tradeoffs considered holistically, in combination with performance, resilience, quality of solution, and other objectives.
Energy/Performance Tradeoffs of Data-Centric Workflows
Policies and mechanisms for runtime placement and management of data as well as data processing. Strategies that can fundamentally enable data-intensive workflows on both current and future large-scale systems.
Software/Hardware Co-Design for In-Situ Data Analytics
Co-design for end-to-end data-intensive simulation workflows that include coupled simulations as well as analytics components. Exploration of the rich design spaces available for the placement and scheduling of computation and data in space and time; across local, remote and accelerator cores; and at various levels of the deep memory hierarchy; all while considering performance and energy/power constraints and associated tradeoffs.
Thermal Implications of Proactive Virtual Machine Management
Design and validation of a proactive, information-centric and memory-hierarchy-aware approach to virtual machine (VM) consolidation and management for virtualized datacenters. Thermal implications of incorporating memory-hierarchy awareness in VM consolidation, as well as the impact on application performance.
Unique System Configuration
The Powerhouse of Our Work
What Powers Our Servers?
The operational research platform consists of an eight-node cluster, each server combining Intel Xeon multi-core processors, an Intel Many Integrated Core (MIC) coprocessor (codenamed "Knights Corner"), and a deep memory hierarchy.
More specifically, the cluster is based on the SuperMicro SYS-4027GR-TRT system, which is capable of housing concurrently, in one node:
- Up to eight general-purpose graphics processing units (GPGPU), or eight Intel Many-Integrated-Core (MIC) coprocessors - or any eight-card combination of the two
- Two Fusion-io IoDrive-2 non-volatile random access memory (NVRAM) storage cards, or two LSI 24-port disk controllers - or any two-card combination of the two
- A single-port fourteen-data-rate InfiniBand networking card (FDR-14, 56 Gbps)
- Up to 48 Solid State Disk Drives (SSD) or Hard Disk Drives (HDD)
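The per-node limits listed above can be expressed as a small consistency check. The sketch below is purely illustrative: `NodeConfig` and its field names are hypothetical, not part of any CAPER tooling, and it assumes that the 48-drive limit applies to SSDs and HDDs combined.

```python
from dataclasses import dataclass

# Per-node limits taken from the list above.
MAX_ACCELERATORS = 8    # GPGPUs plus MIC coprocessors, in any mix
MAX_STORAGE_CARDS = 2   # Fusion-io NVRAM cards plus LSI disk controllers, in any mix
MAX_DRIVES = 48         # assumption: SSDs and HDDs share one 48-drive limit


@dataclass
class NodeConfig:
    """Hypothetical description of what is installed in one node."""
    gpgpus: int = 0
    mics: int = 0
    nvram_cards: int = 0
    disk_controllers: int = 0
    ssds: int = 0
    hdds: int = 0

    def problems(self):
        """Return a list of limit violations; an empty list means the mix fits."""
        found = []
        if any(count < 0 for count in vars(self).values()):
            found.append("component counts must be non-negative")
        if self.gpgpus + self.mics > MAX_ACCELERATORS:
            found.append("more than 8 accelerator cards")
        if self.nvram_cards + self.disk_controllers > MAX_STORAGE_CARDS:
            found.append("more than 2 NVRAM/disk-controller cards")
        if self.ssds + self.hdds > MAX_DRIVES:
            found.append("more than 48 drives")
        return found


if __name__ == "__main__":
    mixed = NodeConfig(gpgpus=4, mics=4, nvram_cards=1, disk_controllers=1,
                       ssds=24, hdds=24)
    print(mixed.problems())
```

A mixed node such as four GPGPUs plus four MIC cards passes the check, while a ninth accelerator card is reported as a violation.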
What's In CAPER?
CAPER's nominal configuration features servers with two Intel Xeon Ivy Bridge E5-2650v2 processors (16 cores/node) and the following subsystems:
- Deep Memory Hierarchy
  - 128 GB of DRAM (expandable to 384 GB)
  - 1 TB of PCIe Flash-based NVRAM (Fusion-io IoDrive2)
  - 2 TB of Solid State Drive (SSD)
  - 4 TB of spinning-platter hard disk (HDD)
- Intel Many-Integrated-Core (MIC) Xeon Phi 7120P (16 GB, 61 cores / 244 threads)
- Mellanox FDR InfiniBand (56 Gbps)
- 10 Gigabit Ethernet
The cluster, in whole or in part, may be switched between Scientific Linux 6.x (for Red Hat Enterprise Linux compatibility) and Ubuntu 14.04 LTS (for Linux kernel version >= 3.14 research) with only a reboot of the affected nodes. Software deployed on the cluster (under both operating system profiles) includes Open MPI, Matlab (R2014a, R2013a/b, R2012a/b), Mathematica (v10, v9, v8.0.4), and Maple (v18, v17, v16).
CAPER is outfitted with comprehensive, non-invasive instrumentation. It supports both coarse- and fine-grained power metering at the per-server level. For coarse-grained metering, an instrumented Raritan iPDU PX2-4527X2U-K2 provides power measurements at 1 Hz. For fine-grained metering, a Yokogawa DL850E ScopeCorder provides voltage and current measurements for all nodes through 1 MS/s modules. An external collector aggregates the voltage/current measurements from the DL850E acquisition system and integrates them over time to obtain power measurements at a sampling rate of up to 10 kHz.
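One way to picture the integration step is as windowed averaging of instantaneous power (voltage times current). The following is a minimal sketch under stated assumptions, not the collector's actual implementation: the function name `average_power` and its parameters are hypothetical, samples are assumed to arrive as plain Python sequences of volts and amperes, and the input rate is assumed to be an integer multiple of the output rate.

```python
def average_power(voltage, current, sample_rate_hz, output_rate_hz):
    """Average instantaneous power v*i over fixed, non-overlapping windows.

    voltage, current: equal-length sequences sampled at sample_rate_hz.
    Returns one mean power value (watts) per output period; a trailing
    partial window is dropped.
    """
    if len(voltage) != len(current):
        raise ValueError("voltage and current must have the same length")
    if output_rate_hz <= 0 or sample_rate_hz % output_rate_hz != 0:
        raise ValueError("sample rate must be a positive multiple of output rate")

    window = sample_rate_hz // output_rate_hz   # samples per output value
    powers = []
    for start in range(0, len(voltage) - window + 1, window):
        v_win = voltage[start:start + window]
        i_win = current[start:start + window]
        # Sum of v*i over the window, divided by the sample count, is the
        # mean power over that window.
        powers.append(sum(v * i for v, i in zip(v_win, i_win)) / window)
    return powers


if __name__ == "__main__":
    # Synthetic data: 12 V rail, current stepping from 2 A to 4 A.
    volts = [12.0] * 200
    amps = [2.0] * 100 + [4.0] * 100
    print(average_power(volts, amps, 1_000_000, 10_000))
```

With 1 MS/s input and a 10 kHz output rate, each output value averages 100 consecutive samples.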