NVIDIA® Tesla™ S1070 1U Computing System

The NVIDIA® Tesla™ S1070 1U Computing System seamlessly fits into enterprise server clusters and scales to solve the most complex computing problems. With four 1-Teraflop processors, the S1070 delivers up to 4 teraflops of performance in a 1U rack-mount system for unmatched performance in high density rack systems.

 
Overview

High Performance Petascale Computing Systems
NVIDIA® Tesla™ S1070 computing system delivers the world’s first teraflop processor, combining breakthrough performance with energy efficiency. A cluster with as few as 250 Tesla systems would have a peak theoretical performance of a petaflop. This system raises the bar for many-core computing in a heterogeneous environment that mixes multi-core CPUs and many-core GPUs for optimized performance. The combination of performance and energy efficiency enables scientists, engineers, and business users to tackle larger problems with the most advanced algorithms. Tesla S1070 delivers incredible performance per watt and can upgrade the performance of your data centers without requiring infrastructure changes for power or cooling and without massive increases in your energy bills.
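
The petaflop claim above is simple arithmetic on the quoted 4 teraflops per system. The sketch below works it through; the function name is illustrative only, and the result is a theoretical peak, not a measured cluster benchmark.

```python
import math

# Figures quoted on this page: 4 teraflops peak per S1070 system.
SYSTEM_PEAK_TFLOPS = 4
PETAFLOP_IN_TFLOPS = 1000  # 1 petaflop = 1000 teraflops


def systems_for_target(target_tflops, per_system_tflops=SYSTEM_PEAK_TFLOPS):
    """Smallest whole number of systems whose combined peak meets the target."""
    return math.ceil(target_tflops / per_system_tflops)


print(systems_for_target(PETAFLOP_IN_TFLOPS))  # 250
```

Passing the lower 3.73 TFlops figure listed for the 400 configuration as `per_system_tflops` gives a slightly larger system count.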

Feeding The Relentless Demand For HPC Performance
With the world’s first teraflop many-core processor, the NVIDIA® Tesla™ S1070 computing system speeds the transition to energy-efficient parallel computing. With 960 processor cores delivering four teraflops of peak performance, 16 GB of ultra-fast memory for maximum performance with larger data sets, and a standard C compiler that simplifies application development, Tesla S1070 scales to solve the world’s most important computing challenges — more quickly and accurately.

Many-Core Architecture Delivers Optimum Scaling Across HPC Applications
Demand for computing performance in science and industry has far outpaced the ability of traditional CPUs to keep up, even with the recent shift to multi-core designs. Many-core computing is the architectural answer to this problem, delivering hundreds of cores in a single processor compared to multi-core designs with only four, six, or eight. Processors with hundreds of cores create a discontinuity in computing because 1U systems with nearly one thousand cores are practical for the first time. This is possible because the computing cores in a GPU were designed from the start to be part of a massively parallel system rather than like traditional CPU cores. Dedicated ultra-fast memory for each Tesla processor also improves scalability, because total memory bandwidth grows linearly as more GPUs are added to the solution.

High Efficiency Computing Platform for Energy-Conscious Organizations
Unlike any other solution available in the HPC space, the Tesla S1070 delivers four teraflops in a 1U chassis with a typical power draw of only 700 watts. This “high density computing” allows data center managers to deliver more performance for their users without new demands on the electrical and thermal capabilities of their existing data centers. Tesla S1070 creates the foundation for new green computing initiatives, while saving you money.
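
As a rough illustration of the performance-per-watt claim, the sketch below divides the quoted peak performance by the quoted power figures. It uses only numbers from this page (4 teraflops, 700 W typical, 800 W maximum). The function name is illustrative, and the result is a peak ratio, not a measured efficiency.

```python
def gflops_per_watt(peak_tflops, watts):
    """Peak gigaflops delivered per watt of system power."""
    return peak_tflops * 1000 / watts


typical = gflops_per_watt(4, 700)     # about 5.7 GFlops/W at typical power
worst_case = gflops_per_watt(4, 800)  # 5.0 GFlops/W at maximum power
print(round(typical, 1), worst_case)
```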

NVIDIA CUDA™ Architecture Unlocks the Power Of GPU Parallel Computing
The CUDA parallel computing architecture enhances performance by offloading computationally-intensive activities from the CPU to the GPU — unlocking the many-core processing power of NVIDIA GPUs to solve the most complex computation-intensive challenges such as protein docking, molecular dynamics, financial analysis, fluid dynamics, structural analysis and many others.

Accelerate your business and research outcomes with a ready-to-go NVIDIA® Tesla™ S1070 computing system.

Features

System Architecture
The NVIDIA® Tesla™ S1070 Computing System is a 1U rack-mount system with four Tesla T10 computing processors. It can be connected to a single host system via two PCI Express connections to that host, or connected to two separate host systems via one PCI Express connection to each host.

Each NVIDIA switch and corresponding PCI Express cable connects to two of the four GPUs in the Tesla S1070. If only one PCI Express cable is connected to the Tesla S1070, only two of the GPUs will be used. To connect all four GPUs in a Tesla S1070 to a single host system, the host must have two available PCI Express slots and be connected with two cables.

The Tesla S1070 can also be used with hosts that have only one available PCI Express slot. In that case two host systems are required, each connected with one cable, and each host system will access two of the four GPUs.
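
The cabling rules above can be summarized in a few lines of code. This is only a sketch of the rules as described (two GPUs per cable, at most two cables per S1070); the function and constant names are made up for illustration and do not correspond to any NVIDIA tool.

```python
GPUS_PER_CABLE = 2  # each switch and cable reaches two of the four GPUs
MAX_CABLES = 2      # the S1070 has two PCI Express cable connectors


def gpus_per_host(cables_per_host):
    """Given the number of cables attached to each host, return the
    number of GPUs each host can access."""
    if any(c < 0 for c in cables_per_host):
        raise ValueError("cable counts must be non-negative")
    if sum(cables_per_host) > MAX_CABLES:
        raise ValueError("an S1070 accepts at most two cables")
    return [c * GPUS_PER_CABLE for c in cables_per_host]


print(gpus_per_host([2]))     # [4]    one host, two cables
print(gpus_per_host([1, 1]))  # [2, 2] two hosts, one cable each
print(gpus_per_host([1]))     # [2]    one host, one cable: two GPUs unused
```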

Highlights

  • Four Tesla T10 graphics processing units (GPUs)
  • Available in 2 configurations based on GPU processor clock
    - 500 Configuration - 1.44 GHz peak clock
    - 400 Configuration - 1.296 GHz peak clock
  • 16GB of high speed memory, configured as 4GB for each GPU
  • 4x 512-bit GDDR3 memory interface (512-bit interface per GPU)
  • PCI Express x16 or PCI Express x8 low-profile host interface card
  • Remote management capabilities
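
The peak figures quoted under Technical Specifications (3.73 to 4.14 TFlops single precision, 311 to 345 GFlops double precision) can be reproduced from the GPU count and clocks listed above. The per-clock throughput values in this sketch are assumptions not stated on this page: 3 single-precision operations per core per clock, and 30 double-precision units per GPU each performing 2 operations per clock. They are used here because they match the quoted numbers.

```python
GPUS = 4
CORES_PER_GPU = 240

# Assumptions (not stated on this page): per-clock throughput of a T10 GPU.
SP_FLOPS_PER_CORE_PER_CLOCK = 3
DP_UNITS_PER_GPU = 30
DP_FLOPS_PER_UNIT_PER_CLOCK = 2


def peak_sp_tflops(clock_ghz):
    """Peak single-precision teraflops for the whole four-GPU system."""
    return GPUS * CORES_PER_GPU * SP_FLOPS_PER_CORE_PER_CLOCK * clock_ghz / 1000


def peak_dp_gflops(clock_ghz):
    """Peak double-precision gigaflops for the whole four-GPU system."""
    return GPUS * DP_UNITS_PER_GPU * DP_FLOPS_PER_UNIT_PER_CLOCK * clock_ghz


for name, clock in (("400", 1.296), ("500", 1.44)):
    print(name, peak_sp_tflops(clock), peak_dp_gflops(clock))
```

Under these assumptions the 400 configuration comes out at about 3.732 TFlops and 311.04 GFlops, and the 500 configuration at about 4.147 TFlops and 345.6 GFlops. The quoted figures appear to be these values truncated.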

Benefits

Four 1-Teraflop Processors in a High Density 1U System
Delivers up to 4 teraflops of performance in a 1U rack-mount system for unmatched performance in high density rack systems.

Massively-Parallel Many-Core Architecture
240 computing cores per GPU that can execute thousands of concurrent threads.

Scales to Multi-GPU Computing
Scale to thousands of processor cores to solve large-scale problems by splitting the problem across multiple GPUs.
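
One simple way to split a problem across the four GPUs is to divide its work items into contiguous, nearly equal ranges, one per GPU. The sketch below shows only that partitioning step in plain Python. It is a generic illustration, not NVIDIA's method, and it does not launch any GPU work.

```python
def split_range(n_items, n_gpus=4):
    """Split n_items work items into n_gpus contiguous (start, end) ranges
    whose sizes differ by at most one item."""
    base, extra = divmod(n_items, n_gpus)
    chunks = []
    start = 0
    for gpu in range(n_gpus):
        # The first `extra` GPUs take one additional item each.
        size = base + (1 if gpu < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


print(split_range(10))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```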

Program in NVIDIA CUDA™: C for GPU
Programmable using CUDA, the world's leading application development platform for many-core solutions.

IEEE 754 Floating Point
Ensures your results meet industry-standard precision, with optional features to ensure accuracy.

Double-Precision Floating Point Support
Meets the precision requirements of your most demanding applications with IEEE 64-bit precision.

Asynchronous Data Transfer
Turbocharges system performance because data transfers can be executed even while the computing cores are busy.

16 GB Ultra-fast Memory
Enables larger datasets to be stored locally with 4 GB dedicated for each processor to maximize performance and minimize data movement around the system.

4x 512-bit Memory Interface
Delivers 408 GB/s aggregate peak memory bandwidth (102 GB/s per GPU) for blistering data transfer, with a 512-bit interface dedicated to each processor.
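
The bandwidth figures relate to the interface width in a straightforward way. The sketch below uses the 102 GB/s per-GPU figure from the Technical Specifications to compute the aggregate bandwidth and the transfer rate that a 512-bit interface implies. The memory clock is not stated on this page, so the transfer rate is derived from the other figures, not quoted.

```python
BUS_WIDTH_BITS = 512     # per GPU, from this page
PER_GPU_BANDWIDTH = 102  # GB/s peak per GPU, from the specifications
GPUS = 4


def total_bandwidth_gbs():
    """Aggregate peak memory bandwidth across all four GPUs, in GB/s."""
    return GPUS * PER_GPU_BANDWIDTH


def implied_transfer_rate_gts():
    """Billions of transfers per second implied by the per-GPU bandwidth,
    given that each transfer moves BUS_WIDTH_BITS / 8 bytes."""
    bytes_per_transfer = BUS_WIDTH_BITS / 8
    return PER_GPU_BANDWIDTH / bytes_per_transfer


print(total_bandwidth_gbs())        # 408
print(implied_transfer_rate_gts())  # 1.59375
```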

High-Speed, PCI-Express 2.0 Data Transfer
With low latency and high bandwidth, computing applications benefit from the highest data transfer rate possible through standard PCI-Express architecture.

Single-screw Rail Mounting
The single-screw rail design installs as quickly as a tool-less design, while a single screw securing the rail to the rack adds security and rigidity.

System Monitoring Features
Easy management and monitoring post-installation helps your IT staff manage systems with minimal effort. Remote capabilities and status lights on the front and rear of the unit ensure your staff can see the status whether they are on the other side of the rack, or the other side of the world.

Dual PCI-Express 2.0 Cable Connections
Maximizes bandwidth between the host processor and the Tesla processors with up to 12.8 GB/s transfer rates (up to 6.4 GB/s per PCI Express connection).
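
To put the host link rates in context, the sketch below estimates the minimum time to move data between host and Tesla memory at the quoted peak of 6.4 GB/s per PCI Express connection. It is a best-case estimate that ignores protocol and software overhead, and the function name is illustrative.

```python
PER_CABLE_GBS = 6.4  # peak GB/s per PCI Express connection, from this page


def min_transfer_seconds(gigabytes, cables=2):
    """Best-case time to move `gigabytes` of data over `cables` connections."""
    if cables not in (1, 2):
        raise ValueError("an S1070 has one or two host connections")
    return gigabytes / (PER_CABLE_GBS * cables)


# Filling all 16 GB over both cables takes about 1.25 s at peak rate.
print(min_transfer_seconds(16, cables=2))
# With one cable, only two GPUs (8 GB) are reachable: also about 1.25 s.
print(min_transfer_seconds(8, cables=1))
```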

Small-form-factor (SFF) Host Adapter Card
The low-power host adapter card enables Tesla systems to work with virtually any PCIe-compliant host system with an open PCI Express slot (x8 or x16).

Technical Specifications

Computing Processors
  • Total Tesla T10 graphics processing units (GPUs) - 4
  • Total streaming processor cores - 960 (240/GPU)
  • Frequency of processor cores - 1.296 GHz (400 configuration) to 1.44 GHz (500 configuration)
  • Single precision floating point performance (peak) - 3.73 to 4.14 TFlops
  • Double precision floating point performance (peak) - 311 to 345 GFlops
  • Floating point precision - IEEE 754 single & double

Memory
  • Total dedicated memory - 16GB (4GB/GPU)
  • Memory interface - 4x 512-bit GDDR3 (512-bit interface per GPU)
  • Memory bandwidth - 408GB/sec total bandwidth (102GB/sec peak bandwidth per GPU)

Max. Power Consumption
  • 800W

System Interface
  • PCI Express x16 or x8

Programming Environment
  • CUDA

Mechanical Overview
  • Physical Dimensions
    - 1.73” Height x 17.5” Width x 28.5” Depth
  • Rack Compatibility
    - Fits 4-post, 19” EIA compatible racks
    - Rack depth between posts: 28.7 to 36.3 inches
  • PCI Express Cable
    - Standard: 0.5 meters in length
    - Optional: 2.0 meters in length
  • Host Interface Cards
    - PCI Express low profile form factor
    - Standard card requires a ×16 PCI Express slot
    - An optional card is available for ×8 PCI Express slots
  • External Connectors
    - Two cable connectors for ×16 PCI Express
    - C19 format female connector for power cord

Environmental Specifications and Conditions
  • Operating
    - Input Power - 90 to 274 VAC, 50 to 60 Hz
    - Temperature - 10 °C to 35 °C (50 °F to 95 °F) at sea level, with an altitude derating of 1.0 °C per 1000 ft
    - Humidity - 10 % to 80 % RH, 28 °C (82.4 °F) maximum wet bulb temperature, non-condensing
    - Altitude - 0 to 5000 feet mean sea level (MSL)
    - Shock - Half sine 40g, 2 ms duration
    - Vibration - Sinusoidal 0.25g, 10 to 500 Hz, 3 axis. Random 1.0 Grms, 10 to 500 Hz
    - Airflow - 143 cfm maximum
  • Non-Operating
    - Temperature - -40 °C to 60 °C (-40 °F to 140 °F)
    - Humidity - 10 % to 80 % RH, 38.7 °C (101.7 °F) maximum wet bulb temperature, non-condensing
    - Altitude - 0 to 10,000 feet mean sea level (MSL), with a maximum allowable rate of altitude change of 2000 ft/min
    - Shock - Half sine 80g, 2 ms; Trapezoidal 40g, 150 in/sec
    - Vibration (random) - 0.015-0.008G/Hz, 5-500 Hz, 10 minutes
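
The operating temperature limit depends on altitude. The sketch below applies the stated derating of 1.0 °C per 1000 ft to the 35 °C sea-level maximum, within the 0 to 5000 ft operating range. It assumes the derating is linear and applies only to the upper limit, which this page does not spell out.

```python
SEA_LEVEL_MAX_C = 35.0       # maximum operating temperature at sea level
DERATING_C_PER_1000_FT = 1.0
MAX_OPERATING_ALTITUDE_FT = 5000


def max_operating_temp_c(altitude_ft):
    """Maximum operating temperature in °C at the given altitude,
    assuming linear derating of the upper limit."""
    if not 0 <= altitude_ft <= MAX_OPERATING_ALTITUDE_FT:
        raise ValueError("altitude outside the 0 to 5000 ft operating range")
    return SEA_LEVEL_MAX_C - DERATING_C_PER_1000_FT * altitude_ft / 1000


print(max_operating_temp_c(0))     # 35.0
print(max_operating_temp_c(5000))  # 30.0
```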

Resources

NVIDIA Tesla S1070 Datasheet
Click here to download the S1070 datasheet (258 KB PDF)

NVIDIA Tesla S1070 System Specification Document
Click here to download the S1070 system specification document (2.03 MB PDF)

NVIDIA CUDA™
NVIDIA CUDA™ technology is the world’s only C language environment that enables programmers and developers to write software to solve complex computational problems in a fraction of the time by tapping into the many-core parallel processing power of GPUs. With millions of CUDA-capable GPUs already deployed, thousands of software programmers are already using the free CUDA software tools to accelerate applications—from video and audio encoding to oil and gas exploration, product design, medical imaging, and scientific research.

NVIDIA CUDA Development Tools
NVIDIA’s CUDA development tools provide three key components to help you get started: the latest CUDA driver, a complete CUDA Toolkit, and CUDA SDK code samples.