NEC SX Architecture
Vector processor + x86/Linux architecture - The new SX architecture
consists of the Vector Engine (VE) and the Vector Host (VH). The VE executes complete applications,
while the VH mainly provides OS functions for the connected VEs. The VE consists of
one vector processor with eight vector cores and uses "high bandwidth memory" modules
(HBM2) to maximize memory bandwidth. It is the world's first implementation of one CPU
LSI with six HBM2 memory modules using "chip-on-wafer-on-substrate" technology
(CoWoS), which yields a world-record memory bandwidth of 1.2 TB/s. The VE is connected
to the VH, a standard x86/Linux node, through PCIe. This new SX architecture, which
executes an entire application on the VE and the OS on the VH, combines the high
sustained performance for which vector processors are famous with a well-known x86/Linux
environment.
Extremely high capability core and processor with extremely high memory bandwidth
- The vector core on the VE processor is the most powerful single core in HPC as
of today, continuing the design philosophy of the previous SX series. With eight
cores, the vector processor executes applications with extremely high sustained
performance. It features 2.45 TFLOPS peak performance and 1.2 TB/s memory bandwidth
per processor. Unlike standard processors, a vector architecture is known
to achieve a significant fraction of its peak performance on real applications.
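As a rough illustration of what these figures imply, the short Python sketch below derives the per-core peak and the ratio of memory bandwidth to peak performance (often called machine balance) from the numbers quoted above. The function names are hypothetical, and the even split of peak performance across the eight cores is an assumption made for the arithmetic, not a statement from NEC.

```python
# Figures quoted in the text for one Vector Engine processor.
PEAK_TFLOPS = 2.45        # peak performance per processor, in TFLOP/s
MEM_BANDWIDTH_TBS = 1.2   # memory bandwidth per processor, in TB/s
CORES = 8                 # vector cores per processor


def per_core_peak_tflops(peak_tflops: float = PEAK_TFLOPS, cores: int = CORES) -> float:
    """Peak of a single core, assuming the processor peak is split evenly."""
    return peak_tflops / cores


def bytes_per_flop(bandwidth_tbs: float = MEM_BANDWIDTH_TBS,
                   peak_tflops: float = PEAK_TFLOPS) -> float:
    """Machine balance: bytes of memory traffic available per peak operation."""
    return bandwidth_tbs / peak_tflops


if __name__ == "__main__":
    print(f"per-core peak:   {per_core_peak_tflops():.3f} TFLOP/s")
    print(f"machine balance: {bytes_per_flop():.2f} bytes/flop")
```

With the quoted figures this gives roughly 0.306 TFLOP/s per core and about 0.49 bytes per floating-point operation.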
NEC SX-Aurora TSUBASA Memory
The NEC Vector Engine Processor has a newly developed shared "Last-Level-Cache"
(LLC), the first shared vector cache ever. This shared LLC serves all cores simultaneously
and has a "write-back" policy, so coherency between the different cores, the LLC
and memory is easy to ensure. At the same time, this kind of architecture
lends itself to shared-memory parallelization, by automatic parallelization or
OpenMP, while MPI would be used to parallelize across different Vector Engines. The last-level
cache has a line size of 128 bytes, and some additional features are implemented
to increase the efficiency of strided stores and scatter operations.
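To see why strided access deserves special attention with 128-byte lines, the sketch below counts how many cache lines a strided access pattern touches and what fraction of the transferred bytes is actually used. It is a generic cache-line model, not a description of the LLC's additional features. The 8-byte element size, the line-aligned start address and the function names are assumptions chosen for illustration.

```python
LINE_SIZE_BYTES = 128   # LLC line size quoted in the text
ELEMENT_BYTES = 8       # assumption: 64-bit (double-precision) elements


def lines_touched(n_elements: int, stride: int,
                  line_size: int = LINE_SIZE_BYTES,
                  element_bytes: int = ELEMENT_BYTES) -> int:
    """Distinct cache lines touched by n_elements accesses at an element stride.

    Assumes the first element sits at a line-aligned address (offset 0).
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    lines = {(i * stride * element_bytes) // line_size for i in range(n_elements)}
    return len(lines)


def line_utilization(n_elements: int, stride: int,
                     line_size: int = LINE_SIZE_BYTES,
                     element_bytes: int = ELEMENT_BYTES) -> float:
    """Fraction of the bytes in the touched lines that the accesses really use."""
    used = n_elements * element_bytes
    moved = lines_touched(n_elements, stride, line_size, element_bytes) * line_size
    return used / moved


if __name__ == "__main__":
    for stride in (1, 2, 16):
        print(stride, lines_touched(256, stride), line_utilization(256, stride))
```

In this model a unit-stride sweep uses every byte of each line, while a stride of 16 elements or more uses only one 8-byte element per 128-byte line.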
NEC utilizes the second generation of the "High Bandwidth Memory" standard (HBM2).
An HBM2 memory block is realized by stacking four or eight dies together, and it
achieves up to 200 GB/s bandwidth while providing either 4 or 8 GB of capacity.
Six of these memory blocks and the Vector Engine Processor are connected by means
of a "silicon interposer", a special die on which memory and processor are mounted
and that connects them. Together, the six blocks provide a total of 24 GB or 48 GB per Vector Engine and
an industry-leading 1.2 TB/s bandwidth.
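The totals follow directly from the per-block figures. The sketch below reproduces the arithmetic, using decimal units (1 TB/s = 1000 GB/s) and the "up to" bandwidth per block as an upper bound. The constant and function names are hypothetical.

```python
BLOCKS_PER_VE = 6            # HBM2 memory blocks per Vector Engine (from the text)
BLOCK_BANDWIDTH_GBS = 200    # "up to" bandwidth per block, in GB/s
BLOCK_CAPACITIES_GB = (4, 8) # capacity per block for four or eight stacked dies


def total_bandwidth_tbs(blocks: int = BLOCKS_PER_VE,
                        per_block_gbs: float = BLOCK_BANDWIDTH_GBS) -> float:
    """Aggregate memory bandwidth in TB/s, assuming every block reaches its maximum."""
    return blocks * per_block_gbs / 1000


def total_capacity_gb(per_block_gb: int, blocks: int = BLOCKS_PER_VE) -> int:
    """Aggregate memory capacity in GB for a given per-block capacity."""
    return blocks * per_block_gb


if __name__ == "__main__":
    print(total_bandwidth_tbs())                                # aggregate TB/s
    print([total_capacity_gb(c) for c in BLOCK_CAPACITIES_GB])  # aggregate GB
```

Six blocks at 200 GB/s each give the quoted 1.2 TB/s, and six blocks of 4 GB or 8 GB give 24 GB or 48 GB.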
NEC SX-Aurora TSUBASA Interconnect
The NEC Vector Engine can communicate with other Vector Engines or x86 CPUs over shared memory, PCI Express or a high-speed network.
GPGPU and VE