NEC SX Architecture
            
                Vector processor + x86/Linux architecture - The new SX architecture
                consists of the Vector Engine (VE) and the Vector Host (VH). The VE executes complete
                applications, while the VH mainly provides OS functions for the connected VEs. The VE
                consists of one vector processor with eight vector cores, using "high bandwidth memory"
                modules (HBM2) for maximum memory bandwidth. The world's first implementation of one CPU
                LSI with six HBM2 memory modules, using "chip-on-wafer-on-substrate" technology
                (CoWoS), leads to the world-record memory bandwidth of 1.2 TB/s. The VE is connected
                to the VH, a standard x86/Linux node, through PCIe. This new SX architecture, which
                executes an entire application on the VE and the OS on the VH, combines the high
                sustained performance for which vector processors are famous with a well-known x86/Linux
                environment.
            
            
            
                 
             
            
                Extremely high capability core and processor with extremely high memory bandwidth
                - The vector core on the VE processor is the most powerful single core in HPC as
                of today, continuing the design philosophy of the previous SX series. With eight
                cores, the vector processor executes applications with extremely high sustained
                performance. It features 2.45 TF peak performance and 1.2 TB/s memory bandwidth
                per processor. Unlike standard processors, a vector architecture is known
                to achieve a significant fraction of its peak performance on real applications.
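
The two headline figures above can be related with a little arithmetic. The sketch below uses only the numbers quoted in the text (2.45 TF, 1.2 TB/s, eight cores); the function names are illustrative, and the per-core value assumes that the peak is split evenly across the cores.

```python
# Figures quoted in the text, per Vector Engine processor.
PEAK_TFLOPS = 2.45   # peak performance in TFLOP/s
MEM_BW_TBPS = 1.2    # memory bandwidth in TB/s
CORES = 8            # vector cores per processor


def bytes_per_flop(bandwidth_tbps: float, peak_tflops: float) -> float:
    """Bytes of memory traffic available per peak floating-point operation."""
    return bandwidth_tbps / peak_tflops


def peak_per_core(peak_tflops: float, cores: int) -> float:
    """Peak TFLOP/s per core, assuming the peak is split evenly across cores."""
    return peak_tflops / cores


if __name__ == "__main__":
    ratio = bytes_per_flop(MEM_BW_TBPS, PEAK_TFLOPS)
    per_core_gflops = peak_per_core(PEAK_TFLOPS, CORES) * 1000
    print(f"bytes per flop: {ratio:.2f}")
    print(f"peak per core:  {per_core_gflops:.0f} GFLOP/s")
```

With these inputs the ratio comes out at roughly 0.49 bytes per flop and the per-core peak at roughly 306 GFLOP/s. A higher bytes-per-flop ratio generally helps bandwidth-bound codes reach a larger fraction of peak.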
            
            
            
                NEC SX-Aurora TSUBASA Memory
            
                The NEC Vector Engine Processor has a newly developed shared "Last-Level-Cache"
                (LLC), the first shared vector cache ever. This shared LLC serves all cores simultaneously
                and has a "write-back" policy, which means that coherency between the different cores,
                the LLC and memory is always easily ensured. At the same time, this kind of architecture
                lends itself to shared-memory parallelization within a Vector Engine, by autoparallelization
                or OpenMP, while MPI would be used to parallelize across different Vector Engines. The
                last-level cache has a line size of 128 bytes, and some additional features are implemented
                to increase the efficiency of strided stores and scatter operations.
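
To illustrate why strided accesses deserve special attention with a 128-byte line, the following sketch counts how many cache lines a strided access pattern touches. It is a generic model of a plain line-based cache, not a model of the Vector Engine's additional features, which the text does not describe in detail. The 8-byte element size and the line-aligned start address are assumptions.

```python
LINE_BYTES = 128   # LLC line size stated in the text
ELEM_BYTES = 8     # assumption: 8-byte double-precision elements


def lines_touched(n_elems: int, stride: int,
                  elem_bytes: int = ELEM_BYTES,
                  line_bytes: int = LINE_BYTES) -> int:
    """Distinct cache lines touched by n_elems accesses with an element stride.

    Assumes the first element sits at a line-aligned address (offset 0).
    """
    return len({(i * stride * elem_bytes) // line_bytes for i in range(n_elems)})


def line_utilization(n_elems: int, stride: int,
                     elem_bytes: int = ELEM_BYTES,
                     line_bytes: int = LINE_BYTES) -> float:
    """Fraction of the bytes moved in whole lines that the access actually uses."""
    used = n_elems * elem_bytes
    moved = lines_touched(n_elems, stride, elem_bytes, line_bytes) * line_bytes
    return used / moved


if __name__ == "__main__":
    for stride in (1, 2, 16):
        print(stride, lines_touched(256, stride), line_utilization(256, stride))
```

In this simple model a unit-stride access uses every byte of each line, while a stride of 16 elements touches a new line for every element and uses only one sixteenth of the data moved.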
            
            
                NEC utilizes the second generation of the "High Bandwidth Memory" standard (HBM2).
                An HBM2 memory block is realized by stacking four or eight dies together, and it
                achieves up to 200 GB/s bandwidth while providing either 4 or 8 GB of capacity.
                Six of these memory blocks and the Vector Engine Processor are connected by means
                of a "silicon interposer", a special die on which memory and processor are mounted
                and which connects them. Together, the six memory blocks provide a total of 24 GB
                to 48 GB per Vector Engine and industry-leading 1.2 TB/s bandwidth.
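
The totals quoted above follow directly from the per-block figures. A minimal check, using only numbers from the text and assuming decimal units (1 TB/s = 1000 GB/s); the function names are illustrative:

```python
MODULES = 6                 # HBM2 memory blocks per Vector Engine
BW_PER_MODULE_GBPS = 200    # bandwidth per block in GB/s (upper figure from the text)
CAPACITIES_GB = (4, 8)      # capacity options per block in GB


def aggregate_bandwidth_tbps(modules: int, gbps_per_module: float) -> float:
    """Total bandwidth in TB/s, assuming 1 TB/s = 1000 GB/s."""
    return modules * gbps_per_module / 1000


def total_capacity_gb(modules: int, gb_per_module: int) -> int:
    """Total memory capacity in GB across all blocks."""
    return modules * gb_per_module


if __name__ == "__main__":
    print(aggregate_bandwidth_tbps(MODULES, BW_PER_MODULE_GBPS), "TB/s")
    print([total_capacity_gb(MODULES, c) for c in CAPACITIES_GB], "GB")
```

Six blocks at 200 GB/s give 1.2 TB/s, and six blocks of 4 GB or 8 GB give 24 GB or 48 GB, matching the figures in the text.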
            
            
            
            
                NEC SX-Aurora TSUBASA Interconnect
            
                The NEC Vector Engine can communicate with other Vector Engines or x86 CPUs over shared memory, PCI Express, or a high-speed network.
            
            
            
            
                GPGPU and VE
            
         
        
        
        
        
        
        
            * Other names and brands may be claimed as the property of others.