Publication Date: May 2015 | ISBN 978-0-9885234-2-5 (electronic), ISBN 978-0-9885234-0-1
(print) | Edition: 2
This book will guide you to the mastery of parallel programming with Intel® Xeon®
family products: Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. It includes
a detailed presentation of the programming paradigm for the Intel® Xeon® product family,
optimization guidelines, and hands-on exercises on systems equipped with the Intel®
Xeon Phi™ coprocessors, as well as instructions on using Intel software development
tools and libraries included in Intel Parallel Studio XE.
This book is targeted toward developers familiar with C/C++ programming in Linux.
Developers with little parallel programming experience will be able to grasp the
core concepts of these subjects from the detailed commentary in Chapter 3. For advanced
developers familiar with multi-core and/or GPU programming, the ebook offers materials
specific to Intel compilers and Intel® Xeon® family products, as well as optimization
advice pertinent to Many Integrated Core (MIC) architecture.
We have written these materials relying on key elements for efficient learning:
practice and repetition. As a consequence, the reader will find a great number of
code listings in the main section of these materials. In the extended appendix,
we provide numerous hands-on exercises that can be completed either under an instructor's
supervision or autonomously in a self-paced training environment.
This document is different from a typical book on computer science, because we intended
it to be used as a lecture plan in an intensive learning course. Speaking in programming
terms, a typical book traverses material with a depth-first algorithm, describing
every detail of each method or concept before moving on to the next method. In contrast,
this document traverses the scope of materials with a breadth-first algorithm.
First, we give an overview of multiple methods to address a certain issue. In the
subsequent chapter, we revisit these methods, this time in greater detail. We may
go into even more depth down the line. In this way, we expect that developers will
have enough time to absorb and comprehend the variety of programming and optimization
methods presented here.
What's New in the Second Edition?
The second edition is a major revision of the book. New features include:
- Revised practical exercises tuned for the behavior of the latest software tools
in Intel Parallel Studio XE 2015
- Outdated information on the older MPSS 2.x releases replaced with current information
for MPSS 3.x
- 40% of the exercises are new, with updates aimed at more efficient learning
- All exercises are revised for improved workflow (instructions located next to source
code) and user experience (standardized performance reporting)
- New topics discussed in the text: networking in clusters with coprocessors, upcoming
second generation of coprocessors, additional optimization topics
- Improved layout: large fonts optimized for reading the PDF file on a computer screen
- Significant changes in the text based on reader feedback improve clarity and flow
Overall, if you own the 1st edition of the book, it remains a valid introduction
to parallel programming with Intel Xeon Phi coprocessors. However, if you need
up-to-date practical programming and optimization recipes or classroom material,
it is worth upgrading to the 2nd edition.
"Writing a print book about any parallel computing topic is daunting for several reasons.
As the author of 23 technical books, I can attest that technology can become out of
date before the book is published. Keeping things up to date for publication is the
most difficult task in writing a print book. The second difficulty
is covering the enormous amount of information in such a way that it fits into a
print book, yet has enough depth to provide usable information. The third difficulty
is addressing a wide audience so that everyone gets value from the information,
whether a parallel programming newcomer or a veteran. In my opinion, the authors of
“Parallel Programming and Optimization With Intel Xeon Phi Coprocessors, 2nd Edition”
have done a phenomenal job on all three counts. The book is current, provides information
that is directly applicable, and can be effectively read by a wide range of programmers.
The authors, Andrey Vladimirov, Ryo Asai and Vadim Karpusenko, start off by discussing
the Xeon and Xeon Phi technology and how it differs from previous multicore
processors. They also make a bold claim that the book takes a platform-agnostic
approach and presents concepts in a portable manner. I found this to be true
throughout the book, so the concepts presented apply regardless of the operating
system. Chapter 3 is processor-agnostic, so it applies even to pre-Xeon processors.
Even though much of the information is applicable across the board, they do point
out ways to optimize for the Xeon Phi if that is your target.
Native and Offload Models
The second chapter is gold. It provides a clear explanation of the two prevalent
programming models for the Xeon Phi: the offload and native models. Native
programming allows an executable program to be transferred to a coprocessor. Once
transferred, the executable can run without the involvement of the host. This is
possible in large part because the Xeon Phi runs a Linux operating system along
with a virtual file system and multi-user environment. Fortunately, Intel has taken
care of the details by adding compiler options for building native executables.
Unlike what is normally called CPU parallelization, where portions of the work are
doled out to processor cores, the native model runs the entire executable on the Xeon Phi.
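To make the native model concrete, here is a minimal sketch. It assumes the Intel C compiler of the Parallel Studio XE 2015 era, whose `-mmic` option builds a native executable for first-generation Xeon Phi coprocessors and predefines the `__MIC__` macro. The function name `built_for_coprocessor` is hypothetical. It only shows that one source file can be compiled for either side and can tell which side it was built for.

```c
/* Hypothetical helper: reports whether this translation unit was compiled
   as a native Xeon Phi (first-generation, KNC) executable.
   Assumption: the Intel compiler predefines __MIC__ when invoked with -mmic;
   ordinary host builds leave it undefined. */
int built_for_coprocessor(void)
{
#ifdef __MIC__
    return 1;   /* native coprocessor build, e.g. icc -mmic */
#else
    return 0;   /* regular host build */
#endif
}
```

If you build once with `icc -mmic` and once with a regular host compiler, the function returns 1 and 0 respectively. The host cannot execute the native binary, so you still have to copy it to the coprocessor and launch it there.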
The offload model also uses the coprocessor, but differs in where the program runs.
When using the offload model, the executable begins execution on the host. At any
point, though, selected sections of code and data can be offloaded to the coprocessor
and executed there. As with the native model, Intel has made the offload model easy
and straightforward: a set of pragma statements marks the code portions to offload,
so a small amount of code can be sent to the coprocessor from a program that otherwise
runs on the host.
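Below is a minimal sketch of such an offload. It assumes the Intel compiler's Language Extensions for Offload (LEO) as shipped with Parallel Studio XE 2015. The function `scale_array` is a hypothetical illustration, not a listing from the book.

```c
/* Hypothetical example: scale an array in place, running the loop on a
   Xeon Phi coprocessor via Intel's offload pragma (LEO syntax, assumed
   Intel compiler). Compilers that do not implement "#pragma offload"
   ignore it (possibly with a warning), so the loop then runs on the host. */
void scale_array(float *data, int n, float factor)
{
    /* inout(data : length(n)) copies n elements of data to the coprocessor
       before the block runs and back to the host afterwards. Scalars used
       inside the block (n, factor) are transferred by the compiler. */
#pragma offload target(mic) inout(data : length(n))
    {
        for (int i = 0; i < n; ++i)
            data[i] *= factor;
    }
}
```

Only the block under the pragma is sent to the coprocessor; the rest of the program, including the caller of `scale_array`, stays on the host.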
Parallel Programming Paradigms
While Chapter 3 does not present anything new, it is an important chapter
for the book since it provides a foundation in parallelization using programming
language extensions. This includes vectorization, OpenMP, Cilk, and MPI. All four
of these are cornerstones of modern parallelization, and an essential element of
this book. As I pointed out earlier, this book makes an effort to address a wide
audience. Inclusion of this material means that newcomers to parallelization will
not have to seek other references in order to learn the basics.
The material is not only important, but it is well-written and fairly complete.
This concisely written chapter will be what I reach for first when I need to double-check
OpenMP syntax. What I especially appreciate is that the examples are all simple
and to the point. Authors often combine several ideas in one example, and it can be
hard to separate them. Here, each idea is demonstrated on its own and is crystal clear.
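In the spirit of the chapter's small, single-idea examples, here is a minimal sketch of one OpenMP construct. It is not taken from the book. A loop is split across threads, vectorized within each thread, and combined with a reduction. The function name `dot_product` is hypothetical, and the `parallel for simd` combination assumes a compiler that supports OpenMP 4.0.

```c
/* Hypothetical example: dot product with an OpenMP parallel, vectorized
   loop. Each thread accumulates a private partial sum; reduction(+ : sum)
   adds the partial sums together when the loop ends. Without OpenMP
   support the pragma is ignored and the loop runs serially. */
double dot_product(const double *x, const double *y, long n)
{
    double sum = 0.0;
#pragma omp parallel for simd reduction(+ : sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

To get threading, compile with OpenMP enabled (for example `-fopenmp` with GCC). The reduction combines partial sums in an unspecified order, so floating-point results may differ in the last bits from run to run when the thread count changes.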
Optimizing Parallel Applications
Chapter 4 is so full of meat that it will take me months to digest. It tackles optimization
of parallel applications, especially with regard to Xeon and Xeon Phi processors.
I really appreciated the optimization checklist. Developers often think they have
exhaustively optimized code, only to discover later on that they missed an opportunity.
With this checklist, you can make sure that you have considered everything.
There is a lot in this chapter, but for the type of development I do, the most useful
material was the section on optimization of transcendental functions.
I often need to crunch large sets of data by performing math on each element. For
instance, calculating the standard deviation of a data set requires squaring numbers
and taking a square root. Taking advantage of the Xeon Phi technology can provide
a 2x or even 3x advantage over using the normal math libraries. For any application
that crunches numbers, following the book's advice here can make a real difference.
Software Development Tools
When the rubber hits the road, you need to find the tools that will deliver on the
Xeon Phi promises. Chapter 5 is a roadmap to the tools. As I read the chapter, I
realized how much time it was saving me. Instead of sifting through dozens of online
articles to find the answers, I found them summarized in this chapter. Most of this
chapter revolves around Intel’s Parallel Studio. The authors clarify how to get the
most out of this product.
If you have any aspirations of taking advantage of Xeon and Xeon Phi processors,
this book is a must-have. If you just want a concise overview of parallelization,
this book is also a must-have. You won’t read and master the material in a week.
But I plan to work through the entire book, using it to hone my skills before most
other developers do, which will give me a distinct advantage."
- Rick Leinecker, Contributing Editor, Slashdot Media