Liknande böcker
Intel® Xeon Phi™ Coprocessor Architecture and Tools : The Guide for Application Developers
Bok av Rezaur Rahman
Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers provides developers a comprehensive introduction and in-depth look at the Intel Xeon Phi coprocessor architecture and the corresponding parallel data structure tools and algorithms used in the various technical computing applications for which it is suitable. It also examines the source code-level optimizations that can be performed to exploit the powerful features of the processor.
Xeon Phi is at the heart of worlds fastest commercial supercomputer, which thanks to the massively parallel computing capabilities of Intel Xeon Phi processors coupled with Xeon Phi coprocessors attained 33.86 teraflops of benchmark performance in 2013. Extracting such stellar performance in real-world applications requires a sophisticated understanding of the complex interaction among hardware components, Xeon Phi cores, and the applications running on them.
In this book, Rezaur Rahman, an Intel leader in the development of the Xeon Phi coprocessor and the optimization of its applications, presents and details all the features of Xeon Phi core design that are relevant to the practice of application developers, such as its vector units, hardware multithreading, cache hierarchy, and host-to-coprocessor communication channels. Building on this foundation, he shows developers how to solve real-world technical computing problems by selecting, deploying, and optimizing the available algorithms and data structure alternatives matching Xeon Phis hardware characteristics. From Rahmans practical descriptions and extensive code examples, the reader will gain a working knowledge of the Xeon Phi vector instruction set and the Xeon Phi microarchitecture whereby cores execute 512-bit instruction streams in parallel.
What youll learn
How to calculate theoretical Gigaflops and bandwidth numbers on the hardware and measure them through code segment How to estimate latencies in fetching data from different cache hierarchies, including memory subsystems How to measure PCIe bus bandwidth between the host and coprocessor How to exploit power management and reliability features built into the hardware