Course description
In this course you’ll learn how accelerator hardware is designed and integrated into the system. With that foundation, we can discuss what to expect from the system when you use various C++ AMP features. Specifically, we will talk about data transfers to and from the accelerator, memory layout and memory accesses from the accelerator, and thread execution and control flow on the accelerator. Then we’ll cover the support that Microsoft’s Visual Studio 2012 provides for C++ AMP.
Prerequisites
This course assumes that you have a good understanding of core C++ concepts, including classes, objects, containers, and iterators. You should also be familiar with Visual Studio 2012 for Visual C++ development, including compilation, testing, and debugging. Although not required or expected, you may get more out of some parts of the course if you are familiar with multithreaded programming, Visual Studio 2012’s debugging capabilities for multiple threads, and basic computer architecture concepts.
Meet the expert
John Stratton, Ph.D., is a senior architect at Multicoreware Inc. and a visiting lecturer at the University of Illinois at Urbana-Champaign. John has been at the forefront of research and education in heterogeneous computing, reaching hundreds of students through the Virtual School of Computational Science and Engineering’s courses on heterogeneous computing and optimization for scientific applications. John writes papers and articles for leading academic conferences and journals as well as broad-reaching publications such as IEEE Computer. He is also an active participant and presenter at several industry and technology groups and events across the country.
Course outline
Memory Layout
Memory Layout Overview (25:47)
- Introduction (00:48)
- GPU Architecture Overview (08:45)
- Minimum Scale of Parallelism (07:43)
- Demo: Scale and Performance (03:20)
- Demo: Benchmark Results (04:34)
- Summary (00:35)
Memory Layout and SIMD (31:04)
- Introduction (01:03)
- Memory Layout and Accesses (06:37)
- Good Access Patterns (00:51)
- Demo: Transpose Operation (05:16)
- Implicit SIMD Execution (03:57)
- Divergent Penalties (03:07)
- Demo: Divergence (04:32)
- Demo: Divergence Problems (04:42)
- Summary (00:55)
Data Transfers (17:42)
- Introduction (00:46)
- Host-Accelerator Data Transfers (03:30)
- When Data Transfers Happen (03:02)
- Demo: Data Transfers (05:48)
- Demo: Array View (04:01)
- Summary (00:33)
Support for C++ AMP
Windows Support (14:44)
- Introduction (00:30)
- C++ AMP Uses DirectCompute (03:31)
- Demo: AMP Implementations (04:01)
- Demo: Multiple Accelerators (05:16)
- Summary (01:23)
Debugging (20:56)
- Introduction (00:43)
- C++ AMP Debugging (02:56)
- Demo: Debugging C++ AMP (05:24)
- Demo: Debugging Tools (07:17)
- Demo: Freezing Threads (02:00)
- Debugging Parallel Kernel Code (01:52)
- Summary (00:41)
Tiling (32:32)
- Introduction (00:52)
- Tiled Extents and Indexes (01:21)
- Tiled Accelerator Execution (04:58)
- Demo: Tiled Extents (05:44)
- Tiled Accelerator Execution (2) (06:58)
- Demo: Tile Size (05:50)
- Demo: Tile Variables (05:40)
- Summary (01:05)