Overview

A practical guide to high-performance CUDA development for engineers, researchers, and developers who need more than introductory examples. This book focuses on the full workflow of GPU computing, from understanding how streaming multiprocessors execute warps to building maintainable, testable, and scalable applications for real scientific workloads.

The chapters move from core architecture and programming fundamentals into profiling, memory tuning, numerical accuracy, and multi-GPU scaling. You will see how to turn a correct kernel into an efficient one, how to measure bottlenecks with Nsight tools, and how to make informed tradeoffs between occupancy, bandwidth, latency, and precision.

What this book covers

GPU architecture and execution behavior, including warps, scheduling, memory hierarchy, and data movement costs.
CUDA kernel design, with launch configuration, indexing, synchronization, debugging, and reusable interfaces.
Performance engineering, using profiling metrics and iterative optimization based on measured results.
Memory optimization, including coalescing, shared memory tiling, register pressure, cache behavior, and data layout.
Common scientific patterns, such as stencils, reductions, scans, sparse formats, and batched linear algebra.
Numerical correctness, with floating point behavior, stable summation, boundary handling, and CPU validation.
Advanced coordination techniques, such as warp and block level operations, streams, events, and asynchronous overlap.
Host and multi-GPU engineering, covering pinned memory, unified memory, partitioning strategies, NCCL, halo exchange, and scaling studies.

Why it stands out

Engineering-first approach, centered on real optimization decisions rather than isolated syntax.
Workflow oriented, with profiling, testing, benchmarking, and regression tracking built into the discussion.
Useful for scientific computing, especially stencil solvers, sparse methods, reductions, and iterative pipelines.
Built for maintainability, with guidance on project structure, code reuse, and repeatable validation.

Ideal for anyone who wants to write CUDA code that is not only correct, but also fast, traceable, and ready for production-scale workloads.

Full Product Details

Author: Eamon Virek
Publisher: Independently Published
Imprint: Independently Published
Dimensions: Width: 21.60cm, Height: 1.30cm, Length: 27.90cm
Weight: 0.590kg
ISBN: 9798196510748
Pages: 252
Publication Date: 11 May 2026
Audience: General/trade, General
Format: Paperback
Publisher's Status: Active
Availability: Available To Order
We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.
Countries Available: All regions