Overview

AI GPU Workloads for Beginners is your practical, hands-on gateway into the world of GPU-accelerated artificial intelligence. Written for newcomers who want to understand, train, fine-tune, deploy, and optimize AI models using modern GPU hardware and today's cutting-edge frameworks, this book provides clear guidance, real projects, and step-by-step labs you can follow on any GPU, local or cloud.

This is not a theory book. Every chapter is built around practical execution: inspecting your GPU, training deep learning models with PyTorch, fine-tuning LLMs with modern techniques (LoRA, QLoRA, 4-bit quantization), optimizing inference with TensorRT and vLLM, and deploying real services using Docker, Kubernetes, Triton, and the NVIDIA GPU Operator. You will learn the exact workflows used by AI engineers, MLOps teams, and GPU cluster operators in real production environments.

Whether you're running a single-GPU workstation, a cloud GPU instance, or a small multi-GPU cluster, this book shows you how to extract maximum performance from your hardware, covering VRAM management, mixed precision, KV cache optimization, batching strategies, and GPU memory tuning. You'll also integrate observability using Prometheus, Grafana, and DCGM to identify bottlenecks and improve throughput, latency, and reliability.
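To give a flavor of the VRAM arithmetic behind topics like 4-bit quantization and QLoRA, here is a minimal back-of-the-envelope sketch (not taken from the book): the memory needed just to hold a model's weights scales with parameter count times bits per parameter.

```python
def model_weight_bytes(n_params: float, bits_per_param: int) -> float:
    """Bytes required to store model weights at a given precision.

    This ignores activations, optimizer state, and KV cache, which
    add real overhead on top of the raw weight footprint.
    """
    return n_params * bits_per_param / 8


GIB = 1024 ** 3

# A 7B-parameter model, the size used in the book's QLoRA project:
fp16_bytes = model_weight_bytes(7e9, 16)  # half precision
int4_bytes = model_weight_bytes(7e9, 4)   # 4-bit quantized

print(f"fp16 weights:  {fp16_bytes / GIB:.1f} GiB")
print(f"4-bit weights: {int4_bytes / GIB:.1f} GiB")
```

Printing the two figures shows why 4-bit quantization matters: roughly 13 GiB of weights at fp16 drops to about 3.3 GiB at 4 bits, which is the difference between needing a data-center GPU and fitting on a consumer card.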
Key Topics Include:
- GPU fundamentals: CUDA, tensor cores, parallelism, HBM, throughput, and memory architecture
- Training and fine-tuning: PyTorch, AMP, CNNs, Transformers, FSDP, DeepSpeed, LoRA/QLoRA, bitsandbytes
- Inference optimization: vLLM, TensorRT-LLM, Text Generation Inference, ONNX Runtime
- Deployment workflows: Docker GPU containers, Kubernetes GPU Operator, Triton Inference Server
- Performance tuning: OOM mitigation, VRAM optimization, data pipeline tuning, batching, quantization
- Full-stack GPU project: fine-tune a model, build an inference service, add monitoring and load testing, and deploy end-to-end

What You Will Build:
- A GPU-optimized training pipeline
- A fine-tuned 7B LLM using QLoRA
- A production-ready inference server using vLLM or Triton
- A live monitoring stack (Prometheus + Grafana)
- A full GPU workload deployment using Docker or Kubernetes
- A complete performance optimization loop for real-world AI systems

Who This Book Is For:
Beginners, developers, data scientists, AI enthusiasts, and homelab builders who want to understand and operate GPU-accelerated AI systems without prior deep-learning expertise. Designed with clarity, practical structure, and real GPU workflows, AI GPU Workloads for Beginners gives you the confidence to build, deploy, and optimize modern AI workloads, exactly the way professionals do it today.

Full Product Details
Author: Hollis Denning
Publisher: Independently Published
Imprint: Independently Published
Dimensions: Width: 21.60cm, Height: 1.60cm, Length: 27.90cm
Weight: 0.710kg
ISBN: 9798278645320
Pages: 304
Publication Date: 13 December 2025
Audience: General/trade, General
Format: Paperback
Publisher's Status: Active
Availability: In Print. This item will be ordered in for you from one of our suppliers. Upon receipt, we will promptly dispatch it to you. For in-store availability, please contact us.
Countries Available: All regions