Build a GPT-Style AI: From Scratch with PyTorch, CUDA AMP, LoRA, QLoRA, Evaluation and Local Serving

Author:   Riccardo Rizzo
Publisher:   Independently Published
ISBN:   9798195787608
Pages:   446
Publication Date:   15 May 2026
Format:   Paperback
Availability:   Available To Order
We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.

Our Price:   $91.87

Overview

Most LLM books show either high-level concepts or finished tools. This book teaches the full engineering path. Build a GPT-Style AI starts from an empty folder and builds a language model project step by step. You will create the environment, tokeniser, dataset pipeline, decoder-only transformer, training loop, checkpoint system, validation reports, mixed precision path, generation module, LoRA workflow, QLoRA workflow, release checklist, and local API.

The emphasis is practical understanding. You will see why tokenisation comes before modelling, how next-token training works, how checkpoints protect long runs, why validation loss is not enough, how CUDA AMP uses torch.autocast and GradScaler, why BF16 usually does not need gradient scaling in the same way as FP16, and how CPU and MPS fallbacks should be handled cleanly.

The book is honest about scale. A 0.5B model is large enough to teach real engineering pressure. A 7B model changes the hardware conversation. Frontier commercial systems such as GPT-5.5 and Claude require industrial infrastructure, proprietary data pipelines, post-training, safety systems, evaluation programmes, and serving platforms. This book does not pretend otherwise. Instead, it gives you the machinery and judgement to understand how LLM systems are built.

You will learn how to:

- Build a GPT-style transformer directly in PyTorch.
- Train and inspect a tokeniser.
- Prepare datasets for next-token prediction.
- Implement training, validation, checkpointing, and resume.
- Use CUDA AMP with torch.autocast and GradScaler.
- Understand BF16, FP16, CPU fallback, and Apple Silicon MPS behaviour.
- Design a 0.5B model configuration before wasting compute.
- Reason about scaling from 0.5B to 3B and 7B.
- Fine-tune existing models with LoRA and QLoRA.
- Evaluate, release, quantise, and serve a model locally.

The companion code is available at https://github.com/riccione83/tiny-llm.
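As a rough illustration of what "preparing datasets for next-token prediction" usually means, the sketch below cuts a stream of token IDs into fixed-length windows and pairs each input with the same window shifted one position to the left. It is not taken from the book or its companion repository; the function name `make_next_token_examples` and the non-overlapping windowing are assumptions made for this example.

```python
from typing import List, Tuple


def make_next_token_examples(
    token_ids: List[int], context_length: int
) -> List[Tuple[List[int], List[int]]]:
    """Split a token stream into (inputs, targets) pairs.

    Hypothetical helper: each window holds context_length + 1 tokens, so the
    target at position i is simply the token that follows input position i.
    Windows do not overlap, and a trailing partial window is dropped.
    """
    examples = []
    # Stop early enough that every window has a full context_length + 1 tokens.
    for start in range(0, len(token_ids) - context_length, context_length):
        window = token_ids[start:start + context_length + 1]
        inputs = window[:-1]   # everything except the last token
        targets = window[1:]   # the same tokens shifted left by one
        examples.append((inputs, targets))
    return examples


# Toy stream of ten "token IDs" with a context length of four.
for inputs, targets in make_next_token_examples(list(range(10)), 4):
    print(inputs, "->", targets)
```

In a real pipeline the IDs would come from a trained tokeniser and the pairs would be batched into tensors, but the one-position shift between inputs and targets is the core of the training objective.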
This is a practical technical book for software engineers, machine learning practitioners, students, founders, and serious builders who want to understand LLMs by constructing the system beneath them.
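To make "design a 0.5B model configuration before wasting compute" concrete, here is a minimal back-of-envelope parameter counter for a decoder-only transformer. It is a sketch under stated assumptions, not the book's configuration: the `GPTConfig` fields and the example values (vocabulary 32,000, width 1,280, 24 layers, context 1,024) are hypothetical, and the count assumes bias-free linear layers, learned position embeddings, LayerNorm with weight and bias, and an output head tied to the token embedding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    """Hypothetical configuration; field names are illustrative only."""
    vocab_size: int
    d_model: int
    n_layers: int
    context_length: int
    mlp_ratio: int = 4           # hidden width of the MLP as a multiple of d_model
    tie_embeddings: bool = True  # output head shares the token embedding matrix


def count_parameters(cfg: GPTConfig) -> int:
    """Estimate trainable parameters, assuming no biases in linear layers."""
    d = cfg.d_model
    attention = 4 * d * d                # Q, K, V and output projections
    mlp = 2 * d * (cfg.mlp_ratio * d)    # up-projection and down-projection
    layer_norms = 2 * (2 * d)            # two LayerNorms per block, weight + bias
    per_block = attention + mlp + layer_norms

    token_embedding = cfg.vocab_size * d
    position_embedding = cfg.context_length * d
    final_norm = 2 * d
    output_head = 0 if cfg.tie_embeddings else cfg.vocab_size * d

    return (cfg.n_layers * per_block + token_embedding
            + position_embedding + final_norm + output_head)


config = GPTConfig(vocab_size=32_000, d_model=1_280, n_layers=24, context_length=1_024)
params = count_parameters(config)
print(f"parameters: {params:,} (~{params / 1e9:.2f}B)")
# FP32 weights alone take 4 bytes per parameter; gradients and optimiser state add more.
print(f"FP32 weights: {params * 4 / 2**30:.2f} GiB")
```

Running this kind of estimate first shows how the block term, which grows with the square of `d_model`, dominates the total, which is why moving from 0.5B towards 3B or 7B changes the hardware requirements so quickly.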

Full Product Details

Imprint:   Independently Published
Dimensions:   Width: 15.20 cm, Height: 2.30 cm, Length: 22.90 cm
Weight:   0.594 kg
Audience:   General/trade, General
Publisher's Status:   Active

Countries Available:   All regions