Building Large Language Models with Python: A Developer's Guide to Sparse MoE, 1-Bit Quantization, Reasoning Systems and Multimodal AI

Author:   Samuel Reynolds
Publisher:   Independently Published
ISBN:   9798259295339
Pages:   142
Publication Date:   28 April 2026
Format:   Paperback
Availability:   Available To Order
We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.

Our Price:   $68.61


Overview

Most LLM books teach you how to call an API. This one teaches you how to build what's behind it. As frontier AI shifts toward efficiency, sparsity, and on-device deployment, the engineers who understand the architecture, not just the interface, are the ones defining what comes next. Building Large Language Models with Python gives you that understanding, from the mathematics of attention to the deployment of a quantized, reasoning-capable model on local hardware. Written from hard-won production experience, each chapter pairs rigorous theory with complete Python implementations: not toy examples, but the kind of code that holds up under the demands of real training runs and live inference pipelines.

What you'll build:
- A Grouped-Query Attention module with KV cache support
- A Top-K sparse MoE layer with load-balancing auxiliary loss
- A BitLinear layer implementing ternary {-1, 0, 1} weights from scratch
- A Vision Transformer encoder with a multimodal projection layer
- A Process Reward Model for step-level reasoning verification
- A full DPO and GRPO training loop for alignment
- A local-first MCP server for agentic tool use
- A speculative decoding pipeline using a draft model

Topics covered include:
- Rotary Positional Embeddings (RoPE)
- FlashAttention-3 concepts
- Quantization-Aware Training vs. Post-Training Quantization
- Expert parallelism and All-to-All communication
- FSDP vs. DDP distributed training
- PagedAttention and KV cache optimization
- On-device LoRA fine-tuning
- Chain-of-thought reasoning architecture

This book is for you if:
- You're a software engineer or ML practitioner comfortable with Python and PyTorch
- You understand how a basic transformer works and want to go significantly deeper
- You want to move beyond using models to building and owning them
- You're building for edge deployment, private AI, or resource-constrained environments

The field is moving fast. This book is written for engineers who intend to move faster.

Full Product Details

Author:   Samuel Reynolds
Publisher:   Independently Published
Imprint:   Independently Published
Dimensions:   Width: 17.80 cm, Height: 0.80 cm, Length: 25.40 cm
Weight:   0.259 kg
ISBN:   9798259295339
Pages:   142
Publication Date:   28 April 2026
Audience:   General/trade, General
Format:   Paperback
Publisher's Status:   Active

Countries Available:   All regions