Small Language Models: When Smaller Is Better

Author:   Miguel Torres
Publisher:   Independently Published
ISBN:   9798197568922
Pages:   212
Publication Date:   20 May 2026
Format:   Paperback
Availability:   Available To Order
We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.

Our Price:   $26.37

Overview

Bigger is not always better. In production AI systems, bigger is often slower, more expensive, harder to deploy, harder to customize, and harder to control. Small Language Models: When Smaller Is Better is a practical guide to building useful AI systems when latency, cost, privacy, reliability, and deployment constraints matter as much as raw benchmark scores.

Large language models are extraordinary generalists, but most products do not need the largest possible model for every request. They need the right model for the job. Sometimes that means a compact local model. Sometimes it means a fine-tuned specialist. Sometimes it means retrieval, routing, adapters, quantization, or a hybrid system where a small model handles the common path and a larger model becomes the fallback.

This book treats small language models as engineering components, not as weaker clones of frontier models. You will learn how to reason about SLMs as classifiers, extractors, summarizers, local assistants, retrieval partners, tool callers, routing stages, draft generators, privacy-preserving workers, and cost-control mechanisms inside real systems.

Inside, you will learn how to:

Decide when a small language model is good enough, and when it is not
Understand tokens, embeddings, attention, context windows, KV cache, logits, sampling, and instruction tuning
Think clearly about scaling laws, data quality, synthetic data, distillation, and the lessons behind Phi-style training recipes
Use compression techniques such as distillation, pruning, quantization, LoRA, QLoRA, and adapter-based fine-tuning
Choose an SLM by task fit, license, hardware target, latency budget, context window, evaluation results, and operational risk
Run models locally with tools and formats such as llama.cpp, GGUF, Ollama, ONNX Runtime GenAI, MLX, vLLM, and related inference stacks
Design retrieval-augmented generation systems that help smaller models answer with better context
Build evaluations that measure task quality, hallucination risk, latency, regressions, and cost-per-success
Use routing, cascades, speculative decoding, tool calling, structured outputs, caching, and AI gateways
Handle safety, privacy, governance, model observability, rollout strategy, and production operation

The book is written for backend engineers, platform engineers, machine learning engineers, product engineers, architects, tech leads, and developers who want to build AI systems that survive real constraints. You do not need to be a research scientist. You need enough technical grounding to ask better questions before sending every request to the biggest model available.

If you are building AI features for mobile, desktop, edge devices, private environments, customer VPCs, low-latency workflows, high-volume products, or specialized domain tasks, this book gives you the mental models and system-design vocabulary to make better trade-offs.

By the end, you will have a practical decision framework for answering the central question: when is a smaller model not just cheaper, but architecturally better?

Full Product Details

Imprint:   Independently Published
Dimensions:   Width: 15.20cm, Height: 1.10cm, Length: 22.90cm
Weight:   0.290kg
Audience:   General/trade, General
Publisher's Status:   Active

Countries Available

All regions