Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, vLLM, and More

Author:   Peter E Poisson
Publisher:   Independently Published
ISBN:   9798294338459
Pages:   164
Publication Date:   26 July 2025
Format:   Paperback
Availability:   Available To Order

Price:   $47.49




Overview

Are you struggling to scale large language models (LLMs) without breaking the bank or sacrificing latency? This book offers a clear roadmap to optimizing inference, reducing costs, and scaling seamlessly across platforms such as PyTorch, ONNX, vLLM, and more.

Optimizing LLM Performance is a hands-on guide to boosting the efficiency of large language models in production environments. Whether you're building chatbots, document summarizers, or enterprise AI tools, it teaches proven methods for accelerating inference while maintaining accuracy, diving deep into hardware-aware optimization, quantization, model pruning, compiler acceleration, and memory-efficient runtime strategies, all without locking you into any single framework. Written with clarity and real-world use in mind, the book features practical case studies, side-by-side performance comparisons, and up-to-date techniques from the cutting edge of AI deployment. If you're building, serving, or scaling LLMs in 2025, this is the performance engineering guide you've been waiting for.

Key Features:
- Framework-agnostic optimization techniques using PyTorch, ONNX Runtime, vLLM, llama.cpp, and more
- Deep dive into quantization (INT8/4-bit), distillation, pruning, and KV caching
- Hands-on examples with FastAPI, Hugging Face Transformers, and serverless deployment
- Coverage of performance profiling, streaming, batching, and cost-efficient scaling
- Future-proof insights on compiler-aware models, LoRA 2.0, and edge inference

Ready to build LLM systems that are faster, cheaper, and more scalable? Grab your copy of Optimizing LLM Performance today and deploy smarter.

Full Product Details

Author:   Peter E Poisson
Publisher:   Independently Published
Imprint:   Independently Published
Dimensions:   Width: 17.80cm , Height: 0.90cm , Length: 25.40cm
Weight:   0.295kg
ISBN:   9798294338459


Pages:   164
Publication Date:   26 July 2025
Audience:   General/trade, General
Format:   Paperback
Publisher's Status:   Active
Availability:   Available To Order

Countries Available:   All regions