Overview

Ship fast, private, and efficient AI in the browser with ONNX Runtime Web, WebGPU, and a proven pipeline from training to production. Developers face real constraints in the browser, from cross-origin isolation and CSP to GPU limits, storage quotas, and variable device performance. This book gives you a practical end-to-end system that handles those constraints while delivering low-latency features users can trust. You will learn when client-side inference is the right call, how to export and optimize models, and how to run them reliably with WebGPU and WASM using clean fallbacks. The result is a codebase that is faster to maintain, easier to ship, and ready for production.

- Decide where inference should run using clear cost, privacy, and latency trade-offs
- Export PyTorch models to ONNX with external data to handle 2 GiB limits
- Convert and optimize graphs into ORT format and apply mixed-precision fp16 safely
- Use ONNX Runtime Web SessionOptions, IO binding, and device tensors to keep data on the GPU
- Apply graph capture on WebGPU for static shapes and plan around binding size limits
- Reach stable CPU performance with WASM threads and SIMD through cross-origin isolation
- Probe WebGPU features including shader-f16, subgroups, and timestamp queries
- Select providers with a backend matrix that adapts across WebGPU, WebNN, and WASM
- Build tokenizer workflows and pipelines with Transformers.js using device webgpu
- Implement preprocessing and postprocessing for images and audio with codecs and batching
- Cache large weights in IndexedDB and OPFS with quota checks and eviction handling
- Version and validate assets using manifests, ETags, and integrity checks
- Stream and lazy-load sharded weights with HTTP Range requests for faster first use
- Handle large models with KV-cache tiling and binding-size-aware layouts
- Partition graphs between WebGPU and WASM for selective fallbacks on weaker devices
- Build real projects end to end, including Stable Diffusion Turbo with fp16 weights
- Wire Whisper Tiny for streaming capture with VAD for robust speech input
- Ship real-time background removal with camera compositing for the web
- Deliver CLIP image search with local embeddings and an IndexedDB vector index
- Set headers and CSP correctly, avoid COEP credentialless pitfalls, and keep isolation
- Profile performance with ORT logs and the WebGPU inspector to remove bottlenecks
- Migrate cleanly from ONNX.js and TensorFlow.js to ORT Web without breaking flows
- Stand up production telemetry, error boundaries, and alerting that respect privacy
- Plan costs for CDN egress, caching, and storage, with practical distribution strategies
- Future-proof with feature detection for WebNN and NPUs and a maintenance roadmap

This is a code-heavy guide, with working examples that show IO binding, fp16 kernels, manifests, service workers, feature probes, and complete project wiring so you can ship real products. Get the guide that turns browser AI from a demo into a dependable product; grab your copy today.

Full Product Details

Author: Aura Fenwick
Publisher: Independently Published
Imprint: Independently Published
Dimensions: Width: 17.80cm, Height: 1.60cm, Length: 25.40cm
Weight: 0.531kg
ISBN: 9798273164123
Pages: 304
Publication Date: 05 November 2025
Audience: General/trade, General
Format: Paperback
Publisher's Status: Active
Availability: Available To Order. We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.
Countries Available: All regions
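The "backend matrix" mentioned in the overview above can be pictured as a small feature-detection helper that orders execution providers from fastest to most compatible. This is an illustrative sketch, not an excerpt from the book; the `pickProviders` function, the `Caps` interface, and its flags are hypothetical names.

```typescript
// Capability flags as they might be gathered in the browser
// (e.g. navigator.gpu for WebGPU, "ml" in navigator for WebNN,
// and the crossOriginIsolated global). All names are illustrative.
interface Caps {
  webgpu: boolean;              // WebGPU adapter available
  webnn: boolean;               // WebNN API exposed
  crossOriginIsolated: boolean; // gates WASM threads, not WASM itself
}

// Return an execution-provider priority list for ONNX Runtime Web.
// Fastest viable backend first; "wasm" is always the final fallback
// because it runs everywhere (single-threaded if not isolated).
function pickProviders(caps: Caps): string[] {
  const order: string[] = [];
  if (caps.webgpu) order.push("webgpu");
  if (caps.webnn) order.push("webnn");
  order.push("wasm");
  return order;
}

// Example: a device with WebGPU but no WebNN yields ["webgpu", "wasm"],
// which could then be passed as the executionProviders option when
// creating an ONNX Runtime Web inference session.
const providers = pickProviders({
  webgpu: true,
  webnn: false,
  crossOriginIsolated: true,
});
```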
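The streaming bullet above relies on HTTP Range requests to fetch sharded weights lazily. As a minimal sketch of the arithmetic involved (the `shardRanges` helper is a hypothetical name, and a real loader would also verify each shard against manifest checksums):

```typescript
// Compute the Range header values needed to fetch a large weight file
// in fixed-size shards. HTTP byte ranges are inclusive on both ends.
function shardRanges(totalBytes: number, shardBytes: number): string[] {
  const headers: string[] = [];
  for (let start = 0; start < totalBytes; start += shardBytes) {
    const end = Math.min(start + shardBytes, totalBytes) - 1;
    headers.push(`bytes=${start}-${end}`);
  }
  return headers;
}

// A 10 MiB file fetched in 4 MiB shards needs three requests:
// ["bytes=0-4194303", "bytes=4194304-8388607", "bytes=8388608-10485759"]
const ranges = shardRanges(10 * 1024 * 1024, 4 * 1024 * 1024);
```

Each value would go into a `Range` request header, and the server must answer with `206 Partial Content` for this scheme to work.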