|
|
|||
|
||||
OverviewBuild enterprise RAG that survives production with NVIDIA NeMo Many teams can demo retrieval augmented generation, far fewer can ship a stable stack that parses messy documents, enforces permissions, meets latency targets, and produces grounded answers users can trust. This book gives you a complete, practical path from ingestion to day two operations using NeMo Retriever and NIM services. You will map real business questions to measurable objectives, wire a pipeline that holds up under load, and deploy it on Kubernetes with clear guardrails, observability, and runbooks. Every concept is backed by working requests, scripts, and configs so you can adopt what you need without guesswork. Plan a production RAG blueprint, connect business objectives to retrieval quality, latency budgets, availability, and cost Parse complex PDFs with tables, figures, and OCR, choose between nemoretriever parse and pdfium, and standardize output schemas Build ingestion jobs with run ids, tracing, and audits, deduplicate at scale with simhash and minhash before embedding Remove or mask PII and enrich metadata with NeMo Curator for provenance, permissions, recency, and tags Select embedding models and dimensions, apply normalization and pooling, and balance multilingual coverage with storage trade offs Index at scale in Milvus, Qdrant, or pgvector, use GPU options and cuVS with CAGRA or IVF PQ, and choose parameters with evidence Retrieve with dense vectors and filters, add sparse BM25 when it matters, and fuse with RRF including score scaling and filtering Rerank with NeMo Text Reranking, control truncation and token budgets, and assemble context windows that keep citations intact Rewrite queries and add multi hop retrieval when needed, introduce lightweight agents only where they reduce failure cases Add NeMo Guardrails with Colang, enforce safety, privacy, grounding checks, and clear refusal policies Measure with NeMo Evaluator, define datasets and metrics, use LLM as judge carefully, and wire CI with golden sets and regression tests Observe Triton and NIM with Prometheus, Grafana, and Datadog, track p95 and p99, error codes, and queue time Deploy with the NIM Operator and Helm, set GPU Operator and MIG profiles, scale with autoscaling and in flight batching Support air gapped installs and private registries, pin NIM images by digest, canary safely, and roll back fast Integrate frameworks, call NIM endpoints from LangChain, LlamaIndex, and Haystack, and connect to Milvus, Qdrant, Elastic, and pgvector Operate with resilience, add semantic and result caching, run chaos tests, set retries and timeouts, and follow day two runbooks This is a code heavy guide, you get runnable Python clients, Shell scripts, Systemd units, YAML and JSON configs you can adapt to real projects. Grab your copy today and ship RAG with confidence Full Product DetailsAuthor: Emma VerranPublisher: Independently Published Imprint: Independently Published Dimensions: Width: 17.80cm , Height: 1.70cm , Length: 25.40cm Weight: 0.558kg ISBN: 9798273131392Pages: 322 Publication Date: 05 November 2025 Audience: General/trade , General Format: Paperback Publisher's Status: Active Availability: Available To Order We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately. Table of ContentsReviewsAuthor InformationTab Content 6Author Website:Countries AvailableAll regions |
||||