Vectorization: A Practical Guide to Efficient Implementations of Machine Learning Algorithms

Author: Edward DongBo Cui (Case Western Reserve, USA)
Publisher: John Wiley & Sons Inc
ISBN: 9781394272945

Pages: 448
Publication Date: 10 December 2024
Format: Hardback
Availability: Out of stock

The supplier is temporarily out of stock of this item. It will be ordered for you on backorder and shipped when it becomes available.

Our Price: $232.95

Full Product Details

Imprint: Wiley-IEEE Press
Weight: 0.862 kg
ISBN 10: 1394272944
Audience: Professional and scholarly, Professional & Vocational
Publisher's Status: Active

Table of Contents

About the Author
Preface
Acknowledgment

1 Introduction to Vectorization
  1.1 What Is Vectorization
    1.1.1 A Simple Example of Vectorization in Action
    1.1.2 Python Can Still Be Faster!
    1.1.3 Memory Allocation of Vectorized Operations
  1.2 Case Study: Dense Layer of a Neural Network
  1.3 Vectorization vs. Other Parallel Computing Paradigms
    1.3.1 Multithreading
    1.3.2 Multiprocessing
    1.3.3 Multiworker Distributed Computing
  Bibliography

2 Basic Tensor Operations
  2.1 Tensor Initializers
  2.2 Data Type and Casting
    2.2.1 Tips on Specifying the dtypes During Tensor Initialization
    2.2.2 Tips on Casting
  2.3 Mathematical Operations
  2.4 Reduction Operations
  2.5 Value Comparison Operations
  2.6 Logical Operations
  2.7 Ordered Array-Adjacent Element Operations
  2.8 Array Reversing
  2.9 Concatenation, Stacking, and Splitting
  2.10 Reshaping
  2.11 Broadcasting
  2.12 Case Studies
    2.12.1 Image Normalization
    2.12.2 Pearson’s Correlation
    2.12.3 Pair-wise Difference
    2.12.4 Construction of Magic Squares
  Bibliography

3 Tensor Indexing
  3.1 Get Values at Index
    3.1.1 Integer Indexing
    3.1.2 Flat Index vs. Multi-index
    3.1.3 Boolean Indexing
  3.2 Slicing
    3.2.1 Reusing Slice Configuration
  3.3 Case Study: Get Consecutive Index
  3.4 Take and Gather
    3.4.1 Take
    3.4.2 Take Along Axis
    3.4.3 Gather
    3.4.4 N-Dimensional Gather
  3.5 Assign Values at Index
  3.6 Put and Scatter
    3.6.1 Put
    3.6.2 Put Along Axis
    3.6.3 Multi-index Scatter Replacement
    3.6.4 Additional Scatter Operations from PyTorch
  3.7 Case Study: Batchwise Scatter Values
  Bibliography

4 Linear Algebra
  4.1 Tensor Multiplications
  4.2 The matmul Operation
    4.2.1 The @ Operator
  4.3 The tensordot Operation
    4.3.1 Heuristics of tensordot Operations
  4.4 Einsum
  4.5 Case Study: Pair-wise Pearson’s Cross-Correlation
  4.6 Case Study: Hausdorff Distance
  4.7 Common Linear Algebraic Routines
  4.8 Case Study: Fitting Single Exponential Curves
  Bibliography

5 Masking and Padding
  5.1 Masking
    5.1.1 Triangular and Diagonal Masks
    5.1.2 Changing Elements Using the where Operation
    5.1.3 Use Multiplication to Apply Masks
    5.1.4 Use Arithmetic Operations as Boolean Operations to Apply and Combine Masks
    5.1.5 Select Elements Based on Masking
    5.1.6 Case Study: Top-k Masking
  5.2 Padding
    5.2.1 Case Study: Padding in Convolutional Neural Networks
    5.2.2 Case Study: Truncate or Pad Sequence to Desired Length
  5.3 Advanced Case Studies
    5.3.1 Scaled-Dot Product Attention
    5.3.2 Variable-Length Range via Masking
    5.3.3 Length Regulator Module of FastSpeech 2
  Bibliography

6 String Processing
  6.1 String Data Types
    6.1.1 NumPy String, Bytes, and Object
    6.1.2 Pandas String
    6.1.3 Tensorflow Bytes
    6.1.4 PyTorch
  6.2 String Operations
  6.3 Case Study: Parsing DateTime from String Representations
  6.4 Mapping Strings to Indices
    6.4.1 NumPy np.unique
    6.4.2 Pandas pd.Categorical
    6.4.3 Scikit-learn sklearn.preprocessing.LabelEncoder
    6.4.4 Tensorflow tf.lookup
    6.4.5 TorchText torchtext.vocab
  6.5 Case Study: Factorization Machine
    6.5.1 Factorization Machine Model
    6.5.2 More Efficient Optimization Criterion
    6.5.3 Implementation of Deep Factorization Machine in Tensorflow
    6.5.4 Training DeepFM on MovieLens 1M Dataset
  6.6 Regular Expressions (Regex)
  6.7 Data Serialization and Deserialization
  Bibliography

7 Sparse Matrix
  7.1 Scipy’s Sparse Matrix Classes
    7.1.1 Coordinate Sparse Matrix (coo_matrix)
    7.1.2 Compressed Sparse Column Matrix (csc_matrix)
    7.1.3 Compressed Sparse Row Matrix (csr_matrix)
    7.1.4 Block Sparse Row Matrix (bsr_matrix)
    7.1.5 Dictionary of Keys Sparse Matrix (dok_matrix)
    7.1.6 Row-Based List of List Sparse Matrix (lil_matrix)
    7.1.7 Diagonal Storage Sparse Matrix (dia_matrix)
    7.1.8 Comparisons Between Different Sparse Matrix Formats
  7.2 Sparse Matrix Broadcasting
    7.2.1 Scalar Broadcasting
    7.2.2 Row-wise Broadcasting
    7.2.3 Column-wise Broadcasting
    7.2.4 Multiplication on Sparse Indices
  7.3 Tensorflow’s Sparse Tensors
    7.3.1 SparseTensor Class
    7.3.2 Sparse CSR Matrix
  7.4 PyTorch’s Sparse Matrix
  7.5 Sparse Matrix in Other Python Libraries
  7.6 When (Not) to Use Sparse Matrix
  7.7 Case Study: Sparse Matrix Factorization with ALS
    7.7.1 Matrix Factorization
    7.7.2 Parameter Updates with ALS
    7.7.3 Adding Bias Terms to Matrix Factorization
    7.7.4 Adding Regularization Term
    7.7.5 Implementing ALS
    7.7.6 Training a Model with MovieLens-100k
  Bibliography

8 Jagged Tensors
  8.1 Left Align a Sparse Tensor to Represent Ragged Tensor
  8.2 Index to Binary Indicator
  8.3 Case Study: Jaccard Similarities Using Sparse Matrix
  8.4 Case Study: Batchwise Set Operations
  8.5 Case Study: Autoencoders with Sparse Inputs
    8.5.1 Embedding Lookup on Sparse Inputs
    8.5.2 Inputs with Weights
  Bibliography

9 Groupby, Apply, and Aggregate
  9.1 Pandas Groupwise Operations
  9.2 Reshaping and Windowing of Dense Tensors
  9.3 Case Study: Vision Transformer (ViT)
  9.4 Bucketizing Values
  9.5 Segment-wise Aggregation
  9.6 Case Study: EmbeddingBag
  9.7 Case Study: Vocal Duration Constrained by Note Duration
  9.8 Case Study: Filling of Missing Values in a Sequence
  Bibliography

10 Sorting and Reordering
  10.1 Sorting Operations
  10.2 Case Study: Top-k Using argsort and argpartition
  10.3 Case Study: Sort the Rows of a Matrix
  10.4 Case Study: Reverse Padded Sequence
  10.5 Case Study: Gumbel-Max Sampling with Weights
  10.6 Case Study: Sorting Articles Around Anchored Advertisements
  Bibliography

11 Building a Language Model from Scratch
  11.1 Language Modeling with Transformer
    11.1.1 Encoder and Decoder of the Transformer Architecture
    11.1.2 Training of Transformer Models
  11.2 Pre-LN vs. Post-LN Transformer
  11.3 Layer Normalization
  11.4 Positional Encoding and Embedding
    11.4.1 Sinusoidal Positional Encoding
    11.4.2 Position as Categorical Embeddings
    11.4.3 Relative Positional Encoding (RPE)
    11.4.4 Rotary Positional Encoding (RoPE)
  11.5 Activation Functions in Feedforward Layer
  11.6 Case Study: Training a Tiny LLaMA Model for Next Token Prediction
  11.7 A Word on AI Safety and Alignment
  11.8 Concluding Remarks
  Bibliography

Index

Author Information

Edward DongBo Cui is a Data Science and Machine Learning Engineering Leader who holds a PhD in Neuroscience from Case Western Reserve University, USA. Edward served as Director of Data Science at NBCUniversal, where he built the first recommendation system on the new Peacock streaming platform. Previously, he was Lead Data Scientist at Nielsen Global Media. His expertise spans ML engineering, research, and MLOps, applied to drive data-centric decision-making and product innovation.

Countries Available: All regions
