Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture (paperback)

Author:   Anthony Giordano
Publisher:   Pearson Education (US)
ISBN:  

9780133967371


Pages:   504
Publication Date:   12 August 2014
Format:   Paperback
Availability:   In Print   Availability explained
This item will be ordered in for you from one of our suppliers. Upon receipt, we will promptly dispatch it out to you. For in store availability, please contact us.

Our Price $81.95 Quantity:  
Add to Cart

Share |

Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture (paperback)


Add your own review!

Overview

Making Data Integration Work: How to Systematically Reduce Cost, Improve Quality, and Enhance Effectiveness   Today’s enterprises are investing massive resources in data integration. Many possess thousands of point-to-point data integration applications that are costly, undocumented, and difficult to maintain. Data integration now accounts for a major part of the expense and risk of typical data warehousing and business intelligence projects--and, as businesses increasingly rely on analytics, the need for a blueprint for data integration is increasing now more than ever.   This book presents the solution: a clear, consistent approach to defining, designing, and building data integration components to reduce cost, simplify management, enhance quality, and improve effectiveness. Leading IBM data management expert Tony Giordano brings together best practices for architecture, design, and methodology, and shows how to do the disciplined work of getting data integration right.   Mr. Giordano begins with an overview of the “patterns” of data integration, showing how to build blueprints that smoothly handle both operational and analytic data integration. Next, he walks through the entire project lifecycle, explaining each phase, activity, task, and deliverable through a complete case study. Finally, he shows how to integrate data integration with other information management disciplines, from data governance to metadata. The book’s appendices bring together key principles, detailed models, and a complete data integration glossary.   Coverage includes Implementing repeatable, efficient, and well-documented processes for integrating data Lowering costs and improving quality by eliminating unnecessary or duplicative data integrations Managing the high levels of complexity associated with integrating business and technical data Using intuitive graphical design techniques for more effective process and data integration modeling Building end-to-end data integration applications that bring together many complex data sources

Full Product Details

Author:   Anthony Giordano
Publisher:   Pearson Education (US)
Imprint:   Pearson
Dimensions:   Width: 10.00cm , Height: 10.00cm , Length: 10.00cm
Weight:   0.100kg
ISBN:  

9780133967371


ISBN 10:   0133967379
Pages:   504
Publication Date:   12 August 2014
Audience:   College/higher education ,  Tertiary & Higher Education
Format:   Paperback
Publisher's Status:   Active
Availability:   In Print   Availability explained
This item will be ordered in for you from one of our suppliers. Upon receipt, we will promptly dispatch it out to you. For in store availability, please contact us.

Table of Contents

Preface xix Acknowledgments xxii About the Author xxiii Introduction: Why Is Data Integration Important? 1 Part 1 Overview of Data Integration 5 Chapter 1 Types of Data Integration 7 Data Integration Architectural Patterns 7 Enterprise Application Integration (EAI) 8 Service-Oriented Architecture (SOA) 9 Federation 12 Extract, Transform, Load (ETL) 14 Common Data Integration Functionality 15 Summary 16 End-of-Chapter Questions 16 Chapter 2 An Architecture for Data Integration 19 What Is Reference Architecture? 19 Reference Architecture for Data Integration 20 Objectives of the Data Integration Reference Architecture 21 The Data Subject Area-Based Component Design Approach 22 A Scalable Architecture 24 Purposes of the Data Integration Reference Architecture 26 The Layers of the Data Integration Architecture 26 Extract/Subscribe Processes 27 Data Integration Guiding Principle: “Read Once, Write Many” 28 Data Integration Guiding Principle: “Grab Everything” 28 Initial Staging Landing Zone 29 Data Quality Processes 31 What Is Data Quality? 31 Causes of Poor Data Quality 31 Data Quality Check Points 32 Where to Perform a Data Quality Check 32 Clean Staging Landing Zone 34 Transform Processes 35 Conforming Transform Types 35 Calculations and Splits Transform Types 35 Processing and Enrichment Transform Types 36 Target Filters Transform Types 38 Load-Ready Publish Landing Zone 39 Load/Publish Processes 40 Physical Load Architectures 41 An Overall Data Architecture 41 Summary 42 End-of-Chapter Questions 43 Chapter 3 A Design Technique: Data Integration Modeling 45 The Business Case for a New Design Process 45 Improving the Development Process 47 Leveraging Process Modeling for Data Integration 48 Overview of Data Integration Modeling 48 Modeling to the Data Integration Architecture 48 Data Integration Models within the SDLC 49 Structuring Models on the Reference Architecture 50 Conceptual Data Integration Models 51 Logical Data Integration Models 51 High-Level Logical Data Integration Model 52 Logical Extraction Data Integration Models 52 Logical Data Quality Data Integration Models 53 Logical Transform Data Integration Models 54 Logical Load Data Integration Models 55 Physical Data Integration Models 56 Converting Logical Data Integration Models to Physical Data Integration Models 56 Target-Based Data Integration Design Technique Overview 56 Physical Source System Data Integration Models 57 Physical Common Component Data Integration Models 58 Physical Subject Area Load Data Integration Models 60 Logical Versus Physical Data Integration Models 61 Tools for Developing Data Integration Models 61 Industry-Based Data Integration Models 63 Summary 64 End-of-Chapter Questions 65 Chapter 4 Case Study: Customer Loan Data Warehouse Project 67 Case Study Overview 67 Step 1: Build a Conceptual Data Integration Model 69 Step 2: Build a High-Level Logical Model Data Integration Model 70 Step 3: Build the Logical Extract DI Models 72 Confirm the Subject Area Focus from the Data Mapping Document 73 Review Whether the Existing Data Integration Environment Can Fulfill the Requirements 74 Determine the Business Extraction Rules 74 Control File Check Processing 74 Complete the Logical Extract Data Integration Models 74 Final Thoughts on Designing a Logical Extract DI Model 76 Step 4: Define a Logical Data Quality DI Model 76 Design a Logical Data Quality Data Integration Model 77 Identify Technical and Business Data Quality Criteria 77 Determine Absolute and Optional Data Quality Criteria 80 Step 5: Define the Logical Transform DI Model 81 Step 6: Define the Logical Load DI Model 85 Step 7: Determine the Physicalization Strategy 87 Step 8: Convert the Logical Extract Models into Physical Source System Extract DI Models 88 Step 9: Refine the Logical Load Models into Physical Source System Subject Area Load DI Models 90 Step 10: Package the Enterprise Business Rules into Common Component Models 92 Step 11: Sequence the Physical DI Models 94 Summary 95 Part 2 The Data Integration Systems Development Life Cycle 97 Chapter 5 Data Integration Analysis 99 Analyzing Data Integration Requirements 100 Building a Conceptual Data Integration Model 101 Key Conceptual Data Integration Modeling Task Steps 102 Why Is Source System Data Discovery So Difficult? 103 Performing Source System Data Profiling 104 Overview of Data Profiling 104 Key Source System Data Profiling Task Steps 105 Reviewing/Assessing Source Data Quality 109 Validation Checks to Assess the Data 109 Key Review/Assess Source Data Quality Task Steps 111 Performing Source\Target Data Mappings 111 Overview of Data Mapping 112 Types of Data Mapping 113 Key Source\Target Data Mapping Task Steps 115 Summary 116 End-of-Chapter Questions 116 Chapter 6 Data Integration Analysis Case Study 117 Case Study Overview 117 Envisioned Wheeler Data Warehouse Environment 118 Aggregations in a Data Warehouse Environment 120 Data Integration Analysis Phase 123 Step 1: Build a Conceptual Data Integration Model 123 Step 2: Perform Source System Data Profiling 124 Step 3: Review/Assess Source Data Quality 130 Step 4: Perform Source\Target Data Mappings 135 Summary 145 Chapter 7 Data Integration Logical Design 147 Determining High-Level Data Volumetrics 147 Extract Sizing 148 Disk Space Sizing 148 File Size Impacts Component Design 150 Key Data Integration Volumetrics Task Steps 150 Establishing a Data Integration Architecture 151 Identifying Data Quality Criteria 154 Examples of Data Quality Criteria from a Target 155 Key Data Quality Criteria Identification Task Steps 155 Creating Logical Data Integration Models 156 Key Logical Data Integration Model Task Steps 157 Defining One-Time Data Conversion Load Logical Design 163 Designing a History Conversion 164 One-Time History Data Conversion Task Steps 166 Summary 166 End-of-Chapter Questions 167 Chapter 8 Data Integration Logical Design Case Study 169 Step 1: Determine High-Level Data Volumetrics 169 Step 2: Establish the Data Integration Architecture 174 Step 3: Identify Data Quality Criteria 177 Step 4: Create Logical Data Integration Models 180 Define the High-Level Logical Data Integration Model 181 Define the Logical Extraction Data Integration Model 183 Define the Logical Data Quality Data Integration Model 187 Define Logical Transform Data Integration Model 190 Define Logical Load Data Integration Model 191 Define Logical Data Mart Data Integration Model 192 Develop the History Conversion Design 195 Summary 198 Chapter 9 Data Integration Physical Design 199 Creating Component-Based Physical Designs 200 Reviewing the Rationale for a Component-Based Design 200 Modularity Design Principles 200 Key Component-Based Physical Designs Creation Task Steps 201 Preparing the DI Development Environment 201 Key Data Integration Development Environment Preparation Task Steps 202 Creating Physical Data Integration Models 203 Point-to-Point Application Development--The Evolution of Data Integration Development 203 The High-Level Logical Data Integration Model in Physical Design 205 Design Physical Common Components Data Integration Models 206 Design Physical Source System Extract Data Integration Models 208 Design Physical Subject Area Load Data Integration Models 209 Designing Parallelism into the Data Integration Models 210 Types of Data Integration Parallel Processing 211 Other Parallel Processing Design Considerations 214 Parallel Processing Pitfalls 215 Key Parallelism Design Task Steps 216 Designing Change Data Capture 216 Append Change Data Capture Design Complexities 217 Key Change Data Capture Design Task Steps 219 Finalizing the History Conversion Design 220 From Hypothesis to Fact 220 Finalize History Data Conversion Design Task Steps 220 Defining Data Integration Operational Requirements 221 Determining a Job Schedule for the Data Integration Jobs 221 Determining a Production Support Team 222 Key Data Integration Operational Requirements Task Steps 224 Designing Data Integration Components for SOA 225 Leveraging Traditional Data Integration Processes as SOA Services 225 Appropriate Data Integration Job Types 227 Key Data Integration Design for SOA Task Steps 227 Summary 228 End-of-Chapter Questions 228 Chapter 10 Data Integration Physical Design Case Study 229 Step 1: Create Physical Data Integration Models 229 Instantiating the Logical Data Integration Models into a Data Integration Package 229 Step 2: Find Opportunities to Tune through Parallel Processing 237 Step 3: Complete Wheeler History Conversion Design 238 Step 4: Define Data Integration Operational Requirements 239 Developing a Job Schedule for Wheeler 240 The Wheeler Monthly Job Schedule 240 The Wheeler Monthly Job Flow 240 Process Step 1: Preparation for the EDW Load Processing 241 Process Step 2: Source System to Subject Area File Processing 242 Process Step 3: Subject Area Files to EDW Load Processing 245 Process Step 4: EDW-to-Product Line Profitability Data Mart Load Processing 248 Production Support Staffing 248 Summary 249 Chapter 11 Data Integration Development Cycle 251 Performing General Data Integration Development Activities 253 Data Integration Development Standards 253 Error-Handling Requirements 255 Naming Standards 255 Key General Development Task Steps 256 Prototyping a Set of Data Integration Functionality 257 The Rationale for Prototyping 257 Benefits of Prototyping 257 Prototyping Example 258 Key Data Integration Prototyping Task Steps 261 Completing/Extending Data Integration Job Code 262 Complete/Extend Common Component Data Integration Jobs 263 Complete/Extend the Source System Extract Data Integration Jobs 264 Complete/Extend the Subject Area Load Data Integration Jobs 265 Performing Data Integration Testing 266 Data Warehousing Testing Overview 267 Types of Data Warehousing Testing 268 Perform Data Warehouse Unit Testing 269 Perform Data Warehouse Integration Testing 272 Perform Data Warehouse System and Performance Testing 273 Perform Data Warehouse User Acceptance Testing 274 The Role of Configuration Management in Data Integration 275 What Is Configuration Management? 276 Data Integration Version Control 277 Data Integration Software Promotion Life Cycle 277 Summary 277 End-of-Chapter Questions 278 Chapter 12 Data Integration Development Cycle Case Study 279 Step 1: Prototype the Common Customer Key 279 Step 2: Develop User Test Cases 283 Domestic OM Source System Extract Job Unit Test Case 284 Summary 287 Part 3 Data Integration with Other Information Management Disciplines 289 Chapter 13 Data Integration and Data Governance 291 What Is Data Governance? 292 Why Is Data Governance Important? 294 Components of Data Governance 295 Foundational Data Governance Processes 295 Data Governance Organizational Structure 298 Data Stewardship Processes 304 Data Governance Functions in Data Warehousing 305 Compliance in Data Governance 309 Data Governance Change Management 310 Summary 311 End-of-Chapter Questions 311 Chapter 14 Metadata 313 What Is Metadata? 313 The Role of Metadata in Data Integration 314 Categories of Metadata 314 Business Metadata 315 Structural Metadata 315 Navigational Metadata 317 Analytic Metadata 318 Operational Metadata 319 Metadata as Part of a Reference Architecture 319 Metadata Users 320 Managing Metadata 321 The Importance of Metadata Management in Data Governance 321 Metadata Environment Current State 322 Metadata Management Plan 322 Metadata Management Life Cycle 324 Summary 327 End-of-Chapter Questions 327 Chapter 15 Data Quality 329 The Data Quality Framework 330 Key Data Quality Elements 331 The Technical Data Quality Dimension 332 The Business-Process Data Quality Dimension 333 Types of Data Quality Processes 334 The Data Quality Life Cycle 334 The Define Phase 336 Defining the Data Quality Scope 336 Identifying/Defining the Data Quality Elements 336 Developing Preventive Data Quality Processes 337 The Audit Phase 345 Developing a Data Quality Measurement Process 346 Developing Data Quality Reports 348 Auditing Data Quality by LOB or Subject Area 350 The Renovate Phase 351 Data Quality Assessment and Remediation Projects 352 Data Quality SWAT Renovation Projects 352 Data Quality Programs 353 Final Thoughts on Data Quality 353 Summary 353 End-of-Chapter Questions 354 Appendix A Exercise Answers 355 Appendix B Data Integration Guiding Principles 369 Write Once, Read Many 369 Grab Everything 369 Data Quality before Transforms 369 Transformation Componentization 370 Where to Perform Aggregations and Calculations 370 Data Integration Environment Volumetric Sizing 370 Subject Area Volumetric Sizing 370 Appendix C Glossary 371 Appendix D Case Study Models Appendix D is an online-only appendix. Print-book readers can download the appendix at www.ibmpressbooks.com/title/9780137084937. For eBook editions, the appendix is included in the book. Index 375

Reviews

Author Information

Anthony Giordano is a partner in IBM’s Business Analytics and Optimization Consulting Practice and currently leads the Enterprise Information Management Service Line that focuses on data modeling, data integration, master data management, and data governance. He has more than 20 years of experience in the Information Technology field with a focus in the areas of business intelligence, data warehousing, and Information Management. In his spare time, he has taught classes in data warehousing and project management at the undergraduate and graduate levels at several local colleges and universities.

Tab Content 6

Author Website:  

Customer Reviews

Recent Reviews

No review item found!

Add your own review!

Countries Available

All regions
Latest Reading Guide

MRG2025CC

 

Shopping Cart
Your cart is empty
Shopping cart
Mailing List