Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture

Author:   Anthony Giordano
Publisher:   Pearson Education (US)
ISBN:  

9780137084937


Pages:   416
Publication Date:   27 January 2011
Format:   Hardback
Availability:   In Print   Availability explained
Limited stock is available. It will be ordered for you and shipped pending supplier's limited stock.

Our Price $158.37 Quantity:  
Add to Cart

Share |

Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture


Add your own review!

Overview

Making Data Integration Work: How to Systematically Reduce Cost, Improve Quality, and Enhance Effectiveness   Today’s enterprises are investing massive resources in data integration. Many possess thousands of point-to-point data integration applications that are costly, undocumented, and difficult to maintain. Data integration now accounts for a major part of the expense and risk of typical data warehousing and business intelligence projects--and, as businesses increasingly rely on analytics, the need for a blueprint for data integration is increasing now more than ever.   This book presents the solution: a clear, consistent approach to defining, designing, and building data integration components to reduce cost, simplify management, enhance quality, and improve effectiveness. Leading IBM data management expert Tony Giordano brings together best practices for architecture, design, and methodology, and shows how to do the disciplined work of getting data integration right.   Mr. Giordano begins with an overview of the “patterns” of data integration, showing how to build blueprints that smoothly handle both operational and analytic data integration. Next, he walks through the entire project lifecycle, explaining each phase, activity, task, and deliverable through a complete case study. Finally, he shows how to integrate data integration with other information management disciplines, from data governance to metadata. The book’s appendices bring together key principles, detailed models, and a complete data integration glossary.   Coverage includes Implementing repeatable, efficient, and well-documented processes for integrating data Lowering costs and improving quality by eliminating unnecessary or duplicative data integrations Managing the high levels of complexity associated with integrating business and technical data Using intuitive graphical design techniques for more effective process and data integration modeling Building end-to-end data integration applications that bring together many complex data sources

Full Product Details

Author:   Anthony Giordano
Publisher:   Pearson Education (US)
Imprint:   IBM Press
Dimensions:   Width: 18.60cm , Height: 2.70cm , Length: 23.20cm
Weight:   0.780kg
ISBN:  

9780137084937


ISBN 10:   0137084935
Pages:   416
Publication Date:   27 January 2011
Audience:   College/higher education ,  Tertiary & Higher Education
Format:   Hardback
Publisher's Status:   Out of Print
Availability:   In Print   Availability explained
Limited stock is available. It will be ordered for you and shipped pending supplier's limited stock.

Table of Contents

Preface     xix Acknowledgments     xxii About the Author     xxiii Introduction: Why Is Data Integration Important?      1 Part 1 Overview of Data Integration     5 Chapter 1 Types of Data Integration     7 Data Integration Architectural Patterns     7 Enterprise Application Integration (EAI)      8 Service-Oriented Architecture (SOA)      9 Federation     12 Extract, Transform, Load (ETL)      14 Common Data Integration Functionality      15 Summary     16 End-of-Chapter Questions     16 Chapter 2 An Architecture for Data Integration     19 What Is Reference Architecture?      19 Reference Architecture for Data Integration     20 Objectives of the Data Integration Reference Architecture     21 The Data Subject Area-Based Component Design Approach     22 A Scalable Architecture     24 Purposes of the Data Integration Reference Architecture     26 The Layers of the Data Integration Architecture     26 Extract/Subscribe Processes     27 Data Integration Guiding Principle: “Read Once, Write Many”      28 Data Integration Guiding Principle: “Grab Everything”      28 Initial Staging Landing Zone     29 Data Quality Processes 31 What Is Data Quality?      31 Causes of Poor Data Quality     31 Data Quality Check Points     32 Where to Perform a Data Quality Check     32 Clean Staging Landing Zone     34 Transform Processes     35 Conforming Transform Types     35 Calculations and Splits Transform Types     35 Processing and Enrichment Transform Types     36 Target Filters Transform Types     38 Load-Ready Publish Landing Zone     39 Load/Publish Processes     40 Physical Load Architectures     41 An Overall Data Architecture     41 Summary     42 End-of-Chapter Questions     43 Chapter 3 A Design Technique: Data Integration Modeling     45 The Business Case for a New Design Process     45 Improving the Development Process     47 Leveraging Process Modeling for Data Integration     48 Overview of Data Integration Modeling     48 Modeling to the Data Integration Architecture     48 Data Integration Models within the SDLC     49 Structuring Models on the Reference Architecture     50 Conceptual Data Integration Models     51 Logical Data Integration Models     51 High-Level Logical Data Integration Model     52 Logical Extraction Data Integration Models     52 Logical Data Quality Data Integration Models     53 Logical Transform Data Integration Models     54 Logical Load Data Integration Models     55 Physical Data Integration Models     56 Converting Logical Data Integration Models to Physical Data Integration Models     56 Target-Based Data Integration Design Technique Overview     56 Physical Source System Data Integration Models     57 Physical Common Component Data Integration Models     58 Physical Subject Area Load Data Integration Models     60 Logical Versus Physical Data Integration Models     61 Tools for Developing Data Integration Models     61 Industry-Based Data Integration Models     63 Summary     64 End-of-Chapter Questions     65 Chapter 4 Case Study: Customer Loan Data Warehouse Project     67 Case Study Overview     67 Step 1: Build a Conceptual Data Integration Model     69 Step 2: Build a High-Level Logical Model Data Integration Model     70 Step 3: Build the Logical Extract DI Models     72 Confirm the Subject Area Focus from the Data Mapping Document     73 Review Whether the Existing Data Integration Environment Can Fulfill the Requirements     74 Determine the Business Extraction Rules     74 Control File Check Processing     74 Complete the Logical Extract Data Integration Models     74 Final Thoughts on Designing a Logical Extract DI Model     76 Step 4: Define a Logical Data Quality DI Model      76 Design a Logical Data Quality Data Integration Model     77 Identify Technical and Business Data Quality Criteria     77 Determine Absolute and Optional Data Quality Criteria     80 Step 5: Define the Logical Transform DI Model     81 Step 6: Define the Logical Load DI Model     85 Step 7: Determine the Physicalization Strategy     87 Step 8: Convert the Logical Extract Models into Physical Source System Extract DI Models     88 Step 9: Refine the Logical Load Models into Physical Source System Subject Area Load DI Models     90 Step 10: Package the Enterprise Business Rules into Common Component Models     92 Step 11: Sequence the Physical DI     Models 94 Summary     95 Part 2 The Data Integration Systems Development Life Cycle     97 Chapter 5 Data Integration Analysis     99 Analyzing Data Integration Requirements     100 Building a Conceptual Data Integration Model     101 Key Conceptual Data Integration Modeling Task Steps     102 Why Is Source System Data Discovery So Difficult?      103 Performing Source System Data Profiling     104 Overview of Data Profiling    104 Key Source System Data Profiling Task Steps     105 Reviewing/Assessing Source Data Quality     109 Validation Checks to Assess the Data     109 Key Review/Assess Source Data Quality Task Steps     111 Performing Source\Target Data Mappings     111 Overview of Data Mapping     112 Types of Data Mapping     113 Key Source\Target Data Mapping Task Steps     115 Summary     116 End-of-Chapter Questions     116 Chapter 6 Data Integration Analysis Case Study     117 Case Study Overview     117 Envisioned Wheeler Data Warehouse Environment     118 Aggregations in a Data Warehouse Environment     120 Data Integration Analysis Phase     123 Step 1: Build a Conceptual Data Integration Model     123 Step 2: Perform Source System Data Profiling     124 Step 3: Review/Assess Source Data Quality     130 Step 4: Perform Source\Target Data Mappings     135 Summary     145 Chapter 7 Data Integration Logical Design     147 Determining High-Level Data Volumetrics     147 Extract Sizing     148 Disk Space Sizing     148 File Size Impacts Component Design     150 Key Data Integration Volumetrics Task Steps     150 Establishing a Data Integration Architecture     151 Identifying Data Quality Criteria     154 Examples of Data Quality Criteria from a Target     155 Key Data Quality Criteria Identification Task Steps     155 Creating Logical Data Integration Models     156 Key Logical Data Integration Model Task Steps     157 Defining One-Time Data Conversion Load Logical Design     163 Designing a History Conversion     164 One-Time History Data Conversion Task Steps     166 Summary     166 End-of-Chapter Questions     167 Chapter 8 Data Integration Logical Design Case Study     169 Step 1: Determine High-Level Data Volumetrics     169 Step 2: Establish the Data Integration Architecture     174 Step 3: Identify Data Quality Criteria     177 Step 4: Create Logical Data Integration Models     180 Define the High-Level Logical Data Integration Model     181 Define the Logical Extraction Data Integration Model     183 Define the Logical Data Quality Data Integration Model     187 Define Logical Transform Data Integration Model     190 Define Logical Load Data Integration Model     191 Define Logical Data Mart Data Integration Model     192 Develop the History Conversion Design     195 Summary     198 Chapter 9 Data Integration Physical Design     199 Creating Component-Based Physical Designs     200 Reviewing the Rationale for a Component-Based Design     200 Modularity Design Principles     200 Key Component-Based Physical Designs Creation Task Steps     201 Preparing the DI Development Environment     201 Key Data Integration Development Environment Preparation Task Steps     202 Creating Physical Data Integration Models      203 Point-to-Point Application Development--The Evolution of Data Integration Development     203 The High-Level Logical Data Integration Model in Physical Design     205 Design Physical Common Components Data Integration Models     206 Design Physical Source System Extract Data Integration Models     208 Design Physical Subject Area Load Data Integration Models     209 Designing Parallelism into the Data Integration Models     210 Types of Data Integration Parallel Processing     211 Other Parallel Processing Design Considerations     214 Parallel Processing Pitfalls     215 Key Parallelism Design Task Steps     216 Designing Change Data Capture     216 Append Change Data Capture Design Complexities     217 Key Change Data Capture Design Task Steps      219 Finalizing the History Conversion Design     220 From Hypothesis to Fact     220 Finalize History Data Conversion Design Task Steps     220 Defining Data Integration Operational Requirements     221 Determining a Job Schedule for the Data Integration Jobs     221 Determining a Production Support Team     222 Key Data Integration Operational Requirements Task Steps     224 Designing Data Integration Components for SOA     225 Leveraging Traditional Data Integration Processes as SOA Services     225 Appropriate Data Integration Job Types     227 Key Data Integration Design for SOA Task Steps     227 Summary     228 End-of-Chapter Questions     228 Chapter 10 Data Integration Physical Design Case Study     229 Step 1: Create Physical Data Integration Models     229 Instantiating the Logical Data Integration Models into a Data Integration Package     229 Step 2: Find Opportunities to Tune through Parallel Processing     237 Step 3: Complete Wheeler History Conversion Design     238 Step 4: Define Data Integration Operational Requirements     239 Developing a Job Schedule for Wheeler     240 The Wheeler Monthly Job Schedule     240 The Wheeler Monthly Job Flow     240 Process Step 1: Preparation for the EDW Load Processing     241 Process Step 2: Source System to Subject Area File Processing     242 Process Step 3: Subject Area Files to EDW Load Processing     245 Process Step 4: EDW-to-Product Line Profitability Data Mart Load Processing     248 Production Support Staffing     248 Summary     249 Chapter 11 Data Integration Development Cycle     251 Performing General Data Integration Development Activities     253 Data Integration Development Standards     253 Error-Handling Requirements     255 Naming Standards     255 Key General Development Task Steps     256 Prototyping a Set of Data Integration Functionality     257 The Rationale for Prototyping     257 Benefits of Prototyping     257 Prototyping Example     258 Key Data Integration Prototyping Task Steps     261 Completing/Extending Data Integration Job Code     262 Complete/Extend Common Component Data Integration Jobs     263 Complete/Extend the Source System Extract Data Integration Jobs     264 Complete/Extend the Subject Area Load Data Integration Jobs     265 Performing Data Integration Testing     266 Data Warehousing Testing Overview     267 Types of Data Warehousing Testing     268 Perform Data Warehouse Unit Testing     269 Perform Data Warehouse Integration Testing     272 Perform Data Warehouse System and Performance Testing     273 Perform Data Warehouse User Acceptance Testing     274 The Role of Configuration Management in Data Integration     275 What Is Configuration Management?      276 Data Integration Version Control     277 Data Integration Software Promotion Life Cycle     277 Summary     277 End-of-Chapter Questions     278 Chapter 12 Data Integration Development Cycle Case Study     279 Step 1: Prototype the Common Customer Key     279 Step 2: Develop User Test Cases     283 Domestic OM Source System Extract Job Unit Test Case     284 Summary     287 Part 3 Data Integration with Other Information Management Disciplines     289 Chapter 13 Data Integration and Data Governance     291 What Is Data Governance?      292 Why Is Data Governance Important?      294 Components of Data Governance     295 Foundational Data Governance Processes     295 Data Governance Organizational Structure     298 Data Stewardship Processes     304 Data Governance Functions in Data Warehousing     305 Compliance in Data Governance     309 Data Governance Change Management     310 Summary     311 End-of-Chapter Questions     311 Chapter 14 Metadata     313 What Is Metadata?      313 The Role of Metadata in Data Integration     314 Categories of Metadata     314 Business Metadata     315 Structural Metadata     315 Navigational Metadata     317 Analytic Metadata     318 Operational Metadata     319 Metadata as Part of a Reference Architecture     319 Metadata Users     320 Managing Metadata     321 The Importance of Metadata Management in Data Governance     321 Metadata Environment Current State     322 Metadata Management Plan     322 Metadata Management Life Cycle     324 Summary     327 End-of-Chapter Questions     327 Chapter 15 Data Quality     329 The Data Quality Framework     330 Key Data Quality Elements     331 The Technical Data Quality Dimension     332 The Business-Process Data Quality Dimension     333 Types of Data Quality Processes     334 The Data Quality Life Cycle     334 The Define Phase     336 Defining the Data Quality Scope     336 Identifying/Defining the Data Quality Elements     336 Developing Preventive Data Quality Processes     337 The Audit Phase     345 Developing a Data Quality Measurement Process     346 Developing Data Quality Reports     348 Auditing Data Quality by LOB or Subject Area     350 The Renovate Phase     351 Data Quality Assessment and Remediation Projects     352 Data Quality SWAT Renovation Projects     352 Data Quality Programs     353 Final Thoughts on Data Quality     353 Summary     353 End-of-Chapter Questions     354 Appendix A Exercise Answers     355 Appendix B Data Integration Guiding Principles     369 Write Once, Read Many     369 Grab Everything     369 Data Quality before Transforms     369 Transformation Componentization     370 Where to Perform Aggregations and Calculations     370 Data Integration Environment Volumetric Sizing     370 Subject Area Volumetric Sizing     370 Appendix C Glossary     371 Appendix D Case Study Models Appendix D is an online-only appendix. Print-book readers can download the appendix at www.ibmpressbooks.com/title/9780137084937. For eBook editions, the appendix is included in the book. Index     375

Reviews

Author Information

Anthony Giordano is a partner in IBM’s Business Analytics and Optimization Consulting Practice and currently leads the Enterprise Information Management Service Line that focuses on data modeling, data integration, master data management, and data governance. He has more than 20 years of experience in the Information Technology field with a focus in the areas of business intelligence, data warehousing, and Information Management. In his spare time, he has taught classes in data warehousing and project management at the undergraduate and graduate levels at several local colleges and universities.

Tab Content 6

Author Website:  

Customer Reviews

Recent Reviews

No review item found!

Add your own review!

Countries Available

All regions
Latest Reading Guide

MRG2025CC

 

Shopping Cart
Your cart is empty
Shopping cart
Mailing List