Overview

Multilingual Natural Language Processing Applications is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience.

Part I introduces the core concepts and theoretical foundations of modern multilingual natural language processing, presenting today’s best practices for understanding word and document structure, analyzing syntax, modeling language, recognizing entailment, and detecting redundancy.

Part II thoroughly addresses the practical considerations associated with building real-world applications, including information extraction, machine translation, information retrieval/search, summarization, question answering, distillation, processing pipelines, and more.

This book contains important new contributions from leading researchers at IBM, Google, Microsoft, Thomson Reuters, BBN, CMU, University of Edinburgh, University of Washington, University of North Texas, and others.

Coverage includes:

- Core NLP problems, and today’s best algorithms for attacking them
- Processing the diverse morphologies present in the world’s languages
- Uncovering syntactical structure, parsing semantics, using semantic role labeling, and scoring grammaticality
- Recognizing inferences, subjectivity, and opinion polarity
- Managing key algorithmic and design tradeoffs in real-world applications
- Extracting information via mention detection, coreference resolution, and events
- Building large-scale systems for machine translation, information retrieval, and summarization
- Answering complex questions through distillation and other advanced techniques
- Creating dialog systems that leverage advances in speech recognition, synthesis, and dialog management
- Constructing common infrastructure for multiple multilingual text processing applications

This book will be invaluable for all engineers, software developers, researchers, and graduate students who want to process large quantities of text in multiple languages, in any environment: government, corporate, or academic.

Full Product Details

Author: Daniel Bikel, Imed Zitouni
Publisher: Pearson Education (US)
Imprint: IBM Press
Dimensions: Width: 18.80cm, Height: 3.80cm, Length: 23.80cm
Weight: 1.120kg
ISBN: 9780137151448
ISBN 10: 0137151446
Pages: 640
Publication Date: 24 May 2012
Audience: College/higher education, Postgraduate, Research & Scholarly
Format: Hardback
Publisher's Status: Out of Print
Availability: Awaiting stock
Countries Available: All regions

Table of Contents

Preface
Acknowledgments
About the Authors

Part I: In Theory

Chapter 1: Finding the Structure of Words
1.1 Words and Their Components
1.2 Issues and Challenges
1.3 Morphological Models
1.4 Summary

Chapter 2: Finding the Structure of Documents
2.1 Introduction
2.2 Methods
2.3 Complexity of the Approaches
2.4 Performances of the Approaches
2.5 Features
2.6 Processing Stages
2.7 Discussion
2.8 Summary

Chapter 3: Syntax
3.1 Parsing Natural Language
3.2 Treebanks: A Data-Driven Approach to Syntax
3.3 Representation of Syntactic Structure
3.4 Parsing Algorithms
3.5 Models for Ambiguity Resolution in Parsing
3.6 Multilingual Issues: What Is a Token?
3.7 Summary

Chapter 4: Semantic Parsing
4.1 Introduction
4.2 Semantic Interpretation
4.3 System Paradigms
4.4 Word Sense
4.5 Predicate-Argument Structure
4.6 Meaning Representation
4.7 Summary

Chapter 5: Language Modeling
5.1 Introduction
5.2 n-Gram Models
5.3 Language Model Evaluation
5.4 Parameter Estimation
5.5 Language Model Adaptation
5.6 Types of Language Models
5.7 Language-Specific Modeling Problems
5.8 Multilingual and Crosslingual Language Modeling
5.9 Summary

Chapter 6: Recognizing Textual Entailment
6.1 Introduction
6.2 The Recognizing Textual Entailment Task
6.3 A Framework for Recognizing Textual Entailment
6.4 Case Studies
6.5 Taking RTE Further
6.6 Useful Resources
6.7 Summary

Chapter 7: Multilingual Sentiment and Subjectivity Analysis
7.1 Introduction
7.2 Definitions
7.3 Sentiment and Subjectivity Analysis on English
7.4 Word- and Phrase-Level Annotations
7.5 Sentence-Level Annotations
7.6 Document-Level Annotations
7.7 What Works, What Doesn’t
7.8 Summary

Part II: In Practice

Chapter 8: Entity Detection and Tracking
8.1 Introduction
8.2 Mention Detection
8.3 Coreference Resolution
8.4 Summary

Chapter 9: Relations and Events
9.1 Introduction
9.2 Relations and Events
9.3 Types of Relations
9.4 Relation Extraction as Classification
9.5 Other Approaches to Relation Extraction
9.6 Events
9.7 Event Extraction Approaches
9.8 Moving Beyond the Sentence
9.9 Event Matching
9.10 Future Directions for Event Extraction
9.11 Summary

Chapter 10: Machine Translation
10.1 Machine Translation Today
10.2 Machine Translation Evaluation
10.3 Word Alignment
10.4 Phrase-Based Models
10.5 Tree-Based Models
10.6 Linguistic Challenges
10.7 Tools and Data Resources
10.8 Future Directions
10.9 Summary

Chapter 11: Multilingual Information Retrieval
11.1 Introduction
11.2 Document Preprocessing
11.3 Monolingual Information Retrieval
11.4 CLIR
11.5 MLIR
11.6 Evaluation in Information Retrieval
11.7 Tools, Software, and Resources
11.8 Summary

Chapter 12: Multilingual Automatic Summarization
12.1 Introduction
12.2 Approaches to Summarization
12.3 Evaluation
12.4 How to Build a Summarizer
12.5 Competitions and Datasets
12.6 Summary

Chapter 13: Question Answering
13.1 Introduction and History
13.2 Architectures
13.3 Source Acquisition and Preprocessing
13.4 Question Analysis
13.5 Search and Candidate Extraction
13.6 Answer Scoring
13.7 Crosslingual Question Answering
13.8 A Case Study
13.9 Evaluation
13.10 Current and Future Challenges
13.11 Summary and Further Reading

Chapter 14: Distillation
14.1 Introduction
14.2 An Example
14.3 Relevance and Redundancy
14.4 The Rosetta Consortium Distillation System
14.5 Other Distillation Approaches
14.6 Evaluation and Metrics
14.7 Summary

Chapter 15: Spoken Dialog Systems
15.1 Introduction
15.2 Spoken Dialog Systems
15.3 Forms of Dialog
15.4 Natural Language Call Routing
15.5 Three Generations of Dialog Applications
15.6 Continuous Improvement Cycle
15.7 Transcription and Annotation of Utterances
15.8 Localization of Spoken Dialog Systems
15.9 Summary

Chapter 16: Combining Natural Language Processing Engines
16.1 Introduction
16.2 Desired Attributes of Architectures for Aggregating Speech and NLP Engines
16.3 Architectures for Aggregation
16.4 Case Studies
16.5 Lessons Learned
16.6 Summary
16.7 Sample UIMA Code

Index

Author Information

Daniel M. Bikel is a senior research scientist at Google, developing new methods for NLP and speech recognition. While at IBM, he architected the distillation system for IBM’s GALE multilingual information extraction and question-answering system. While pursuing his doctorate at Penn, he built the first extensible multilingual syntactic parsing engine.

Imed Zitouni is a senior research scientist at IBM. He has led IBM’s Arabic information extraction and data resources efforts since 2004. He previously led both DIALOCA’s Speech/NLP group and Bell Labs/Alcatel-Lucent’s language modeling and call routing activities. His work involves machine translation, NLP, and spoken dialog systems.