Data Mining Tools for Malware Detection

Authors:   Mehedy Masud (University of Texas at Dallas, Richardson, Texas, USA), Latifur Khan (University of Texas at Dallas, Richardson, Texas, USA), Bhavani Thuraisingham (The University of Texas at Dallas, USA), Kim J. Andreasson
Publisher:   Taylor & Francis Inc
ISBN:   9781439854549
Pages:   450
Publication Date:   07 December 2011
Format:   Hardback
Availability:   In Print

Our Price:   $168.00

Overview

Although the use of data mining for security and malware detection is quickly on the rise, most books on the subject provide high-level theoretical discussions to the near exclusion of the practical aspects. Breaking the mold, Data Mining Tools for Malware Detection provides a step-by-step breakdown of how to develop data mining tools for malware detection. Integrating theory with practical techniques and experimental results, it focuses on malware detection applications for email worms, malicious code, remote exploits, and botnets. The authors describe the systems they have designed and developed: email worm detection using data mining, a scalable multi-level feature extraction technique for detecting malicious executables, remote exploit detection using data mining, and flow-based identification of botnet traffic by mining multiple log files. For each of these tools, they detail the system architecture, algorithms, performance results, and limitations.

The book also discusses data mining for emerging applications, including adaptable malware detection, insider threat detection, firewall policy analysis, and real-time data mining; describes the authors' tools for stream data mining; and includes four appendices that provide a firm foundation in data management, secure systems, and the semantic web.

From algorithms to experimental results, this is one of the few books that will be equally valuable to those in industry, government, and academia. It will help technologists decide which tools to select for specific applications, show managers how to determine whether to proceed with a data mining project, and offer developers innovative alternative designs for a range of applications.

Full Product Details

Imprint:   Taylor & Francis Inc
Dimensions:   Width: 15.60 cm, Height: 2.50 cm, Length: 23.40 cm
Weight:   0.748kg
ISBN 10:   1439854548
Audience:   College/higher education, Postgraduate, Research & Scholarly
Publisher's Status:   Active

Table of Contents

Introduction
  Trends; Data Mining and Security Technologies; Data Mining for Email Worm Detection; Data Mining for Malicious Code Detection; Data Mining for Detecting Remote Exploits; Data Mining for Botnet Detection; Stream Data Mining; Emerging Data Mining Tools for Cyber Security Applications; Organization of This Book; Next Steps

Part I: DATA MINING AND SECURITY
Introduction to Part I: Data Mining and Security
Data Mining Techniques
  Introduction; Overview of Data Mining Tasks and Techniques; Artificial Neural Network; Support Vector Machines; Markov Model; Association Rule Mining (ARM); Multi-class Problem (2.7.1 One-VS-One; 2.7.2 One-VS-All); Image Mining (2.8.1 Feature Selection; 2.8.2 Automatic Image Annotation; 2.8.3 Image Classification); Summary; References
Malware
  Introduction; Viruses; Worms; Trojan Horses; Time and Logic Bombs; Botnet; Spyware; Summary; References
Data Mining for Security Applications
  Overview; Data Mining for Cyber Security (4.2.1 Overview; 4.2.2 Cyber-terrorism, Insider Threats, and External Attacks; 4.2.3 Malicious Intrusions; 4.2.4 Credit Card Fraud and Identity Theft; 4.2.5 Attacks on Critical Infrastructures; 4.2.6 Data Mining for Cyber Security); Current Research and Development; Summary; References
Design and Implementation of Data Mining Tools
  Introduction; Intrusion Detection; Web Page Surfing Prediction; Image Classification; Summary and Directions; References
Conclusion to Part I

Part II: DATA MINING FOR EMAIL WORM DETECTION
Introduction to Part II
Email Worm Detection
  Introduction; Architecture; Related Work; Overview of Our Approach; Summary; References
Design of the Data Mining Tool
  Introduction; Architecture; Feature Description (7.3.1 Per-Email Features; 7.3.2 Per-Window Features); Feature Reduction Techniques (7.4.1 Dimension Reduction; 7.4.2 Two-Phase Feature Selection (TPS): 7.4.2.1 Phase I, 7.4.2.2 Phase II); Classification Techniques; Summary; References
Evaluation and Results
  Introduction; Dataset; Experimental Setup; Results (8.4.1 Results from Unreduced Data; 8.4.2 Results from PCA-Reduced Data; 8.4.3 Results from Two-Phase Selection); Summary; References
Conclusion to Part II

Part III: DATA MINING FOR DETECTING MALICIOUS EXECUTABLES
Introduction to Part III
Malicious Executables
  Introduction; Architecture; Related Work; Hybrid Feature Retrieval (HFR) Model; Summary and Directions; References
Design of the Data Mining Tool
  Introduction; Feature Extraction Using n-Gram Analysis (10.2.1 Binary n-Gram Feature; 10.2.2 Feature Collection; 10.2.3 Feature Selection; 10.2.4 Assembly n-Gram Feature; 10.2.5 DLL Function Call Feature); The Hybrid Feature Retrieval Model (10.3.1 Description of the Model; 10.3.2 The Assembly Feature Retrieval (AFR) Algorithm; 10.3.3 Feature Vector Computation and Classification); Summary and Directions; References
Evaluation and Results
  Introduction; Experiments; Dataset; Experimental Setup; Results (11.5.1 Accuracy: 11.5.1.1 Dataset1, 11.5.1.2 Dataset2, 11.5.1.3 Statistical Significance Test, 11.5.1.4 DLL Call Feature; 11.5.2 ROC Curves; 11.5.3 False Positive and False Negative; 11.5.4 Running Time; 11.5.5 Training and Testing with Boosted J48); Example Run; Summary and Directions; References
Conclusion to Part III

Part IV: DATA MINING FOR DETECTING REMOTE EXPLOITS
Introduction to Part IV
Detecting Remote Exploits
  Introduction; Architecture; Related Work; Overview of Our Approach; Summary and Directions; References
Design of the Data Mining Tool
  Introduction; DExtor Architecture; Disassembly; Feature Extraction (13.4.1 Useful Instruction Count (UIC); 13.4.2 Instruction Usage Frequencies (IUF); 13.4.3 Code vs. Data Length (CDL)); Combining Features and Computing the Combined Feature Vector; Classification; Summary and Directions; References
Evaluation and Results
  Introduction; Dataset; Experimental Setup (14.3.1 Parameter Settings; 14.3.2 Baseline Techniques); Results (14.4.1 Running Time); Analysis; Robustness and Limitations (14.6.1 Robustness against Obfuscations; 14.6.2 Limitations); Summary and Directions; References
Conclusion to Part IV

Part V: DATA MINING FOR DETECTING BOTNETS
Introduction to Part V
Detecting Botnets
  Introduction; Botnet Architecture; Related Work; Our Approach; Summary and Directions; References
Design of the Data Mining Tool
  Introduction; Architecture; System Setup; Data Collection; Bot Command Categorization; Feature Extraction (16.6.1 Packet-Level Features; 16.6.2 Flow-Level Features); Log File Correlation; Classification; Packet Filtering; Summary and Directions; References
Evaluation and Results
  Introduction (17.1.1 Baseline Techniques; 17.1.2 Classifiers); Performance on Different Datasets; Comparison with Other Techniques; Further Analysis; Summary and Directions; References
Conclusion to Part V

Part VI: STREAM MINING FOR SECURITY APPLICATIONS
Introduction to Part VI
Stream Mining
  Introduction; Architecture; Related Work; Our Approach; Overview of the Novel Class Detection Algorithm; Classifiers Used; Security Applications; Summary; References
Design of the Data Mining Tool
  Introduction; Definitions; Novel Class Detection (19.3.1 Saving the Inventory of Used Spaces during Training: 19.3.1.1 Clustering, 19.3.1.2 Storing the Cluster Summary Information; 19.3.2 Outlier Detection and Filtering: 19.3.2.1 Filtering, 19.3.2.2 Detecting Novel Class); Security Applications; Summary and Directions; References
Evaluation and Results
  Introduction; Datasets (20.2.1 Synthetic Data with Only Concept-Drift (SynC); 20.2.2 Synthetic Data with Concept-Drift and Novel Class (SynCN); 20.2.3 Real Data-KDDCup 99 Network Intrusion Detection; 20.2.4 Real Data-Forest Cover (UCI Repository)); Experimental Setup (20.3.1 Baseline Method); Performance Study (20.4.1 Evaluation Approach; 20.4.2 Results; 20.4.3 Running Time); Summary and Directions; References
Conclusion to Part VI

Part VII: EMERGING APPLICATIONS
Introduction to Part VII
Data Mining for Active Defense
  Introduction; Related Work; Architecture; A Data Mining-Based Malware Detection Model (21.4.1 Our Framework; 21.4.2 Feature Extraction: 21.4.2.1 Binary n-Gram Feature Extraction, 21.4.2.2 Feature Selection, 21.4.2.3 Feature Vector Computation; 21.4.3 Training; 21.4.4 Testing); Model-Reversing Obfuscations (21.5.1 Path Selection; 21.5.2 Feature Insertion; 21.5.3 Feature Removal); Experiments; Summary and Directions; References
Data Mining for Insider Threat Detection
  Introduction; The Challenges, Related Work, and Our Approach; Data Mining for Insider Threat Detection (22.3.1 Our Solution Architecture; 22.3.2 Feature Extraction and Compact Representation; 22.3.3 RDF Repository Architecture; 22.3.4 Data Storage: 22.3.4.1 File Organization, 22.3.4.2 Predicate Split (PS), 22.3.4.3 Predicate Object Split (POS); 22.3.5 Answering Queries Using Hadoop MapReduce; 22.3.6 Data Mining Applications); Comprehensive Framework; Summary and Directions; References
Dependable Real-Time Data Mining
  Introduction; Issues in Real-Time Data Mining; Real-Time Data Mining Techniques; Parallel, Distributed, Real-Time Data Mining; Dependable Data Mining; Mining Data Streams; Summary and Directions; References
Firewall Policy Analysis
  Introduction; Related Work; Firewall Concepts (24.3.1 Representation of Rules; 24.3.2 Relationship between Two Rules; 24.3.3 Possible Anomalies between Two Rules); Anomaly Resolution Algorithms (24.4.1 Algorithms for Finding and Resolving Anomalies: 24.4.1.1 Illustrative Example; 24.4.2 Algorithms for Merging Rules: 24.4.2.1 Illustrative Example of the Merge Algorithm); Summary and Directions; References
Conclusion to Part VII

Summary and Directions
  Overview; Summary of This Book; Directions for Data Mining Tools for Malware Detection; Where Do We Go from Here?

Appendix A: Data Management Systems: Developments and Trends
  Overview; Developments in Database Systems; Status, Vision, and Issues; Data Management Systems Framework; Building Information Systems from the Framework; Relationship between the Texts; Summary and Directions; References
Appendix B: Trustworthy Systems
  Overview; Secure Systems (B.2.1 Overview; B.2.2 Access Control and Other Security Concepts; B.2.3 Types of Secure Systems; B.2.4 Secure Operating Systems; B.2.5 Secure Database Systems; B.2.6 Secure Networks; B.2.7 Emerging Trends; B.2.8 Impact of the Web; B.2.9 Steps to Building Secure Systems); Web Security; Building Trusted Systems from Untrusted Components; Dependable Systems (B.5.1 Overview; B.5.2 Trust Management; B.5.3 Digital Rights Management)

Author Information

Mehedy Masud is a postdoctoral fellow at the University of Texas at Dallas (UTD), where he earned his PhD in computer science in December 2009. He has published in premier journals and conferences, including IEEE Transactions on Knowledge and Data Engineering and the IEEE Data Mining Conference. He will be appointed research assistant professor at UTD in fall 2012. Masud's research projects include reactively adaptive malware; data mining for detecting malicious executables, botnets, and remote exploits; and cloud data mining. He has a patent pending on stream mining for novel class detection.

Latifur Khan is an associate professor in the computer science department at the University of Texas at Dallas, where he has been teaching and conducting research since September 2000. He received his PhD and MS degrees in computer science from the University of Southern California in August 2000 and December 1996, respectively. Khan is (or has been) supported by grants from NASA, the National Science Foundation (NSF), the Air Force Office of Scientific Research (AFOSR), Raytheon, NGA, IARPA, Tektronix, Nokia Research Center, Alcatel, and the SUN academic equipment grant program. In addition, Khan directs the state-of-the-art UTD Data Mining/Database Laboratory (DML@UTD), the primary center of research related to data mining, the semantic web, and image/video annotation at the University of Texas at Dallas. Khan has published more than 100 papers, including articles in several IEEE Transactions journals, the Journal of Web Semantics, and the VLDB Journal, as well as in conference proceedings such as IEEE ICDM and PKDD. He is a senior member of IEEE.

Bhavani Thuraisingham joined the University of Texas at Dallas (UTD) in October 2004 as a professor of computer science and director of the Cyber Security Research Center in the Erik Jonsson School of Engineering and Computer Science, and she is currently the Louis Beecherl Jr. Distinguished Professor. She is an elected Fellow of three professional organizations, the IEEE (Institute of Electrical and Electronics Engineers), the AAAS (American Association for the Advancement of Science), and the BCS (British Computer Society), for her work in data security. She received the IEEE Computer Society's prestigious 1997 Technical Achievement Award for outstanding and innovative contributions to secure data management. Prior to joining UTD, Thuraisingham worked for the MITRE Corporation for 16 years, including an Intergovernmental Personnel Act (IPA) assignment at the National Science Foundation as Program Director for Data and Applications Security. Her work in information security and information management has resulted in more than 100 journal articles, more than 200 refereed conference papers, more than 90 keynote addresses, and 3 U.S. patents. She is the author of ten books in data management, data mining, and data security.