Machine Learning for Program Analysis

VIRTUAL 32 CPE HOURS TRAINING: FEBRUARY 2023

Hahna Kane Latonick

Abstract

This course features a practical hands-on approach to automated program analysis using machine learning. Given the increasing pervasiveness of IoT devices and malware, there is a great need to perform automated reverse engineering at scale, especially since reverse engineering software and firmware can often be a manual, labor-intensive, and time-intensive process. This class is perfectly suited for students who are new to machine learning and want to leverage it to automate their program analysis and reverse engineering efforts.

This class kicks off with advanced program analysis: automatically identifying shared code relationships between applications using different binary features, computing code-sharing similarity over a data set to determine binary groupings, and then determining a new binary’s similarity to previously seen samples based on code-sharing patterns. We will also cover intermediate representations of binaries and how they can be used for advanced program analysis.
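As a rough illustration of what code-sharing similarity can look like, the sketch below compares byte N-gram sets using the Jaccard index. This is one common approach and not necessarily the method used in the course labs; the sample names, the byte strings, and the choice of 4-byte N-grams are all hypothetical.

```python
from itertools import combinations


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte sequences found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical stand-ins for binary contents; real input would be file bytes.
samples = {
    "sample_a": b"\x55\x48\x89\xe5\x48\x83\xec\x10\xc3",
    "sample_b": b"\x55\x48\x89\xe5\x48\x83\xec\x20\xc3",
    "sample_c": b"\x90\x90\x90\x90\xcc\xcc\xcc\xcc\xc3",
}

# One feature set per sample.
features = {name: byte_ngrams(blob) for name, blob in samples.items()}

# Pairwise similarity over the whole (tiny) data set.
similarity = {
    (x, y): jaccard(features[x], features[y])
    for x, y in combinations(sorted(features), 2)
}

for pair, score in similarity.items():
    print(pair, round(score, 3))
```

Pairs with a high score share many N-grams and are candidates for the same grouping; a new binary can be scored against every stored feature set in the same way.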

Next, we will introduce machine learning concepts and their applications to automated reverse engineering. We will first use unsupervised machine learning algorithms to find data patterns and features which can be useful for categorization. Then we will develop supervised machine learning models to classify binaries and make certain predictions about them. Lastly, we will apply deep learning to automate program analysis by building and evaluating neural networks. Throughout the class, labs will be conducted in a virtual environment. Students will leave the course with the necessary hands-on experience, knowledge, and confidence to conduct automated program analysis at scale using machine learning.
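To make the idea of a supervised model concrete, here is a minimal K Nearest Neighbors sketch in plain Python. It is an illustration only, not course material: the two features (section entropy and a scaled import count), the labels, and every data point are invented for the example, and a real workflow would more likely use an ML library.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label query by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (section entropy, imported function count / 100).
train = [
    ((7.8, 0.05), "packed"),
    ((7.5, 0.10), "packed"),
    ((7.9, 0.02), "packed"),
    ((5.1, 1.20), "unpacked"),
    ((4.8, 0.90), "unpacked"),
    ((5.5, 1.50), "unpacked"),
]

print(knn_predict(train, (7.6, 0.08)))
print(knn_predict(train, (5.0, 1.10)))
```

The labeled examples play the role of previously analyzed binaries, and the prediction for a new feature vector comes from the labels of its closest neighbors.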

Applications covered in the class include, but are not limited to:

  • Binary Analysis
  • Malware Analysis
  • Firmware Analysis
  • Network/IoT Analysis
  • Mobile Security Analysis
  • Security Research / Vulnerability Discovery

Key Learning Objectives

  • Performing Shared Code Analysis
  • Leveraging intermediate representations for advanced program analysis
  • Introduction to Machine Learning
  • Exploring Unsupervised ML algorithms
  • Developing Supervised ML models
  • Building Neural Networks
  • Evaluating and measuring the effectiveness of ML systems

Who Should Attend

  • Reverse engineers, security researchers, and analysts with little to no experience with machine learning
  • Analysts, security researchers, and reverse engineers who want to automate and scale their program analysis and reverse engineering process

Agenda

Day 1:

  • Introduction to advanced program analysis
  • Identifying and extracting program features
  • EXERCISE: Similarities Lab
  • Leveraging N-Grams for program analysis
  • EXERCISE: N-Grams Lab
  • Performing agnostic program analysis
  • EXERCISE: Architecture and Compiler Agnostic Analysis Lab
  • Introduction to intermediate representations
  • EXERCISE: IR Lab

Day 2:

  • Introduction to Machine Learning
  • Evaluating ML systems
  • Unsupervised ML algorithm: K-Means Clustering
  • EXERCISE: K-Means Lab
  • Unsupervised ML algorithm: Agglomerative Hierarchical Clustering
  • EXERCISE: Agglomerative Analysis Lab
  • Unsupervised ML algorithm: Principal Component Analysis
  • EXERCISE: PCA Lab

Day 3:

  • Introduction to Supervised Machine Learning
  • Supervised ML algorithm: Logistic Regression
  • EXERCISE: Logistic Regression Lab
  • Supervised ML algorithm: Decision Tree
  • EXERCISE: Decision Tree Lab
  • Supervised ML algorithm: Random Forest
  • EXERCISE: Random Forest Lab
  • Supervised ML algorithm: K Nearest Neighbors
  • EXERCISE: KNN Lab
  • Supervised ML algorithm: Support Vector Machines
  • EXERCISE: SVM Lab

Day 4:

  • Introduction to Neural Networks
  • Building Neural Networks for Program Analysis
  • EXERCISE: Neural Networks Development Lab
  • Evaluating Neural Networks
  • EXERCISE: Neural Networks Performance Lab

Pre-requisites

  • Knowledge of Python 3 programming
  • Knowledge of computer architecture concepts
  • Knowledge of an assembly language (e.g., x86/x64, ARM)
  • Familiarity with navigating Linux environments and the command line

Hardware Requirements

  • A working laptop or desktop (no Netbooks, no Tablets, no iPads)
  • Intel Core i3 (equivalent or superior) required
  • 8 GB RAM required, at a minimum
  • 10 GB free hard disk space, at a minimum

Software Requirements

The following software needs to be installed on each student's machine prior to the workshop:

  • Linux / Windows / macOS desktop operating system
  • VMware Workstation or Fusion. The free 30-day trial is sufficient and can be downloaded here: https://www.vmware.com/try-vmware.html
  • Administrator / root access MANDATORY

Students will be provided with:

  • Access to course slides, sample code, and lab exercises, which they can keep to continue learning and practicing after the training ends