8.15 EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning
The authors present an investigation of machine learning based malware detection using dynamic analysis on real devices.
What are the motivations for this work
The rapid increase in malware numbers targeting Android devices has highlighted the need for efficient detection mechanisms to detect zero-day malware.
anti-emulator techniques
Sophisticated Android malware employs detection avoidance techniques to hide its malicious activities from analysis tools. These include a wide range of anti-emulator techniques, where the malware detects that it is running in an emulator and then suppresses its malicious behaviour.
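To make the idea concrete, the sketch below shows the kind of check an anti-emulator routine might perform. It is an illustration only, not taken from the paper: the function name `looks_like_emulator` and the hint table are hypothetical, and the property values are commonly cited emulator defaults rather than an exhaustive or authoritative list.

```python
# Illustrative heuristic only; not the paper's method.
# Keys are Android system properties; values are defaults often
# seen on stock emulator images (assumed, not exhaustive).
EMULATOR_HINTS = {
    "ro.kernel.qemu": ("1",),
    "ro.hardware": ("goldfish", "ranchu"),
    "ro.product.model": ("sdk", "google_sdk", "Android SDK built for x86"),
}


def looks_like_emulator(props):
    """Return True if any system property matches a known emulator default.

    `props` is a plain dict of property name -> value, e.g. parsed from
    the output of `adb shell getprop`.
    """
    for key, suspicious_values in EMULATOR_HINTS.items():
        if props.get(key, "") in suspicious_values:
            return True
    # Emulator images commonly ship a build fingerprint starting with "generic".
    return props.get("ro.build.fingerprint", "").startswith("generic")
```

Malware that runs a check like this can simply stay dormant when it returns True, which is why features extracted on an emulator may under-represent malicious behaviour.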
phone based dynamic analysis and feature extraction
Since our aim is to perform experiments to compare emulator based detection with device based detection, we need to extract features for the supervised learning from both environments. For the emulator based learning, we utilized the DynaLog dynamic analysis framework.
- emulator based: DynaLog provides the ability to instrument each application with the necessary API calls to be monitored, logged and extracted from the emulator during the run-time analysis.
- device based: DynaLog was extended with a Python-based tool that enables the following:
  - Push a list of contacts to the device SD card and then import them to populate the phone’s contact list.
  - Discover and uninstall all third-party applications prior to installing the app under analysis.
  - Check whether the phone is in airplane mode or not.
  - Check the battery level of the phone.
  - Dial outgoing calls using adb shell.
  - Send outgoing SMS messages using adb shell.
  - Populate the phone SD card with other assets.
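Some of the device preparation steps listed above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the authors' tool: all function names are hypothetical, the `/sdcard/contacts.vcf` destination is an assumed path, and the helpers only parse adb output and build command lists so the logic can be checked without a phone attached.

```python
import re


def parse_third_party_packages(pm_output):
    """Parse the output of `adb shell pm list packages -3`.

    Each line looks like `package:com.example.app`.
    """
    prefix = "package:"
    return [
        line.strip()[len(prefix):]
        for line in pm_output.splitlines()
        if line.strip().startswith(prefix)
    ]


def parse_battery_level(dumpsys_output):
    """Extract the `level:` value from `adb shell dumpsys battery` output."""
    match = re.search(r"^\s*level:\s*(\d+)", dumpsys_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def airplane_mode_enabled(settings_output):
    """Interpret `adb shell settings get global airplane_mode_on` ("1" = on)."""
    return settings_output.strip() == "1"


def build_prep_commands(packages, contacts_file, apk_path):
    """Build the adb command lists for one analysis run (not executed here)."""
    # Assumed destination path on the SD card.
    commands = [["adb", "push", contacts_file, "/sdcard/contacts.vcf"]]
    # Remove every third-party app so only the app under analysis remains.
    commands += [["adb", "uninstall", pkg] for pkg in packages]
    commands.append(["adb", "install", apk_path])
    return commands
```

Each command list could then be passed to `subprocess.run` by a driver script; keeping parsing separate from execution makes the checks easy to test offline.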
Machine learning classifiers
The features were divided into five different sets to compare the performance using machine learning algorithms.
What is the work’s evaluation of the proposed solution
Dataset
The following algorithms were used in the experiments:
- Support Vector Machine (SVM-linear)
- Naive Bayes (NB)
- Multilayer Perceptron (MLP)
- Partial Decision Trees (PART)
- Random Forest (RF)
- J48 Decision Tree.
Metrics
Five metrics were used for the performance evaluation of the detection approaches.
- true positive rate (TPR)
- true negative rate (TNR)
- false positive rate (FPR)
- false negative rate (FNR)
- weighted average F-measure.
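The five metrics above can be computed from a binary confusion matrix as sketched below. This is an illustrative helper, not code from the paper: the function name `detection_metrics` is hypothetical, malware is assumed to be the positive class, and the weighted F-measure is taken as the per-class F-measure averaged by class size.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, fn, tn, fp):
    """Compute TPR, TNR, FPR, FNR and weighted average F-measure.

    tp/fn count malware samples (detected / missed);
    tn/fp count benign samples (correctly passed / wrongly flagged).
    """
    n_malware = tp + fn
    n_benign = tn + fp

    # Per-class F-measure: F = 2*TP / (2*TP + FP + FN) for that class.
    f_malware = _ratio(2 * tp, 2 * tp + fp + fn)
    # For the benign class the roles swap: its "TP" is tn.
    f_benign = _ratio(2 * tn, 2 * tn + fn + fp)

    return {
        "TPR": _ratio(tp, n_malware),
        "FNR": _ratio(fn, n_malware),
        "TNR": _ratio(tn, n_benign),
        "FPR": _ratio(fp, n_benign),
        # Each class's F-measure is weighted by its share of the samples.
        "F_weighted": _ratio(
            n_malware * f_malware + n_benign * f_benign,
            n_malware + n_benign,
        ),
    }
```

For example, `detection_metrics(tp=90, fn=10, tn=80, fp=20)` gives a TPR of 0.9 and an FPR of 0.2. Note that TPR + FNR = 1 and TNR + FPR = 1, so the four rates carry two independent quantities.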
Experiment 1: Emulator vs Device analysis and feature extraction
The results show that the phone is a better environment for efficient analysis, as far more apps crash when being analysed on the emulator.
Thus we conclude that as an incentive to reduce the impact of malware anti-emulation and environmental shortcomings of emulators which affect analysis efficiency, it is important to develop more effective machine learning device based detection solutions.
Countermeasures against anti-emulator are becoming increasingly important in Android malware detection.
What are the contributions
- Presented an investigation of machine learning based malware detection using dynamic analysis on real Android devices.
- Implemented a tool to automatically extract dynamic features from Android phones.
- Performed a comparative analysis of emulator based vs. device based detection through several experiments using several machine learning algorithms.
What questions are you left with
- How can the emulator environment be made closer to a real device environment?
- How can more powerful dynamic analysis tools be built that withstand anti-emulation techniques?