"Fast detection of Android malware with...

Post on 14-Jan-2015

DESCRIPTION

This talk covers the use of machine learning algorithms to detect malicious Android applications. I will describe how a high-performance tool for this task was built at Yandex on top of MatrixNet, and demonstrate the cases in which analytic methods of malware detection help block many simple samples of malicious code. We will then discuss how such methods can be improved to detect more sophisticated malware.

TRANSCRIPT

1

2

Fast detection of Android malware

Yury Leonychev

3

Introduction

4

Android application (APK)

• Manifest (AndroidManifest.xml)
• Code (classes.dex and native libraries)
• Meta information (META-INF)
• Resources (files and resources.arsc)
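
Since an APK is a ZIP archive, the four parts above can be told apart by entry name alone. The following is a minimal sketch, not the analyzer from the talk: the function names, the `lib/` rule for native code, and the demo archive are assumptions made for illustration.

```python
import io
import zipfile


def classify_apk_entries(apk_file):
    """Group the entries of an APK (a ZIP archive) into the four parts of the slide."""
    groups = {"manifest": [], "code": [], "meta": [], "resources": []}
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                groups["manifest"].append(name)
            elif name.endswith(".dex") or name.startswith("lib/"):
                # Assumption: DEX files and everything under lib/ count as code.
                groups["code"].append(name)
            elif name.startswith("META-INF/"):
                groups["meta"].append(name)
            else:
                groups["resources"].append(name)
    return groups


def build_demo_apk():
    """Build a tiny in-memory archive with APK-like entry names (dummy contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("AndroidManifest.xml", "classes.dex", "lib/armeabi/libdemo.so",
                     "META-INF/MANIFEST.MF", "resources.arsc", "res/layout/main.xml"):
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


demo_groups = classify_apk_entries(build_demo_apk())
```

A real parser would go on to decode each part (binary XML, DEX, signatures); this sketch only shows the top-level layout.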

5

Brief list of tools for APK analysis

• Androguard (ultimate tool by @adesnos and others), used by VirusTotal, APKInspector, etc.
• SCanDroid (Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster)
• TaintDroid (guys from Intel, Penn State University, Duke University)
• DroidBox (dynamic analysis by Patrik Lantz), used by ApkScan

6

Is this all? Really?

• http://www.apk-analyzer.net
• http://anubis.iseclab.org
• http://apkscan.nviso.be

7

Our task is more complex

Malware detector

8

Methods of malware detection

Static analysis

• Advantages
  – APK has predictable content. Application behavior can be learned by simply reading the file
  – Checks are safe
• Limitations
  – Can be ineffective against sophisticated malware and obfuscation techniques
  – Results are not conclusive, because the app is never executed

9

Methods of malware detection

Dynamic analysis

• Advantages
  – Clear results and interpretation
  – Open source solutions available
• Limitations
  – Not fast (enough)
  – Can be detected and bypassed
  – Big ecosystem requires big infrastructure

10

Methods of malware detection

Signature analysis

• Advantages
  – Effective for known malware
  – Commercial solutions available
• Limitations
  – Signature databases require regular (and frequent) updates
  – Not effective for new malware
  – Do you have a team of virus analysts?

11

Methods of malware detection

The most efficient approach seems to be a hybrid solution

12

MatrixNet

What is The Matrix?

13

Why can we use machine learning?

Abstract task description:

• We have a set of objects (APK files) that must be divided into two subsets (malware and normal)
• For every element of the set we can compute a predictable number of features
• The subsets are simply the result of a classification task, so we can try to choose effective features

14

What is the MatrixNet?

MatrixNet is an implementation of the gradient boosted decision trees algorithm.

MatrixNet differs slightly from the standard algorithm:

• It uses oblivious trees
• It accounts for the sample count in each leaf
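
To make the term concrete: an oblivious tree applies the same feature test to every node on a given level, so a tree of depth d is just d comparisons that form a d-bit leaf index. The sketch below shows only how such an ensemble could be evaluated. It is not MatrixNet code; the class, the toy trees, and their values are assumptions, training (including the per-leaf sample-count handling) is not shown, and the "score > 0 means malware" rule is borrowed from the confusion matrix slide.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ObliviousTree:
    # One (feature index, threshold) pair per level, shared by all nodes on that level.
    feature_indices: List[int]
    thresholds: List[float]
    # 2 ** depth leaf values, indexed by the bit pattern of the level tests.
    leaf_values: List[float]

    def predict(self, features: Sequence[float]) -> float:
        leaf = 0
        for index, threshold in zip(self.feature_indices, self.thresholds):
            bit = 1 if features[index] > threshold else 0
            leaf = (leaf << 1) | bit  # append this level's outcome to the leaf index
        return self.leaf_values[leaf]


def ensemble_score(trees: Sequence[ObliviousTree], features: Sequence[float]) -> float:
    """Boosted ensembles sum the outputs of all trees."""
    return sum(tree.predict(features) for tree in trees)


# Hypothetical hand-made trees, only to exercise the evaluation logic.
demo_trees = [
    ObliviousTree([0, 1], [0.5, 2.0], [-1.0, -0.2, 0.3, 1.5]),
    ObliviousTree([1], [5.0], [-0.5, 0.5]),
]
suspicious_score = ensemble_score(demo_trees, [1.0, 3.0])
benign_score = ensemble_score(demo_trees, [0.0, 0.0])
```

Because every level shares one test, evaluation needs no branching on tree structure, which is one reason this tree shape suits fast scoring.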

15

Why is MatrixNet powerful?

• It is a machine learning algorithm for classification tasks
• A key feature of this method is its resistance to overfitting

16

MatrixNet post learning optimization

17

MatrixNet post learning optimization

Copyright © 2013 by Sidney Harris.

18

How does it work?

Offline learning process:

• Choosing features
• Choosing samples
• Manual classification (malware or not)
• Learning on the combined set of apps
• Calculating errors

19

Features

What kind of features to use:

• Permissions
• URI in strings and other resources
• Adware library usage
• Obfuscation methods
• …
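
As an illustration of how such features could become the fixed-length numeric vector that a classifier expects, here is a minimal sketch. The tracked permissions, the `com.example.adsdk.` prefix, and the function name are hypothetical choices for this example, not the feature set used in the talk; obfuscation features are left out.

```python
import re

# Hypothetical feature definitions, chosen only for illustration.
TRACKED_PERMISSIONS = [
    "android.permission.SEND_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.CALL_PHONE",
    "android.permission.READ_LOGS",
]
ADWARE_PACKAGE_PREFIXES = ["com.example.adsdk."]  # placeholder, not a real SDK
URI_PATTERN = re.compile(r"https?://\S+")


def extract_features(permissions, strings, class_names):
    """Return a fixed-length vector: permission flags, URI count, adware flag."""
    permission_set = set(permissions)
    vector = [1.0 if p in permission_set else 0.0 for p in TRACKED_PERMISSIONS]
    vector.append(float(sum(len(URI_PATTERN.findall(s)) for s in strings)))
    uses_adware = any(
        name.startswith(prefix)
        for name in class_names
        for prefix in ADWARE_PACKAGE_PREFIXES
    )
    vector.append(1.0 if uses_adware else 0.0)
    return vector


demo_vector = extract_features(
    permissions=["android.permission.SEND_SMS", "android.permission.INTERNET"],
    strings=["visit http://example.com/a now", "no links here"],
    class_names=["com.example.adsdk.Banner", "com.demo.Main"],
)
```

Every APK maps to a vector of the same length, which is what makes the set of applications usable as training data.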

20

Samples and classification

Malware applications:

• VirusTotal feed
• Samples from malicious sites

Normal applications:

• Manual testing
• Trusted developers
• Yandex applications

21

[Diagram labels: Normal, Malware, Features, MatrixNet, Learning, Formula, Features weight, Features cost]

22

Measuring errors

[Diagram labels: Formula 1 with Features cost 1, …, Formula N with Features cost N; Normal; Malware]

Goal: a formula with a cool confusion matrix and low cost

23

Analyzer architecture

Fine! I'll go build my own casino, with blackjack and big data

24

Main parts

• Parsers
• Analyzers
• Oracle
• Report

25

In depth

APK

• Parsers: ManifestParser, ResourceParser, MetaInfoParser, ClassesParser
• Analyzers: PermissionAnalyzer, PackageAnalyzer, URLAnalyzer, ReflectionAnalyzer
• Oracle: MatrixNet
• Reports: XHTMLReporter, JSONReporter
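
The four stages can be read as a pipeline: parsers turn the APK into structured data, analyzers turn that data into features, the oracle scores the features, and a reporter formats the verdict. The sketch below is one way to wire that up. Every function here is a hypothetical stand-in (the APK is a plain dict, and the oracle is a hand-written linear score rather than a trained MatrixNet formula).

```python
import json


def manifest_parser(apk):
    return {"permissions": list(apk.get("permissions", []))}


def classes_parser(apk):
    return {"calls": list(apk.get("calls", []))}


def permission_analyzer(parsed):
    return {"sms_permissions": float(sum("SMS" in p for p in parsed["permissions"]))}


def reflection_analyzer(parsed):
    reflective = sum(c.startswith("java.lang.reflect.") for c in parsed["calls"])
    return {"reflection_calls": float(reflective)}


def toy_oracle(features):
    # Stand-in for the trained model: an arbitrary linear score.
    return 1.0 * features["sms_permissions"] + 0.5 * features["reflection_calls"] - 1.5


def json_reporter(features, score):
    verdict = "malware" if score > 0 else "normal"
    return json.dumps({"features": features, "score": score, "verdict": verdict},
                      sort_keys=True)


def analyze(apk, parsers, analyzers, oracle, reporter):
    """Run parsers, then analyzers, then the oracle, then the reporter."""
    parsed = {}
    for parser in parsers:
        parsed.update(parser(apk))
    features = {}
    for analyzer in analyzers:
        features.update(analyzer(parsed))
    return reporter(features, oracle(features))


demo_apk = {
    "permissions": ["android.permission.SEND_SMS", "android.permission.RECEIVE_SMS"],
    "calls": ["java.lang.reflect.Method->invoke"],
}
demo_report = json.loads(analyze(
    demo_apk,
    parsers=[manifest_parser, classes_parser],
    analyzers=[permission_analyzer, reflection_analyzer],
    oracle=toy_oracle,
    reporter=json_reporter,
))
```

Keeping each stage as an independent callable means a new analyzer adds a feature without touching the parsers or the reporter.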

26

ManifestParser

Handles some obfuscation methods, for example:

• HEUR:Backdoor.AndroidOS.Obad.a

27

ManifestParser

<?xml version="1.0" encoding="utf-8"?>
<manifest ="singleTop" android:versionCode="2" ="2.0" android:installLocation="internalOnly" package="com.android.system.admin" xmlns:android="http://schemas.android.com/apk/res/android">
<uses-permission ="android.permission.READ_LOGS" />
<uses-permission ="android.permission.WAKE_LOCK" />
…
<uses-permission ="android.permission.RECEIVE_SMS" />
<uses-permission ="android.permission.SEND_SMS" />
<uses-permission ="android.permission.CALL_PHONE" />
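
In the sample manifest some attribute names are missing, which a strict XML parser would reject. One tolerant approach, shown here only as a sketch, is to read permission values without relying on the attribute name. It assumes the binary manifest has already been decoded to text; the regular expression and function name are illustrative and are not taken from the ManifestParser described in the talk.

```python
import re

# Accept an optional attribute name (e.g. android:name) before the quoted value.
USES_PERMISSION = re.compile(r'<uses-permission\s+[\w:]*="([^"]*)"')


def extract_permissions(manifest_text):
    """Return permission strings even when the attribute name is missing."""
    return USES_PERMISSION.findall(manifest_text)


demo_manifest = (
    '<uses-permission ="android.permission.READ_LOGS" /> '
    '<uses-permission android:name="android.permission.SEND_SMS" />'
)
demo_permissions = extract_permissions(demo_manifest)
```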

28

ClassesParser

• Parser for DEX files
• Internal DEX disassembler
• Call graph builder
• Embeds “real” function and variable names into the disassembly listing
• Builds a list of used procedures and functions
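
A call graph can be derived from a disassembly listing by collecting the target of every invoke instruction per method. The sketch below assumes a listing in which each invoke line ends with a resolved `// class->method` comment; that line format, the regular expression, and the method names are assumptions for illustration, not the output format of the actual ClassesParser.

```python
import re
from collections import defaultdict

# Matches e.g. "invoke-virtual v0 @13970 // java.lang.Class->forName".
INVOKE = re.compile(r"^invoke-[\w/-]+\s.*//\s*(\S+->\S+)$")


def build_call_graph(methods):
    """Map each caller to the sorted list of methods it invokes."""
    graph = defaultdict(set)
    for caller, instructions in methods.items():
        for instruction in instructions:
            match = INVOKE.match(instruction)
            if match:
                graph[caller].add(match.group(1))
    return {caller: sorted(callees) for caller, callees in graph.items()}


demo_methods = {
    "com.demo.Main->onCreate": [
        "sget-object v0, field@2225",
        "invoke-virtual v0 @13970 // java.lang.Class->forName",
        "invoke-static v1 @12 // com.demo.Util->decode",
        "return-void",
    ],
    "com.demo.Util->decode": ["return-object v0"],
}
demo_graph = build_call_graph(demo_methods)
```

Methods without invoke instructions simply do not appear as callers in the result.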

29

ClassesParser

Disassembler: https://github.com/tracer0tong/de

Example: ./de.py test1.dex.dat

[[0, 'sget-object v0, {type} [{class}].{field} // field@2225'],

[2, 'invoke-virtual v0 @13970 // {class}->{method}'],

[5, 'move-result-object v0'],

[6, 'check-cast v0, [{type_name}] // type@0958'],

[8, 'return-object v0']]

30

ReflectionAnalyzer

java.lang.reflect.*

• Classes: Field, Method, etc.
• Functions: getClass(), getDeclaredField(), etc.

31

ReflectionAnalyzer

Output:

• Report:

  There is some reflections usage:
  1@android.app.Activity->getContentResolver calls: 598@java.lang.Class->forName
  2@android.app.Activity->onActivityResult calls: 598@java.lang.Class->forName

• The number of reflection calls is used as a feature.
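
Counting reflection calls could look roughly like the sketch below, which walks a caller-to-callees mapping and keeps the edges whose target is reflection-related. The list of watched methods and the graph format are assumptions for this example; the real ReflectionAnalyzer may use different rules.

```python
# Anything under java.lang.reflect plus a few entry points named on the slides.
REFLECTION_PREFIXES = ("java.lang.reflect.",)
REFLECTION_METHODS = {
    "java.lang.Class->forName",
    "java.lang.Class->getDeclaredField",
    "java.lang.Object->getClass",
}


def is_reflection_call(callee):
    return callee.startswith(REFLECTION_PREFIXES) or callee in REFLECTION_METHODS


def find_reflection_calls(call_graph):
    """Return (caller, callee) pairs whose callee is reflection-related."""
    return [
        (caller, callee)
        for caller, callees in call_graph.items()
        for callee in callees
        if is_reflection_call(callee)
    ]


demo_call_graph = {
    "android.app.Activity->getContentResolver": ["java.lang.Class->forName"],
    "android.app.Activity->onActivityResult": [
        "java.lang.Class->forName",
        "android.util.Log->d",
    ],
}
demo_reflection_calls = find_reflection_calls(demo_call_graph)
reflection_feature = float(len(demo_reflection_calls))
```

The resulting count is a single number, so it slots directly into a feature vector.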

32

Service architecture

[Diagram: two identical stacks, each consisting of Nginx, Gunicorn, Flask, Celery, MongoDB]

33

Case study

34

Let's try it on...

Yandex.Store application feed:

• More than 50K Android applications
• More than 200 new/updated apps per week
• Open for developers (no strict manual verification)

35

Performance. Check timing

~2 ms

~0.25 s

~4.5 min

36

Performance. Amount of checks

• More than 16,000 applications checked in 1 hour on 1 cluster node
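
As a quick sanity check of what this throughput implies, the arithmetic below converts it to an average time per application. It assumes the checks run one after another on the node, which the slide does not state, so the per-app figure is only an upper bound on the average.

```python
APPS_PER_HOUR = 16_000          # lower bound quoted on the slide
SECONDS_PER_HOUR = 3_600

# Average wall-clock budget per application on one node.
seconds_per_app = SECONDS_PER_HOUR / APPS_PER_HOUR

# Extrapolated daily capacity of the same node at the same rate.
apps_per_day = APPS_PER_HOUR * 24
```

That works out to about 0.225 s per application, or roughly 384,000 applications per day on a single node.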

37

Confusion matrix

                  Verdict: Malware (Score > 0)    Verdict: Normal (Score < 0)
Actual: Malware   485 (97%)                       15 (3%)
Actual: Normal    25 (5%)                         475 (95%)
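
The usual summary metrics follow directly from these four counts. The short calculation below treats "malware" as the positive class; the variable names are chosen for this example.

```python
# Counts from the confusion matrix (positive class = malware).
true_positive = 485    # malware flagged as malware
false_negative = 15    # malware missed
false_positive = 25    # normal apps flagged as malware
true_negative = 475    # normal apps passed

total = true_positive + false_negative + false_positive + true_negative

accuracy = (true_positive + true_negative) / total
precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
false_positive_rate = false_positive / (false_positive + true_negative)
```

This gives an accuracy of 96%, a recall of 97%, a precision of about 95.1%, and a false positive rate of 5% on the 1,000 test applications.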

38

(Un)predictable results

• Applications with the malicious adware library AirPush were classified as malware
• But the first version had no special features for adware

39

Conclusion

It’s alive… alive!

40

It works!

• Analytic methods work well for detecting Android malware
• Machine learning is not “rocket science”, but it is a cool and effective tool
• Open API coming soon.

41

Thanks for your attention

42

Yury Leonychev
Application Security Engineer

yleonychev@yandex-team.ru

tracer0tong

© Yandex LLC 2013
