Slide 1

Slide 1 text

ANDROID REPACKAGED APP DETECTION SYSTEM
shunyi @SITCON2015

Slide 2

Slide 2 text

ABOUT ME
鄭順一 (Shun-Yi Jheng)
http://m157q.github.io
A Python, Free Software, Security and Arch GNU/Linux Lover.
A self-described loser (魯蛇), currently taking the Compiler course for the third time.

Slide 3

Slide 3 text

ABOUT MY PARTNER
江泓樂 (Kenny Chiang)
http://kenny5312012.blogspot.tw
A winner and top student, currently abroad on exchange at ETH Zurich in Switzerland.

Slide 4

Slide 4 text

DECIDING THE DIFFICULTY OF THIS TALK
Allow me to borrow a move from Halt and Catch Fire

Slide 5

Slide 5 text

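Please put your hand down for any term you have not heard of or do not really understand.
(Quietly recommending this TV series to fellow CS geeks.)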

Slide 6

Slide 6 text

KEYWORDS
App, Android, Google Play
Perl, Python, Java, JavaScript
.apk, .dex, .smali, .json
Scrapy, NetworkX, Node.js, D3.js
Data Dependence Graph, Program Slicing
Fuzzy Hashing, Obfuscation, Subgraph Isomorphism

Slide 7

Slide 7 text

OUTLINE
Motivation & Goal
Related Projects
System Architecture
Related Open Source Tools
Conclusion

Slide 8

Slide 8 text

MOTIVATION
Why?

Slide 9

Slide 9 text

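MOTIVATION
This began as an undergraduate research project in July 2013.
The original plan was to take the lab's various virus samples, check them against Android apps on the market, and classify and cluster the results, but someone had already done that.
Later, while we were reading related papers and discussing them, an incident that was big news at the time became the turning point.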

Slide 10

Slide 10 text

MOTIVATION

Slide 11

Slide 11 text

So we thought about it further...

Slide 12

Slide 12 text

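MOTIVATION
Could we obtain the similarity between two apps by comparing their code?
Something like Moss, the system Stanford uses to catch plagiarism in students' programming assignments?
After looking further into related material, we found that relatively few people were working on this for Android apps.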

Slide 13

Slide 13 text

READING PAPERS ON RELATED SYSTEMS
Related Projects

Slide 14

Slide 14 text

RELATED PROJECTS
AnDarwin
SCanDroid
DroidRanger
DroidMOSS

Slide 15

Slide 15 text

ANDARWIN
A scalable approach to detecting similar Android apps based on semantic information.
AnDarwin: Scalable Detection of Semantically Similar Android Applications

Slide 16

Slide 16 text

SCANDROID
Static analysis of data flow and permissions.
SCanDroid: Automated Security Certification of Android Applications

Slide 17

Slide 17 text

DROIDRANGER
A malware detection system that compares each app pairwise against malware samples.
Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets

Slide 18

Slide 18 text

DROIDMOSS
Detects apps repackaged between official and third-party marketplaces using a fuzzy hashing technique.
Detecting Repackaged Smartphone Applications in Third-Party Android Marketplace

Slide 19

Slide 19 text

FUZZY HASHING
Also known as Context Triggered Piecewise Hashing. It allows examiners to find documents that are similar but not quite identical.
The Digital Standard: Why Fuzzy Hashing is Really Cool
http://jessekornblum.com/presentations/htcia06.pdf
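
To make the idea concrete, here is a minimal Python sketch of a context-triggered piecewise hash. It is a simplified illustration, not the real CTPH algorithm and not what DroidMOSS runs: the window size, the modulus, the one-character-per-piece encoding, and the function names fuzzy_hash and similarity are all assumptions made for this example.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def piece_char(piece: bytes) -> str:
    """Map one piece of the input to a single signature character."""
    return ALPHABET[hashlib.sha256(piece).digest()[0] % len(ALPHABET)]


def fuzzy_hash(data: bytes, window: int = 7, modulus: int = 16) -> str:
    """Simplified context-triggered piecewise hash (illustrative only)."""
    signature = []
    start = 0
    for i in range(len(data)):
        # The "context" is the sum of the last `window` bytes.
        context = sum(data[max(0, i - window + 1):i + 1])
        # When the context hits the trigger value, close the current piece.
        if context % modulus == modulus - 1:
            signature.append(piece_char(data[start:i + 1]))
            start = i + 1
    if start < len(data):
        # Hash whatever is left after the last trigger point.
        signature.append(piece_char(data[start:]))
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> float:
    """Score two signatures between 0.0 and 1.0 by matching subsequences."""
    return SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


# Hypothetical inputs: the same code with a small payload appended.
original = b"invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println" * 20
repackaged = original + b"const-string v2, \"injected\""
score = similarity(fuzzy_hash(original), fuzzy_hash(repackaged))
```

Because piece boundaries depend only on nearby bytes, appending or changing a small region alters only the signature characters around it, so the two signatures above share a long common prefix.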

Slide 20

Slide 20 text

LISTING THE GOALS

Slide 21

Slide 21 text

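GOAL
Download apps from Google Play and third-party markets
Analyze and compare the similarity between apps
Visualize the analysis results
A website where users can upload their own apps and get the analysis results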

Slide 22

Slide 22 text

DIVIDING THE SYSTEM ARCHITECTURE BY GOAL
System Architecture

Slide 23

Slide 23 text

The four goals split the system into:
Crawler
Analyzer
Visualizer
Website

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

WHICH OPEN SOURCE TOOLS WERE USED?
Related Open Source Tools

Slide 26

Slide 26 text

THE TRUE HACKER
To echo this year's theme

Slide 27

Slide 27 text

HOW TO BECOME A HACKER
Eric Steven Raymond
http://www.catb.org/esr/faqs/hacker-howto.html

Slide 28

Slide 28 text

NO PROBLEM SHOULD EVER HAVE TO BE SOLVED TWICE.
Creative brains are a valuable, limited resource. They shouldn't be wasted on re-inventing the wheel when there are so many fascinating new problems waiting out there.

Slide 29

Slide 29 text

So I set out on a journey:
finding existing wheels (?), evaluating and comparing them, and integrating them

Slide 30

Slide 30 text

CRAWLER
Google Play: Akdeniz/google-play-crawler, an unofficial API written in Java.
Third party: mssun/android-apps-crawler, which uses Scrapy (a Python framework for web crawling).
Scheduling: Perl script + crontab.

Slide 31

Slide 31 text

ANALYZER
Data Dependency: use SAAF (Static Android Analysis Framework), written in Java. It supports program slicing on smali code.
Subgraph Isomorphism: use NetworkX, a Python library for graph analysis.

Slide 32

Slide 32 text

DATA DEPENDENCY GRAPH
A directed graph representing the dependencies of several objects on each other.
There is an edge from a to b iff a must be evaluated before b.
Dependency graph - Wikipedia, the free encyclopedia
Data dependency - Wikipedia, the free encyclopedia
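
As a small illustration of the definition, the Python sketch below builds a data dependency graph for a straight-line program. The statement encoding (a defined variable plus the variables it reads) and the name build_ddg are assumptions for this example; SAAF works on smali code and handles far more than this toy does.

```python
from collections import defaultdict


def build_ddg(statements):
    """Build a data dependency graph for straight-line code.

    `statements` is a list of (defined_var, [used_vars]) in program order.
    Returns {producer_index: {consumer_index, ...}}: an edge a -> b means
    statement a must be evaluated before statement b.
    """
    last_def = {}              # variable -> index of its latest definition
    edges = defaultdict(set)
    for idx, (target, uses) in enumerate(statements):
        for var in uses:
            if var in last_def:
                edges[last_def[var]].add(idx)
        last_def[target] = idx
    return dict(edges)


# Hypothetical five-statement program.
program = [
    ("a", []),          # 0: a = 1
    ("b", []),          # 1: b = 2
    ("c", ["a", "b"]),  # 2: c = a + b
    ("d", ["c"]),       # 3: d = c * 2
    ("e", ["a"]),       # 4: e = a - 1
]
ddg = build_ddg(program)
```

Here statement 0 feeds statements 2 and 4, statement 1 feeds statement 2, and statement 2 feeds statement 3.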

Slide 33

Slide 33 text

DATA DEPENDENCY GRAPH
http://knuth.uprrp.edu/blog/wp-content/uploads/2011/12/CCOM3033-DataDependency.pdf

Slide 34

Slide 34 text

PROGRAM SLICING
Computation of the set of program statements, the program slice, that may affect the values at some point of interest, referred to as a slicing criterion.
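
A backward slice can be sketched in Python as a reachability search over dependence edges. This toy follows data dependences only (a full slicer also tracks control dependences), and the depends_on mapping and the name backward_slice are assumptions for this example.

```python
def backward_slice(depends_on, criterion):
    """Return every statement that may affect `criterion`.

    `depends_on` maps a statement index to the set of statement indices
    whose values it reads. The criterion itself is part of the slice.
    """
    seen = set()
    stack = [criterion]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(depends_on.get(node, ()))
    return seen


# Hypothetical program:
#   0: a = 1    1: b = 2    2: c = a + b    3: d = c * 2    4: e = a - 1
depends_on = {2: {0, 1}, 3: {2}, 4: {0}}

slice_for_d = backward_slice(depends_on, 3)
slice_for_e = backward_slice(depends_on, 4)
```

Slicing on statement 3 keeps statements 0 to 3 and drops statement 4, which has no influence on d.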

Slide 35

Slide 35 text

PROGRAM SLICING

Slide 36

Slide 36 text

DATA VS. FLOW
Data Dependence: tracks and analyzes the values stored in variables
Flow Dependence: tracks the flow of program execution

Slide 37

Slide 37 text

VISUALIZER
Use D3.js, a JavaScript library for data manipulation and visualization
It can render visualizations directly from JSON
It offers many different chart types to choose from

Slide 38

Slide 38 text

WEBSITE
Use Node.js
Lets users upload apk files to the system
Better compatibility with D3.js
Wanted a chance to learn JavaScript

Slide 39

Slide 39 text

CRAWLER
Download apps from Google Play
Download apps from third-party markets
Periodically update and record app information

Slide 40

Slide 40 text

ANALYZER
Unpack each downloaded apk file to obtain its dex code
Use smali to convert the dex code into smali code
Run SAAF to obtain the Data Dependency Graph of each apk file
Output the Data Dependency Graph in JSON format
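
The last step, writing the graph out as JSON, might look like the Python sketch below. The "nodes" / "links" layout is an assumption chosen because it is a common input shape for D3 force-directed graphs; the talk does not specify the actual schema, and ddg_to_json is a hypothetical helper name.

```python
import json


def ddg_to_json(edges, labels):
    """Serialize a dependency graph as a node-link JSON document.

    `edges` maps a source node id to the set of its target node ids.
    `labels` maps every node id to a human-readable label.
    """
    nodes = [{"id": node, "label": labels[node]} for node in sorted(labels)]
    links = [
        {"source": src, "target": dst}
        for src in sorted(edges)
        for dst in sorted(edges[src])
    ]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


# Hypothetical graph for a five-statement program.
edges = {0: {2, 4}, 1: {2}, 2: {3}}
labels = {0: "a = 1", 1: "b = 2", 2: "c = a + b", 3: "d = c * 2", 4: "e = a - 1"}
document = ddg_to_json(edges, labels)
```

Sorting nodes and links keeps the output stable between runs, which makes two exported graphs easier to diff.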

Slide 41

Slide 41 text

.APK, .DEX, .SMALI
.apk: Android Application Package
.dex: Dalvik EXecutable
.smali: smali (assembler in Icelandic), baksmali (disassembler in Icelandic)
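
Since an .apk file is a ZIP archive, the dex code can be pulled out with Python's standard zipfile module. This is a minimal sketch: extract_dex and make_fake_apk are hypothetical helper names, and the fake archive only stands in for a real app so the example runs on its own.

```python
import io
import zipfile


def extract_dex(apk_bytes: bytes) -> dict:
    """Return {entry_name: content} for every .dex entry in an apk."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return {
            name: apk.read(name)
            for name in apk.namelist()
            if name.endswith(".dex")
        }


def make_fake_apk() -> bytes:
    """Build a tiny in-memory archive shaped like an apk (placeholder data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"<manifest/>")
        archive.writestr("classes.dex", b"dex\n035\x00")
    return buffer.getvalue()


dex_files = extract_dex(make_fake_apk())
```

The extracted classes.dex would then go to a disassembler such as baksmali to produce .smali files.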

Slide 42

Slide 42 text

HELLO WORLD IN SMALI CODE
https://code.google.com/p/smali/

Slide 43

Slide 43 text

ANALYZER
Load the Data Dependency Graphs in JSON form
A Python script we wrote ourselves uses NetworkX to perform subgraph isomorphism matching and compute the similarity
Output the analyzed graphs as JSON for the Visualizer

Slide 44

Slide 44 text

SUBGRAPH ISOMORPHISM
Two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H.
Subgraph isomorphism problem - Wikipedia, the free encyclopedia
Graph isomorphism - Wikipedia, the free encyclopedia
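
The definition can be turned into a brute-force check in plain Python. This sketch is only meant to show what the problem asks: it tries every injective mapping from H's nodes into G's nodes, so it is exponential and usable only on tiny graphs. The project itself relies on NetworkX for matching; contains_subgraph and the example graphs are assumptions for this illustration.

```python
from itertools import permutations


def contains_subgraph(g_nodes, g_edges, h_nodes, h_edges):
    """Return True if directed graph G contains a subgraph isomorphic to H.

    Every edge of H must map onto an edge of G under some injective
    mapping of H's nodes to G's nodes.
    """
    g_edge_set = set(g_edges)
    for candidate in permutations(g_nodes, len(h_nodes)):
        mapping = dict(zip(h_nodes, candidate))
        if all((mapping[u], mapping[v]) in g_edge_set for u, v in h_edges):
            return True
    return False


# Hypothetical graphs: G is a small dependency graph, H a pattern to look for.
g_nodes = ["a", "b", "c", "d"]
g_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

pattern_nodes = ["x", "y", "z"]
pattern_edges = [("x", "y"), ("y", "z"), ("x", "z")]

cycle_nodes = ["x", "y"]
cycle_edges = [("x", "y"), ("y", "x")]

found_pattern = contains_subgraph(g_nodes, g_edges, pattern_nodes, pattern_edges)
found_cycle = contains_subgraph(g_nodes, g_edges, cycle_nodes, cycle_edges)
```

The three-node pattern is found in G (map x, y, z to a, b, c), while the two-node cycle is not, because G has no pair of nodes with edges in both directions.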

Slide 45

Slide 45 text

GRAPH ISOMORPHISM

Slide 46

Slide 46 text

VISUALIZER
Use D3.js to present the visualized Data Dependency Graph on a web page

Slide 47

Slide 47 text

WEBSITE
Built with Node.js so that users can upload apk files for analysis

Slide 48

Slide 48 text

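LIVE DEMO?
There is probably not enough time left, right?
In fact, one last small part of the website is still unfinished
Lately I have been writing the project's final report
SAAF is licensed under GNU GPL v3
The source code will be released later
Stay tuned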

Slide 49

Slide 49 text

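CONCLUSION
I recommend applying for the National Science Council undergraduate research project program (it comes with $$$)
Look for a project topic in the needs of your own daily life
Theory (reading papers) and practice (writing code) deserve equal weight
Always survey the related field first
Don't reinvent the wheel.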

Slide 50

Slide 50 text

HACK EVERYTHING
http://www.catb.org/esr/faqs/hacker-howto.html
https://github.com/Akdeniz/google-play-crawler
https://github.com/mssun/android-apps-crawler
https://github.com/scrapy/scrapy
https://code.google.com/p/saaf/
https://code.google.com/p/smali/
https://github.com/networkx/networkx
https://github.com/mbostock/d3
https://github.com/joyent/node