
Thesis Proposal Slides

Khalid Alharbi
November 30, 2015

Presentation slides from my PhD proposal defense


Transcript

  1. A Deep and Longitudinal Approach to Mining Mobile Applications
    Khalid Alharbi, Sikuli Lab, University of Colorado Boulder
    Thesis Proposal, November 30th, 2015
  2. “Mobile apps have become a valuable data source to mine and extract insights from.”
  3. Mining Store Listing Details Data
    - Mining reviews to perform sentiment analysis and summarize users’ feedback.
    - Mining specific listing information such as description, maturity ratings, what’s new, category, developer, etc.
  4. Mining User Interface Data
    - Mining user interface layout files to find design examples and compute UI complexity.
    - Mining UI text data to identify possible stealthy behavior.
  5. Mining Code Data
    - Applying static analysis to study permission and API usage.
    - Detecting patterns similar to malicious applications.
    - Detecting privacy violations.
  6. UI Design Analysis
    Code (MainActivity.java): public class MainActivity extends AppCompatActivity { }
    UI (main_activity.xml): <android.support.v7.widget.Toolbar android:id="@+id/my_toolbar" android:theme="@style/ThemeOverlay.AppCompat.ActionBar" />
    Manifest (AndroidManifest.xml): <application android:theme="@style/Theme.AppCompat.Light.NoActionBar"/>
  7. Deep Approach
    “Deep and structural indexing of apps across multiple levels (listing, UI, code, libraries, backend providers).”
  8. Thesis Statement
    “The goal of this work is to enable and advance the mining of mobile apps at deep and longitudinal levels, to gain insights into various aspects of mobile apps at large scale.”
  9. Research Questions
    • RQ1: What are the benefits of the deep and longitudinal approach to mining mobile apps?
    • RQ2: How can we enable deep searching and mining of mobile apps over time?
  10. RQ1.1: What could a deep and longitudinal approach achieve in UI design mining?
    Collect, Decompile, Extract, Stats, and Diff: Mining Design Pattern Changes in Android Apps. Khalid Alharbi and Tom Yeh, MobileHCI 2015, Honorable Mention Award.
  11. Data-driven Approach
    • Collect: apps and their subsequent versions
    • Decompile: apps to their nearly original form
    • Extract: a comprehensive set of features
    • Stats: compute statistics on extracted features
    • Diff: compute the difference between app versions
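The Diff step can be pictured as a set comparison between the features extracted from two releases of the same app. The following is only a minimal sketch: `diff_features` and the feature names in `v1` and `v2` are hypothetical, and the slides do not say how the actual tool represents features.

```python
def diff_features(old, new):
    """Return the features added, removed, and kept between two app versions."""
    old, new = set(old), set(new)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "kept": sorted(old & new),
    }


# Hypothetical feature sets extracted from two releases of one app.
v1 = {"TabHost", "LinearLayout", "Button"}
v2 = {"DrawerLayout", "LinearLayout", "Button"}

changes = diff_features(v1, v2)
print(changes)
```

Under these assumptions, a release that drops `TabHost` and introduces `DrawerLayout` shows up as one removed and one added feature, which is the kind of signal a design pattern change would leave.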
  12. Feature Extraction
    • Listing Details: title, description, reviews, category, price, date published, ratings count, rating, install size, downloads count, permissions, what’s new.
    • Appearance: layout directories, layout files, view group containers, view elements, relationships, drawable resources, UI text resources.
    • Behavior: app framework invocations, manifest (AndroidManifest.xml), third-party libraries.
  13. Stats
    • Compute descriptive statistics
    • What is the percentage of apps that used a given design pattern?
    • What is the percentage of apps that switched to an alternative design pattern?
    • What is the percentage of apps that maintained the use of a given design pattern?
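The three questions above reduce to counting over per-app version histories. The sketch below assumes each app is summarized as a list of booleans in release order (True when that version uses the pattern); this representation, the function name `pattern_stats`, and the toy data are assumptions for illustration, not the format used in the study.

```python
def pattern_stats(histories):
    """Summarize pattern usage; histories maps app id -> list of booleans in release order."""
    total = len(histories)
    # Apps with at least one version that uses the pattern.
    used = {app: h for app, h in histories.items() if any(h)}
    # Apps that used the pattern at some point but not in their latest version.
    stopped = [app for app, h in used.items() if not h[-1]]
    # Apps that kept the pattern in every version from first use onward.
    maintained = [app for app, h in used.items() if all(h[h.index(True):])]

    def pct(part, whole):
        return round(100.0 * part / whole, 1) if whole else 0.0

    return {
        "used_pct_of_all_apps": pct(len(used), total),
        "stopped_pct_of_users": pct(len(stopped), len(used)),
        "maintained_pct_of_users": pct(len(maintained), len(used)),
    }


# Made-up histories for four apps.
histories = {
    "a": [True, True, True],
    "b": [True, False, False],
    "c": [False, False],
    "d": [False, True],
}
stats = pattern_stats(histories)
print(stats)
```

Note the two different denominators: usage is measured against all apps, while stopping and maintaining are measured against the apps that used the pattern.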
  14. Application
    • Navigation design patterns
      • Tab Layout
      • Fragments
      • Horizontal Paging
      • Action Bar
      • Up Navigation
      • Navigation Drawers
    • Custom UI Components
    • Homescreen widgets
  15. Tabs with TabHost
    • Introduced in the initial release of the Android UI framework.
    • Deprecated in API level 11 in favor of new navigation patterns, such as the Fragment and the Action Bar patterns.
  16. Tabs with TabHost
    • 3,809 (15.6%) apps used TabHost
      • 666 (17.5%) of them used it for the first time
    • Migration rate is very slow
      • 413 (10.8%) apps stopped using it and switched to other types of navigation
      • New apps are using it
  17. Navigation Drawer
    • 1,183 (4.8%) apps used Navigation Drawer
      • 771 (65.2%) of them used it for the first time
    • Adoption rate is very slow
      • 37 (3.1%) apps stopped using it and switched to other types of navigation
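A quick arithmetic check of the figures on slides 16 and 17 suggests that the indented percentages are computed relative to the apps that used the pattern, not relative to all apps in the dataset. The helper `share` below is a hypothetical name; the counts are taken from the slides.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


tabhost_apps = 3809   # apps that used TabHost (slide 16)
drawer_apps = 1183    # apps that used Navigation Drawer (slide 17)

print(share(666, tabhost_apps))  # first-time TabHost users
print(share(413, tabhost_apps))  # apps that stopped using TabHost
print(share(771, drawer_apps))   # first-time Navigation Drawer users
print(share(37, drawer_apps))    # apps that stopped using Navigation Drawer
```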
  18. Findings: Some Highlights
    • Some apps switch to a design pattern even after it has been deprecated.
    • The adoption rate of newly introduced design patterns is relatively low.
    • Some apps update their listing details to reflect changes in design patterns.
    • Some design patterns have a slow adoption rate but higher continuity of use over multiple releases.
  19. RQ2: How can we enable deep searching and mining of mobile apps over time?
    RQ1: What are the benefits of the deep and longitudinal approach to mining mobile apps?
  20. A Deep Search Engine
    Sieveable: Deep Android Apps Search
    Khalid Alharbi, Jackson Chen, and Tom Yeh, VLDB 2016 (in submission)
  21. Challenges
    • Dealing with heterogeneous data
      • Store Listing Details: document-oriented
      • UI: hierarchical tree structure
      • Code: text data
    • Creating structural indexes
    • Designing an effective and generalizable search engine
  22. Data Collection
    • Web and marketplace crawlers
    • Collected over 400,000 apps
    • Multiple versions are collected
    • Decompile downloaded apps
    • Extract listing details, UI, manifest, and code features
    • Total dataset size is over 10 TB
  23. Feature Indexing
    • Listing details
      • Single-field and text indexes
    • UI
      • Structural index that keeps track of all DOM elements’ relationships
    • Manifest
      • Index DOM elements and their attributes
    • Code
      • Full-text indexes
  24. Indexing UI data
    • Tag Index
      • Holds tag names and their attributes
    • Structural Index
      • A suffix-tree-based index that stores the XML tree in a suffix array format
    Examples shown on the slide:
      RelativeLayout->LinearLayout->ImageButton
      EditText(android:layout_width="fill_parent")
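To make the two indexes more concrete, here is a deliberately simplified sketch. It records every tag and every root-to-element tag path per layout, then answers "ends with this tag chain" lookups with a linear scan. This is a stand-in for illustration only, not the suffix tree or suffix array structure described on the slide; the function names, the `->` path encoding as a lookup key, and the two toy layouts are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_layout(app_id, xml_text, tag_index, path_index):
    """Record each tag and each root-to-element tag path found in one layout file."""
    def walk(node, path):
        path = path + (node.tag,)
        tag_index[node.tag].add(app_id)
        path_index["->".join(path)].add(app_id)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())


def find_by_path_suffix(path_index, suffix):
    """Return apps that have an element whose tag path ends with `suffix`."""
    hits = set()
    for path, apps in path_index.items():
        if path == suffix or path.endswith("->" + suffix):
            hits |= apps
    return hits


tag_index = defaultdict(set)
path_index = defaultdict(set)

# Toy layouts (attributes omitted to keep the XML namespace-free).
layout_a = "<RelativeLayout><LinearLayout><ImageButton/><EditText/></LinearLayout></RelativeLayout>"
layout_b = "<LinearLayout><Button/></LinearLayout>"
index_layout("app_a", layout_a, tag_index, path_index)
index_layout("app_b", layout_b, tag_index, path_index)

print(find_by_path_suffix(path_index, "LinearLayout->ImageButton"))
print(sorted(tag_index["LinearLayout"]))
```

A real suffix-based index would avoid the scan over all stored paths; the sketch only shows which relationships such an index has to preserve.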
  25. Query Language Specification
    • SQL-like declarative query syntax
    MATCH app
    WHERE
      <developer>Google Inc.</developer>
      <LinearLayout> <Button/> </LinearLayout>
      <uses-permission android:name="android.permission.SEND_SMS"/>
      <code class="android.hardware.Camera" method="takePicture"/>
    RETURN app
  26. Query Language Specification
    • Query Parser
    MATCH app
    WHERE
      <developer>Google Inc.</developer>    (Listing)
      <LinearLayout> <Button/> </LinearLayout>    (UI)
      <uses-permission android:name="android.permission.SEND_SMS"/>    (Manifest)
      <code class="android.hardware.Camera" method="takePicture"/>    (Code)
    RETURN app
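One way to picture the parser's job is to split the WHERE clause into its top-level elements and assign each to a level. The sketch below is an assumption-heavy illustration, not Sieveable's parser: the tag groupings in `LISTING_TAGS` and `MANIFEST_TAGS` are inferred only from the example queries in these slides, and anything unrecognized is treated as a UI element.

```python
import re
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
# Assumed groupings, taken from the example queries only.
LISTING_TAGS = {"developer", "store-category", "whats-new"}
MANIFEST_TAGS = {"uses-permission", "uses-sdk"}


def parse_query(query):
    """Split a MATCH/WHERE/RETURN query into per-level element tags and return fields."""
    m = re.fullmatch(r"\s*MATCH\s+app\s+WHERE\s+(.*?)\s+RETURN\s+(.+?)\s*", query, re.DOTALL)
    if m is None:
        raise ValueError("expected: MATCH app WHERE ... RETURN ...")
    where, returns = m.groups()
    # Wrap the clause in a root element that declares the android: prefix
    # so a standard XML parser accepts it.
    wrapped = '<q xmlns:android="%s">%s</q>' % (ANDROID_NS, where)
    parts = {"listing": [], "ui": [], "manifest": [], "code": []}
    for el in ET.fromstring(wrapped):
        if el.tag in LISTING_TAGS:
            level = "listing"
        elif el.tag in MANIFEST_TAGS:
            level = "manifest"
        elif el.tag == "code":
            level = "code"
        else:
            level = "ui"
        parts[level].append(el.tag)
    return parts, [r.strip() for r in returns.split(",")]


query = '''MATCH app WHERE
<developer>Google Inc.</developer>
<LinearLayout> <Button/> </LinearLayout>
<uses-permission android:name="android.permission.SEND_SMS"/>
<code class="android.hardware.Camera" method="takePicture"/>
RETURN app'''

parts, returns = parse_query(query)
print(parts, returns)
```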
  27. Query Language Specification
    • Query Executor
    MATCH app
    WHERE
      <developer>Google Inc.</developer>
      <LinearLayout> <Button/> </LinearLayout>
      <uses-permission android:name="android.permission.SEND_SMS"/>
      <code class="android.hardware.Camera" method="takePicture"/>
    RETURN app
    [Figure: execution pipeline. Stages shown: Listing Details, Code, Manifest Index, Manifest DOM, UI Index, UI DOM. Candidate counts shown: 125K apps, 249 apps, 66 apps, 21 apps, 21 apps, 15 apps, 15 apps.]
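The shrinking candidate counts on this slide suggest that each stage filters only the apps that survived the previous one. A minimal sketch of that idea follows; the stage order, the in-memory app records, and the predicate functions are all hypothetical and are not taken from the actual executor.

```python
def execute(stages, candidates):
    """Apply each (name, predicate) stage to the survivors of the previous stage."""
    trace = []
    for name, predicate in stages:
        candidates = [app for app in candidates if predicate(app)]
        trace.append((name, len(candidates)))
    return candidates, trace


SEND_SMS = "android.permission.SEND_SMS"
TAKE_PICTURE = ("android.hardware.Camera", "takePicture")

# Made-up app records; "ui" holds (parent, child) tag pairs.
apps = [
    {"id": "a", "developer": "Google Inc.", "ui": {("LinearLayout", "Button")},
     "permissions": {SEND_SMS}, "calls": {TAKE_PICTURE}},
    {"id": "b", "developer": "Google Inc.", "ui": {("LinearLayout", "Button")},
     "permissions": set(), "calls": set()},
    {"id": "c", "developer": "Other Dev", "ui": set(),
     "permissions": {SEND_SMS}, "calls": {TAKE_PICTURE}},
    {"id": "d", "developer": "Google Inc.", "ui": set(),
     "permissions": {SEND_SMS}, "calls": set()},
]

stages = [
    ("listing", lambda app: app["developer"] == "Google Inc."),
    ("ui", lambda app: ("LinearLayout", "Button") in app["ui"]),
    ("manifest", lambda app: SEND_SMS in app["permissions"]),
    ("code", lambda app: TAKE_PICTURE in app["calls"]),
]

result, trace = execute(stages, apps)
print([app["id"] for app in result], trace)
```

Running cheaper or more selective stages first keeps the later, more expensive checks working on a small candidate set.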
  28. UI Search Queries
    Find the store category of apps with Settings/Preferences screens:
    MATCH app
    WHERE
      <store-category>(*)</store-category>
      <PreferenceScreen> <ListPreference/> </PreferenceScreen>
      <code class="android.content.Context" method="getSharedPreferences"/>
    RETURN app, $1
  29. Program Analysis Queries
    • Overprivilege Analysis
    MATCH app
    WHERE
      <code class="android.hardware.Camera" method="open"/>
      <uses-permission android:name="android.permission.CAMERA"/>
    RETURN app
    • Bug Fixes Analysis
    MATCH app
    WHERE
      <code class="android.app.Service" method="onStartCommand"/>
      <whats-new>bug fixes</whats-new>
    RETURN app
  30. Security Queries
    • Potential software vulnerability
    MATCH app
    WHERE
      <code method="addJavascriptInterface"/>
      <uses-sdk android:targetSdkVersion="12"/>
    RETURN app
    • Permissions
    MATCH app
    WHERE
      <uses-permission android:name="android.permission.*" __min="11" />
      <uses-permission android:name="*_SMS"/>
    RETURN app
  31. Approach Evaluation
    • [Deep] As the number of levels increases, the accuracy of the analysis increases significantly.
    • [Longitudinal] As the number of versions increases, we obtain more insights about mobile apps.
  32. Evaluation: Deep Approach
    • Comparison against single-view approaches.
    • Find apps with material design lists and cards.
      • Randomly select 1,000 apps
      • Inspect screenshots and description in listing
      • Inspect their UI layout files
      • Inspect their bytecode files
    • As we go deeper in the analysis, both precision and recall increase significantly.
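Precision and recall for each analysis depth can be computed from the set of apps a given level retrieves and the manually labeled set of apps that truly match. The sketch below uses made-up app ids purely to show the calculation; it does not reproduce any result from the evaluation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a ground-truth set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth and per-level results.
relevant = {"a", "b", "c", "d"}
results_by_level = {
    "listing only": {"a", "e", "f"},
    "listing + UI": {"a", "b", "e"},
    "listing + UI + code": {"a", "b", "c"},
}

for level, retrieved in results_by_level.items():
    print(level, precision_recall(retrieved, relevant))
```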
  33. Evaluation: Longitudinal Approach
    • Evolution and adoption rate experiments
    • Example: Find the time period that has seen a spike in adopting material design
      • Collect ground truth: use web search engines to collect 100 apps that have switched to material design.
      • Use Sieveable to find the release date of the first adopted version and compare the result with the ground truth.
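Finding an adoption spike amounts to locating, for each app, the earliest release that uses the pattern and then counting those dates per period. The sketch below assumes version histories are available as (release date, feature set) pairs and uses the presence of a `CardView` element as a rough proxy for material design cards; the data, the proxy, and the function names are illustrative assumptions.

```python
from collections import Counter
from datetime import date


def first_adoption(versions, uses_pattern):
    """Return the release date of the earliest version that uses the pattern, or None."""
    adopted = [released for released, features in versions if uses_pattern(features)]
    return min(adopted) if adopted else None


def adoption_by_month(apps, uses_pattern):
    """Count first adoptions per (year, month) across all apps."""
    counts = Counter()
    for versions in apps.values():
        adopted_on = first_adoption(versions, uses_pattern)
        if adopted_on is not None:
            counts[(adopted_on.year, adopted_on.month)] += 1
    return counts


def uses_cards(features):
    return "CardView" in features


# Made-up version histories: (release date, extracted UI tags).
apps = {
    "a": [(date(2014, 9, 1), {"ListView"}),
          (date(2014, 12, 5), {"CardView"}),
          (date(2015, 2, 1), {"CardView"})],
    "b": [(date(2014, 12, 20), {"CardView"})],
    "c": [(date(2015, 1, 10), {"ListView"})],
}

counts = adoption_by_month(apps, uses_cards)
print(counts.most_common(1))
```

The per-app dates returned by `first_adoption` are what would be compared against the manually collected ground truth.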
  34. Contributions
    • A novel approach that enables mining mobile apps at multiple levels over time, embodied in a search engine called Sieveable
    • Demonstrated the utility of the approach by conducting diverse types of analyses
    • The proposed approach has been implemented and deployed, and people can actually use it
  35. Timeline (Task: Date)
    • Evaluation: December 2015 - January 2016
    • Writing and completing dissertation: February 2016
    • Defending the dissertation: March 2016
  36. A Deep and Longitudinal Approach to Mining Mobile Applications
    Khalid Alharbi
    Thank you: Tom Yeh, Jackson Chen, Sanghee Kim, and everyone at the Sikuli Lab; King Abdulaziz University; and everyone else who is here today.
    http://sieveable.io
    These slides are available at: https://speakerdeck.com/kalharbi