When BigData hits "BigCode"

Julian Viereck
February 24, 2015

Explaining the basics behind http://jsnice.org/


Transcript

  1. Hi • My name is Julian Viereck • JavaScript developer since 2008 • Contribute to open source (e.g. Firefox) • Master CS student at ETH Zurich • Machine Learning, Software Analysis & more
  2. Today • Present research work from ETH Zurich • “Predicting Program Properties from ‘Big Code’” by Veselin Raychev, Martin Vechev, Andreas Krause, POPL’15 • http://www.srl.inf.ethz.ch/papers/jsnice15.pdf • Bridge Software Analysis and Machine Learning
  3. Massive Code Available • Maybe we can learn from existing programs? • The amount of available code is growing • Graphs from: http://githut.info/
  4. “The DARPA ‘big code’ initiative, […], seeks to leverage software analysis and big data analytics to improve the way software is built, debugged and verified.” • Big Code Initiative: http://www.datanami.com/2014/05/05/darpa-launches-big-code-initiative/
  5. Aliens want to learn JS • Assume you are an alien observing Earth • You want to learn about the top programming language on Earth • Of course that’s JavaScript! • Your task: How is ‘writeFileSync’ used? • Context, argument names, argument types • Talking to humans is complicated, but you <3 analysing data!
  6. Use Machine Learning • Need to formalise the problem precisely (using math) • Pattern recognition: an art in Machine Learning • How to represent program “elements”? • Idea: • Model the program as a dependency graph • Find the most likely assignments
  7. Dependency Graph • What is known, what is unknown? • Known properties: 0, [], length, … • Unknown properties (marked “?” in the graph): e, t, n, r, i
  8. Dependency Graph • What is known, what is unknown? • How are entries related? • Feature: (a, b, rel) • Related to the AST, but also other connections! • [Figure: dependency graph with nodes length, t, r, i and edge relations L=_.R, L+=R, L<R]
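A minimal Python sketch of the (a, b, rel) feature idea from slide 8. Only the relation labels (L=_.R, L<R) and the node names come from the slide; the edge list, the candidate names and the helper `features` are illustrative assumptions, not the JSNice implementation.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    a: str    # left node of the relation
    b: str    # right node of the relation
    rel: str  # relation label, as shown on the slide


# Nodes whose label is already known (here: a property name).
KNOWN = {"length"}

# Hypothetical dependency graph for minified code such as:
#   t = r.length;  ...  i < t
EDGES = [
    Edge("t", "length", "L=_.R"),
    Edge("i", "t", "L<R"),
]


def features(edges, assignment):
    """Turn each edge into a feature (label_a, label_b, rel).

    Known nodes keep their own label; unknown nodes take the
    name proposed for them in `assignment`.
    """
    def label(node):
        return node if node in KNOWN else assignment[node]

    return [(label(e.a), label(e.b), e.rel) for e in edges]


# One candidate renaming of the unknown nodes (assumed for illustration).
candidate = {"t": "len", "i": "index"}
feats = features(EDGES, candidate)
```

Each candidate renaming produces its own list of features, and those features are what later receive scores.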
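  9. Find Best Assignment • Given scores for each relation, find the global optimum • Scores learned from existing code • The best assignment need not be the local optimum for each relation, but it yields a better global score • [Figure: example graphs with relation scores 1, 3, 2]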
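The difference between a locally best choice and the global optimum can be sketched on a toy graph. The scores, candidate names and function names below are made up for illustration, and the exhaustive search only works because the example is tiny.

```python
from itertools import product

# Toy dependency graph: (node_a, node_b, relation). "length" is known.
EDGES = [("t", "length", "L=_.R"), ("i", "t", "L<R")]

# Hypothetical candidate names for the two unknown nodes.
CANDIDATES = {"t": ["len", "n"], "i": ["i", "idx"]}

# Hypothetical scores per feature; unseen features score 0.
SCORES = {
    ("len", "length", "L=_.R"): 3,
    ("n", "length", "L=_.R"): 2,
    ("i", "len", "L<R"): 1,
    ("idx", "len", "L<R"): 1,
    ("i", "n", "L<R"): 3,
}


def total_score(assignment):
    """Sum the scores of all features induced by `assignment`."""
    def label(node):
        # Unknown nodes take their assigned name, known nodes keep theirs.
        return assignment.get(node, node)

    return sum(SCORES.get((label(a), label(b), rel), 0) for a, b, rel in EDGES)


def best_assignment():
    """Try every combination of candidate names and keep the best one."""
    unknowns = sorted(CANDIDATES)
    best, best_score = None, float("-inf")
    for names in product(*(CANDIDATES[u] for u in unknowns)):
        assignment = dict(zip(unknowns, names))
        score = total_score(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


best, score = best_assignment()
```

Looking at the first relation alone, `len` (score 3) beats `n` (score 2), yet the best overall assignment uses `n` because it scores better on the second relation.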
  10. Practical Issues • Large set of feature combinations: • # indexed JS files: 324’501 • ~ 7’000’000 features for names • ~ 70’000 features for types ➡ 10 h / 1 h to learn on a 32-core Xeon machine • But this has to be done only once; the results are reused
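The slides do not describe the training procedure itself. As a deliberately simplified stand-in, the sketch below scores each feature by how often it occurs in a corpus; the corpus contents and the name `learn_scores` are assumptions for illustration, and plain counting is not claimed to be how JSNice learns its scores.

```python
from collections import Counter

# Hypothetical corpus: one list of (a, b, rel) features per program.
CORPUS = [
    [("len", "length", "L=_.R"), ("i", "len", "L<R")],
    [("n", "length", "L=_.R"), ("i", "n", "L<R")],
    [("n", "length", "L=_.R"), ("i", "n", "L<R")],
]


def learn_scores(corpus):
    """Score each feature by its number of occurrences in the corpus."""
    counts = Counter()
    for program_features in corpus:
        counts.update(program_features)
    return dict(counts)


# Learned once, then reused for every later prediction.
scores = learn_scores(CORPUS)
```

Because the table is computed once and then only read, the expensive step does not have to be repeated per prediction, which matches the "only once, reuse results" point on the slide.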
  11. Practical Issues • Finding the global optimum takes too long • Search greedily for local optima • Only look at features that have a high score • With these adjustments, prediction works quite fast
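A greedy local search of the kind slide 11 describes might look like the following sketch: start from some assignment and change one node at a time while the total score improves. The toy graph, scores and function names are hypothetical, and this simple hill climb is only one possible reading of "search greedily", not the algorithm used by JSNice.

```python
# Toy dependency graph: (node_a, node_b, relation). "length" is known.
EDGES = [("t", "length", "L=_.R"), ("i", "t", "L<R")]

# Hypothetical candidates, e.g. already pruned to high-scoring names.
CANDIDATES = {"t": ["len", "n"], "i": ["i", "idx"]}

# Hypothetical scores per feature; unseen features score 0.
SCORES = {
    ("len", "length", "L=_.R"): 3,
    ("n", "length", "L=_.R"): 2,
    ("i", "len", "L<R"): 1,
    ("idx", "len", "L<R"): 1,
    ("i", "n", "L<R"): 3,
}


def total_score(assignment):
    """Sum the scores of all features induced by `assignment`."""
    def label(node):
        return assignment.get(node, node)

    return sum(SCORES.get((label(a), label(b), rel), 0) for a, b, rel in EDGES)


def greedy_assignment(max_passes=10):
    """Improve one node at a time until no single change helps."""
    # Start with the first candidate for every unknown node.
    assignment = {node: names[0] for node, names in CANDIDATES.items()}
    current = total_score(assignment)
    for _ in range(max_passes):
        improved = False
        for node in sorted(CANDIDATES):
            for name in CANDIDATES[node]:
                trial = {**assignment, node: name}
                score = total_score(trial)
                if score > current:  # accept strict improvements only
                    assignment, current, improved = trial, score, True
        if not improved:
            break
    return assignment, current


result, result_score = greedy_assignment()
```

On this toy input the greedy search reaches the same assignment as an exhaustive search would, but in general it can stop at a local optimum; that is the price paid for speed.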
  12. Where to use this? • Code editor: • Code completion • Predict types based on names • Type checking: • Provide hints for type inference • Optimal for a cloud-based service
  13. More Background • About JSNice: • http://www.srl.inf.ethz.ch/jsnice.php • Programming Tools based on Big Data: • http://www.srl.inf.ethz.ch/spas.php • https://www.youtube.com/watch?v=-_CvQeXbVGg