
When BigData hits "BigCode"

Julian Viereck
February 24, 2015

Explaining the basics behind http://jsnice.org/

Transcript

  1. Hi • My name is Julian Viereck • JavaScript developer since 2008 • Contribute to OpenSource (e.g. Firefox) • Master CS student at ETH Zurich • Machine Learning, Software Analysis & more
  2. Today • Present research work from ETH Zurich • ‘Predicting Program Properties from “Big Code”’ by Veselin Raychev, Martin Vechev, Andreas Krause, POPL’15 • http://www.srl.inf.ethz.ch/papers/jsnice15.pdf • Bridge Software Analysis and Machine Learning
  3. Massive Code Available • Maybe we can learn from existing programs? • The amount of available code is growing • Graphs from: http://githut.info/
  4. The DARPA “big code” initiative, […], seeks to leverage software analysis and big data analytics to improve the way software is built, debugged and verified. • Big Code Initiative: http://www.datanami.com/2014/05/05/darpa-launches-big-code-initiative/
  5. Aliens want to learn JS • Assume you are an alien observing earth • You want to learn about the top programming language on earth • Of course that’s JavaScript! • Your task: How is ‘writeFileSync’ used? • Context, argument names, argument types • Talking to humans is complicated, but <3 analysing data!
  6. Use Machine Learning • Need to formalise the problem precisely (using math) • Pattern recognition is the art in Machine Learning • How to represent program “elements”? • Idea: • Model the program as a dependency graph • Find the most likely assignments
  7. Dependency Graph • What is known, what is unknown? • Known properties: 0, [], length, … • Unknown properties (marked ? in the figure): e, t, n, r, i
  8. Dependency Graph • What is known, what is unknown? • How are entries related? • Feature: (a, b, rel) • Related to AST but also other connections! • [Figure: graph with nodes length, t, r, i (unknown nodes marked ?) and edge relations L=_.R, L+=R, L<R]
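
The (a, b, rel) features on slide 8 can be sketched as plain triples. This is a minimal illustration, not the JSNice data model: the snippet `n = t.length; for (i = 0; i < n; i += r)` and the helper names `Feature`, `build_graph` and `unknown_nodes` are assumptions made for the example. Only the three relation labels are taken from the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Feature(NamedTuple):
    a: str    # first program element
    b: str    # second program element
    rel: str  # relation label between a and b


# Elements whose name is fixed and must not be renamed (hypothetical choice).
KNOWN = {"length"}

# Hypothetical features for: n = t.length; for (i = 0; i < n; i += r)
FEATURES = [
    Feature("n", "length", "L=_.R"),
    Feature("i", "n", "L<R"),
    Feature("i", "r", "L+=R"),
]


def build_graph(features):
    """Map every element to the features (edges) it takes part in."""
    graph = defaultdict(list)
    for f in features:
        graph[f.a].append(f)
        graph[f.b].append(f)
    return dict(graph)


def unknown_nodes(features, known):
    """Elements that appear in some feature but have no fixed name."""
    nodes = {f.a for f in features} | {f.b for f in features}
    return sorted(nodes - known)


graph = build_graph(FEATURES)
print(unknown_nodes(FEATURES, KNOWN))  # ['i', 'n', 'r']
print([f.rel for f in graph["i"]])     # ['L<R', 'L+=R']
```

The unknown nodes are the ones a predictor would try to name; the known node `length` anchors the prediction through the edges it shares with them.
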
  9. Find Best Assignment • Given scores for each relation, find the global optimum • Scores are learned from existing code • The best assignment is not the local optimum on every relation, but yields a better global score • [Figure: graph with edge scores 1, 3, 2]
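
Slide 9's point, that the best overall assignment may pick a name that is not the best on a single relation, can be reproduced with a tiny brute-force search. Everything here is made up for illustration: the placeholder nodes `u` and `v`, the candidate names and the score table are assumptions, and exhaustive enumeration only works because the example is so small.

```python
from itertools import product

# Two unknown nodes (u, v) and one known node ("length"); hypothetical edges.
EDGES = [("u", "length", "L=_.R"), ("v", "u", "L<R")]
CANDIDATES = {"u": ["len", "n"], "v": ["i", "j"]}

# Hypothetical scores for (name_a, name_b, rel); missing entries score 0.
SCORES = {
    ("len", "length", "L=_.R"): 3,
    ("n", "length", "L=_.R"): 2,
    ("i", "n", "L<R"): 3,
    ("i", "len", "L<R"): 1,
    ("j", "n", "L<R"): 1,
}


def total_score(assignment):
    """Sum the scores of all edges under a node -> name assignment."""
    total = 0
    for a, b, rel in EDGES:
        key = (assignment.get(a, a), assignment.get(b, b), rel)
        total += SCORES.get(key, 0)
    return total


def best_assignment():
    """Try every combination of candidate names and keep the best one."""
    nodes = sorted(CANDIDATES)
    best, best_score = None, float("-inf")
    for combo in product(*(CANDIDATES[n] for n in nodes)):
        assignment = dict(zip(nodes, combo))
        score = total_score(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


print(total_score({"u": "len", "v": "i"}))  # 4
print(best_assignment())                    # ({'u': 'n', 'v': 'i'}, 5)
```

Taken alone, the first edge prefers `len` (score 3 over 2), yet the global optimum uses `n` because it combines better with `i` on the second edge.
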
  10. Practical Issues • Large set of feature combinations: • # indexed JS files: 324’501 • ~ 7’000’000 features for names • ~ 70’000 features for types ➡ 10 h / 1 h to learn on a 32-core Xeon machine • But this has to be done only once; the results are reused
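
The slides do not spell out how the scores are learned from existing code. As a rough stand-in, the sketch below swaps in plain frequency counting: a feature seen more often in the training code gets a higher score. The observed features and the helpers `learn_scores` and `rank_names` are hypothetical, and the learning procedure behind JSNice is not claimed to work this way.

```python
from collections import Counter

# Hypothetical (name_a, name_b, rel) features extracted from existing code.
OBSERVED = [
    ("len", "length", "L=_.R"),
    ("len", "length", "L=_.R"),
    ("len", "length", "L=_.R"),
    ("n", "length", "L=_.R"),
    ("i", "len", "L<R"),
    ("i", "len", "L<R"),
    ("i", "n", "L<R"),
]


def learn_scores(observed):
    """Score each feature by how often it occurs in the training data."""
    return dict(Counter(observed))


def rank_names(scores, b, rel):
    """Names seen in relation `rel` with element `b`, best score first."""
    matching = [(s, a) for (a, fb, frel), s in scores.items()
                if fb == b and frel == rel]
    return [a for s, a in sorted(matching, key=lambda pair: -pair[0])]


scores = learn_scores(OBSERVED)
print(scores[("len", "length", "L=_.R")])    # 3
print(rank_names(scores, "length", "L=_.R"))  # ['len', 'n']
```

Because the table is computed once and then only looked up, this mirrors the slide's remark that learning is expensive but its results are reused.
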
  11. Practical Issues • Finding the global optimum takes too long • Search greedily for local optima • Only look at features that have a high score • With these adjustments, prediction works quite fast
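
The two adjustments on slide 11 can be sketched together: prune each node's candidates to the top k by score, then improve one node at a time until nothing changes. The edges, candidates, scores and the helpers `top_k` and `greedy_assign` are assumptions for illustration; this is a simple hill-climbing sketch, not the search used by JSNice, and it can stop in a local optimum.

```python
# Hypothetical graph: unknown nodes u and v, known node "length".
EDGES = [("u", "length", "L=_.R"), ("v", "u", "L<R")]
CANDIDATES = {"u": ["len", "n", "x"], "v": ["i", "j", "k"]}

# Hypothetical scores for (name_a, name_b, rel); missing entries score 0.
SCORES = {
    ("len", "length", "L=_.R"): 3,
    ("n", "length", "L=_.R"): 2,
    ("i", "n", "L<R"): 3,
    ("i", "len", "L<R"): 1,
    ("j", "n", "L<R"): 1,
}


def total_score(assignment):
    """Sum the scores of all edges under a node -> name assignment."""
    return sum(
        SCORES.get((assignment.get(a, a), assignment.get(b, b), rel), 0)
        for a, b, rel in EDGES
    )


def top_k(node, k):
    """Keep the k candidate names backed by the highest-scoring feature."""
    def best_feature_score(name):
        return max((s for (a, b, _), s in SCORES.items() if name in (a, b)),
                   default=0)
    ranked = sorted(CANDIDATES[node], key=best_feature_score, reverse=True)
    return ranked[:k]


def greedy_assign(k=2, max_rounds=10):
    """Change one node at a time while the total score keeps improving."""
    pruned = {node: top_k(node, k) for node in CANDIDATES}
    assignment = {node: names[0] for node, names in pruned.items()}
    for _ in range(max_rounds):
        changed = False
        for node, names in pruned.items():
            for name in names:
                trial = {**assignment, node: name}
                if total_score(trial) > total_score(assignment):
                    assignment, changed = trial, True
        if not changed:
            break
    return assignment, total_score(assignment)


print(greedy_assign(k=2))  # ({'u': 'n', 'v': 'i'}, 5)
print(greedy_assign(k=1))  # ({'u': 'len', 'v': 'i'}, 4)
```

With k=1 the better name `n` is pruned away before the search starts, which shows the trade-off: tighter pruning is faster but can miss the best assignment.
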
  12. Where to use this? • Code Editor: • Code completion • Predict types based on names • Type Checking: • Provide hints for type inference • Optimal for cloud-based service
  13. More Background • About JSNice: • http://www.srl.inf.ethz.ch/jsnice.php • Programming Tools based on Big Data: • http://www.srl.inf.ethz.ch/spas.php • https://www.youtube.com/watch?v=-_CvQeXbVGg