

February 11, 2016


v5 of demo




  1. Need a way to check professional content

    Want to be a Data Scientist? How to write a good resume?
  2. Pipeline and Algorithm

    Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF. 252 tokens.
  3. Pipeline and Algorithm

    Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF. 252 tokens. Latent Semantic Indexing: Topic 1, Topic 2; 80 latent topics, Topic weights.
  4. Pipeline and Algorithm

    Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF. 252 tokens. Latent Semantic Indexing: Topic 1, Topic 2; 80 latent topics, Topic weights. Random Forest Classifier …
  5. Pipeline and Algorithm

    Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF. 252 tokens. Latent Semantic Indexing: Topic 1, Topic 2; 80 latent topics, Topic weights. Random Forest Classifier … Data Scientist 74%; Keywords.
  6. Consultants are all over the place: IT, strategy…

    Data Scientist, Strategy Manager, Software Engineer, Consultant
  7. Top Topics

    The 0th top topic is: (0, u'0.184*"marketing" + 0.143*"product" + 0.142*"business" + 0.138*"strategy" + 0.132*"software" + 0.116*"project" + 0.114*"customer" + 0.112*"manager" + 0.109*"analysis" + 0.108*"market" + 0.106*"management" + 0.105*"application" + 0.102*"client" + 0.100*"development" + 0.099*"research" + 0.099*"test" + 0.093*"system" + 0.092*"team" + 0.089*"process" + 0.088*"design" + 0.087*"engineer" + 0.086*"company" + 0.084*"program" + 0.080*"support" + 0.080*"performance" + 0.078*"service" + 0.078*"revenue" + 0.078*"web" + 0.078*"sql" + 0.075*"technology" + 0.074*"analyst" + 0.073*"model" + 0.071*"scientist" + 0.071*"quality" + 0.070*"university" + 0.069*"time" + 0.069*"brand" + 0.069*"information" + 0.068*"server" + 0.068*"consultant" + 0.067*"implementation" + 0.067*"risk" + 0.067*"database" + 0.066*"experience" + 0.064*"production" + 0.063*"planning" + 0.063*"java" + 0.062*"training" + 0.061*"growth" + 0.061*"group"')
  8. Top Topics

    The 1th top topic is: (1, u'0.374*"marketing" + 0.224*"strategy" + -0.197*"software" + -0.187*"application" + -0.163*"engineer" + 0.159*"brand" + 0.157*"market" + -0.153*"test" + -0.142*"java" + 0.130*"product" + -0.130*"server" + 0.128*"manager" + -0.123*"sql" + -0.116*"web" + 0.114*"revenue" + -0.106*"oracle" + 0.101*"growth" + -0.100*"system" + -0.096*"code" + -0.094*"developer" + -0.092*"javascript" + -0.090*"spring" + -0.089*"j" + -0.085*"framework" + -0.085*"design" + -0.081*"database" + -0.081*"jquery" + 0.079*"customer" + 0.078*"advertising" + -0.078*"html" + 0.075*"consumer" + 0.073*"pricing" + 0.071*"campaign" + -0.070*"environment" + 0.069*"program" + 0.068*"director" + -0.065*"c" + 0.064*"year" + -0.063*"mvc" + -0.062*"c++" + 0.061*"launch" + 0.061*"business" + -0.060*"ee" + -0.060*"net" + 0.060*"budget" + -0.060*"cs" + -0.056*"interface" + -0.055*"description" + 0.055*"leadership" + -0.055*"architecture"')
  9. Top Topics

    The 2th top topic is: (2, u'0.239*"scientist" + -0.227*"marketing" + 0.192*"research" + 0.190*"machine" + 0.188*"university" + 0.171*"r" + 0.151*"analysis" + 0.146*"algorithm" + 0.146*"science" + 0.144*"sa" + 0.136*"model" + -0.136*"application" + 0.134*"mining" + 0.127*"regression" + 0.119*"risk" + -0.118*"product" + -0.114*"brand" + -0.110*"test" + 0.108*"modeling" + 0.105*"python" + -0.101*"strategy" + 0.099*"assistant" + -0.099*"software" + -0.097*"web" + 0.089*"learning" + 0.087*"visualization" + -0.087*"engineer" + 0.087*"decision" + 0.086*"matlab" + 0.086*"hadoop" + 0.075*"fraud" + 0.074*"graduate" + 0.073*"classification" + 0.072*"data" + -0.071*"jquery" + -0.070*"spring" + 0.069*"analyst" + -0.068*"javascript" + -0.065*"java" + -0.063*"oracle" + 0.063*"simulation" + 0.063*"detection" + 0.061*"tableau" + 0.061*"series" + -0.061*"server" + 0.059*"state" + -0.059*"development" + 0.057*"prediction" + -0.057*"mvc" + -0.056*"developer"')
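The topic strings on the last three slides list signed term loadings, in a form that resembles what gensim's LSI model prints (the deck does not name the library, so that is an assumption). As a rough sketch of how such a string relates to the "topic weights" in the pipeline: in LSI, a document's weight on a topic is, up to scaling, the dot product of its TF-IDF vector with the topic's term loadings. The example below uses only the standard library. The helper names `parse_topic` and `topic_weight` and the two resume vectors are hypothetical, introduced purely for illustration.

```python
import re

# Matches one weight*"term" pair. It tolerates a stray space after a minus
# sign ("- 0.187"), which can occur in text extracted from slides.
TERM_RE = re.compile(r'(-?)\s*(\d+\.\d+)\*"([^"]+)"')


def parse_topic(topic_str):
    """Return {term: signed loading} parsed from a topic string."""
    loadings = {}
    for sign, weight, term in TERM_RE.findall(topic_str):
        loadings[term] = -float(weight) if sign else float(weight)
    return loadings


def topic_weight(doc_vector, loadings):
    """Dot product of a document's term weights with a topic's loadings."""
    return sum(w * loadings.get(term, 0.0) for term, w in doc_vector.items())


# The first five terms of topic 1 from the slides, with one stray space
# kept on purpose to exercise the parser.
topic_1 = parse_topic(
    '0.374*"marketing" + 0.224*"strategy" + -0.197*"software" '
    '+ - 0.187*"application" + -0.163*"engineer"'
)

# Hypothetical TF-IDF weights for two resumes (illustrative, not from the deck).
marketing_resume = {"marketing": 0.8, "strategy": 0.6}
engineering_resume = {"software": 0.7, "engineer": 0.7, "application": 0.1}

print(topic_weight(marketing_resume, topic_1))    # positive: marketing side
print(topic_weight(engineering_resume, topic_1))  # negative: engineering side
```

Read this way, topic 1 acts as a marketing-versus-software axis and topic 2 separates data-science vocabulary from both. Per the pipeline slides, 80 such weights per document are what the Random Forest classifier receives.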