week4Demo

hzheng2015
February 11, 2016

v5 of demo

Transcript

  1. Need a way to check professional content. Want to be a Data Scientist? How to write a good resume?
  2. Pipeline and Algorithm. Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF; 252 tokens.
  3. Pipeline and Algorithm. Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF; 252 tokens. Latent Semantic Indexing (Topic 1, Topic 2): 80 latent topics, Topic weights.
  4. Pipeline and Algorithm. Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF; 252 tokens. Latent Semantic Indexing (Topic 1, Topic 2): 80 latent topics, Topic weights. Random Forest Classifier …….
  5. Pipeline and Algorithm. Data Collection: 2,816 (DS, SE, Con, SM). Natural Language Processing: Tokenization, Stemming, Stopwords, POS Tagging, Nouns only, TF-IDF; 252 tokens. Latent Semantic Indexing (Topic 1, Topic 2): 80 latent topics, Topic weights. Random Forest Classifier ……. Data Scientist 74%; Keywords.
  6. Consultants are all over the place: IT, strategy… (Data Scientist, Strategy Manager, Software Engineer, Consultant)
  7. Top Topics The 0th top topic is:(0, u'0.184*"marketing" + 0.143*"product"

    + 0.142*"business" + 0.138*"strategy" + 0.132*"software" + 0.116*"project" + 0.114*"customer" + 0.112*"manager" + 0.109*"analysis" + 0.108*"market" + 0.106*"management" + 0.105*"application" + 0.102*"client" + 0.100*"development" + 0.099*"research" + 0.099*"test" + 0.093*"system" + 0.092*"team" + 0.089*"process" + 0.088*"design" + 0.087*"engineer" + 0.086*"company" + 0.084*"program" + 0.080*"support" + 0.080*"performance" + 0.078*"service" + 0.078*"revenue" + 0.078*"web" + 0.078*"sql" + 0.075*"technology" + 0.074*"analyst" + 0.073*"model" + 0.071*"scientist" + 0.071*"quality" + 0.070*"university" + 0.069*"time" + 0.069*"brand" + 0.069*"information" + 0.068*"server" + 0.068*"consultant" + 0.067*"implementation" + 0.067*"risk" + 0.067*"database" + 0.066*"experience" + 0.064*"production" + 0.063*"planning" + 0.063*"java" + 0.062*"training" + 0.061*"growth" + 0.061*"group"')
  8. Top Topics The 1th top topic is:(1, u'0.374*"marketing" + 0.224*"strategy"

    + -0.197*"software" + -0.187*"application" + -0.163*"engineer" + 0.159*"brand" + 0.157*"market" + -0.153*"test" + -0.142*"java" + 0.130*"product" + -0.130*"server" + 0.128*"manager" + -0.123*"sql" + -0.116*"web" + 0.114*"revenue" + -0.106*"oracle" + 0.101*"growth" + -0.100*"system" + -0.096*"code" + -0.094*"developer" + -0.092*"javascript" + -0.090*"spring" + -0.089*"j" + -0.085*"framework" + -0.085*"design" + -0.081*"database" + -0.081*"jquery" + 0.079*"customer" + 0.078*"advertising" + -0.078*"html" + 0.075*"consumer" + 0.073*"pricing" + 0.071*"campaign" + -0.070*"environment" + 0.069*"program" + 0.068*"director" + -0.065*"c" + 0.064*"year" + -0.063*"mvc" + -0.062*"c++" + 0.061*"launch" + 0.061*"business" + -0.060*"ee" + -0.060*"net" + 0.060*"budget" + -0.060*"cs" + -0.056*"interface" + -0.055*"description" + 0.055*"leadership" + -0.055*"architecture"')
  9. Top Topics The 2th top topic is:(2, u'0.239*"scientist" + -0.227*"marketing"

    + 0.192*"research" + 0.190*"machine" + 0.188*"university" + 0.171*"r" + 0.151*"analysis" + 0.146*"algorithm" + 0.146*"science" + 0.144*"sa" + 0.136*"model" + -0.136*"application" + 0.134*"mining" + 0.127*"regression" + 0.119*"risk" + -0.118*"product" + -0.114*"brand" + -0.110*"test" + 0.108*"modeling" + 0.105*"python" + -0.101*"strategy" + 0.099*"assistant" + -0.099*"software" + -0.097*"web" + 0.089*"learning" + 0.087*"visualization" + -0.087*"engineer" + 0.087*"decision" + 0.086*"matlab" + 0.086*"hadoop" + 0.075*"fraud" + 0.074*"graduate" + 0.073*"classification" + 0.072*"data" + -0.071*"jquery" + -0.070*"spring" + 0.069*"analyst" + -0.068*"javascript" + -0.065*"java" + -0.063*"oracle" + 0.063*"simulation" + 0.063*"detection" + 0.061*"tableau" + 0.061*"series" + -0.061*"server" + 0.059*"state" + -0.059*"development" + 0.057*"prediction" + -0.057*"mvc" + -0.056*"developer"')
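
The stopword and TF-IDF steps named in the pipeline on slides 2 to 5 can be sketched in plain Python. This is only an illustration under stated assumptions, not the deck's code: the toy resumes, the tiny stopword set, and the function names `tokenize` and `tfidf` are all hypothetical, and the stemming and POS-tagging steps are left out because they need an NLP library. The weighting shown (raw term count times log of inverse document frequency) is one common TF-IDF variant; the deck does not say which one was used.

```python
import math
import re
from collections import Counter

# Hypothetical toy resumes; the deck's own corpus has 2,816 documents.
RESUMES = [
    "Built regression model and machine learning algorithm in python",
    "Led marketing strategy and brand campaign for the product",
    "Developed java web application with sql database",
]

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"and", "in", "for", "the", "with"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    weight = term count in the document * log(N / document frequency).
    Libraries differ in smoothing and normalization; this is the bare form.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


vectors = tfidf(RESUMES)
```

Each entry of `vectors` is a sparse term-weight vector for one resume. In the deck's pipeline such vectors would then feed the Latent Semantic Indexing step.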
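
The topic strings on slides 7 to 9 are signed term loadings: in topic 1, for example, marketing terms carry positive weights and software terms carry negative ones. The sketch below shows one way to read such a string and score a document against it. The string format resembles what gensim's LSI model prints, but that is an inference from the output and the deck does not name its library. The names `parse_topic` and `topic_weight` and the two toy document vectors are hypothetical, and a real library may apply additional scaling to the dot product.

```python
import re

# First few terms of topic 1, copied from slide 8.
TOPIC_1 = ('0.374*"marketing" + 0.224*"strategy" + -0.197*"software" + '
           '-0.187*"application" + -0.163*"engineer" + 0.159*"brand"')

# A weight (optionally negative, tolerating a stray space after the
# minus sign), then '*', then the quoted term.
TERM_RE = re.compile(r'(-?\s*\d+\.\d+)\*"([^"]+)"')


def parse_topic(topic):
    """Turn '0.374*"marketing" + -0.197*"software"' into {term: weight}."""
    loadings = {}
    for weight, term in TERM_RE.findall(topic):
        loadings[term] = float(weight.replace(" ", ""))
    return loadings


def topic_weight(doc_vector, loadings):
    """Dot product of a document's term weights with a topic's loadings."""
    return sum(w * loadings.get(term, 0.0) for term, w in doc_vector.items())


loadings = parse_topic(TOPIC_1)

# Hypothetical term weights for two resumes (not taken from the deck).
marketer = {"marketing": 1.0, "brand": 1.0}
engineer = {"software": 1.0, "engineer": 1.0}

print(topic_weight(marketer, loadings))  # positive: marketing side of topic 1
print(topic_weight(engineer, loadings))  # negative: software side of topic 1
```

Computing one such score per latent topic would give a vector of topic weights per resume, which is the kind of feature the pipeline slides show being passed to the Random Forest Classifier.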