Ben Zamanzadeh - Data Science @ Datapop - LA Data Science meetup - January 2015

Data Science LA
January 20, 2015


  1. Evolution of eCommerce (slide 2)
    • Distributed Store Front (Omni-Channel Retail)
      – Official Site, Search & PLA, Mobile, Social, Affiliates, Ad Networks, Shopping Sites, Digital Catalogs, Streaming Networks, Gaming & TV Consoles, Brick2Click, etc.
    • Digital Merchandising
      – Manage online Perception of Products & Brands
    • Smart Catalog
      – Integrate Knowledge of Consumer with Catalog
    • Rapidly Changing Landscape
      – Dynamic Market Conditions
      – New Advertising methods, New Channels
      – More Sophisticated competition
    • Large & Complex Operational & Data Scale
  2. Dark Data (slide 3)
    Lack of visibility caused by:
    • Big Dark Data
    • Massive amount of noisy & complex data
    • Waves in the Ocean Dilemma
    • Disjointed operations
    • Machine Learning Models lack of context
    • Need Human to generate Actionable Insight
  3. Improvement Cycle (slide 4)
    • Convert Big Data into Knowledge Graph
    • Generate Actionable Insights
    • Semantic Advertising
    Beats Studio Over-Ear Headphones – Red – Beats by Dre
  4. Marketing Knowledge Graph & Semantic Advertising (slide 5)
    • Marketing Knowledge Graph
      – The Marketing Knowledge Graph (MKG) is a semantic network of brands, products, retailers, consumer intent & sentiments, together with the advertising performance statistics and history for all publishers and channels.
      – The MKG is a knowledge base used by DataPop’s Semantic Advertising to generate and enhance advertising campaigns.
    • Semantic Advertising
      – Semantic Advertising achieves meaningful, optimized advertising by using semantic networks that link brands, products, and retailers to consumer intent and sentiments through performance metrics (AKA the Marketing Knowledge Graph).
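
One way to picture the MKG described on this slide is as a typed graph whose edges can carry advertising performance statistics. The sketch below assumes that reading. The class name, node types, relation labels, and all numbers are hypothetical illustrations, not DataPop's schema or data.

```python
from collections import defaultdict


class MarketingKnowledgeGraph:
    """Toy typed graph: each node has a type, each edge a relation and stats."""

    def __init__(self):
        self.node_types = {}            # node name -> node type
        self.edges = defaultdict(list)  # source -> [(relation, target, stats)]

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, source, relation, target, **stats):
        if source not in self.node_types or target not in self.node_types:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((relation, target, stats))

    def neighbors(self, source, relation=None):
        """Outgoing edges of `source`, optionally filtered by relation."""
        return [(rel, tgt, stats)
                for rel, tgt, stats in self.edges.get(source, [])
                if relation is None or rel == relation]


def click_through_rate(graph, product, channel):
    """Clicks / impressions stored on the product -> channel edge, if any."""
    for _, target, stats in graph.neighbors(product, "advertised_on"):
        if target == channel and stats.get("impressions"):
            return stats["clicks"] / stats["impressions"]
    return None


mkg = MarketingKnowledgeGraph()
mkg.add_node("Beats by Dre", "brand")
mkg.add_node("Beats Studio Over-Ear Headphones", "product")
mkg.add_node("gift for a music lover", "consumer_intent")
mkg.add_node("search_ads", "channel")

mkg.add_edge("Beats Studio Over-Ear Headphones", "made_by", "Beats by Dre")
mkg.add_edge("gift for a music lover", "matches",
             "Beats Studio Over-Ear Headphones")
# Made-up performance numbers, for illustration only.
mkg.add_edge("Beats Studio Over-Ear Headphones", "advertised_on", "search_ads",
             clicks=120, impressions=4000)

ctr = click_through_rate(mkg, "Beats Studio Over-Ear Headphones", "search_ads")
```

Keeping performance statistics on the edges rather than the nodes lets the same product have different numbers per channel, which fits the slide's point that performance history is tracked for all publishers and channels.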
  5. Difference Between MKG and Search Knowledge Graph (slide 6)
    • Search Knowledge Graph
      – Used for Semantic Search: search ranking algorithms use it to return the most relevant results for search queries. It is also intended to engage the searcher in exploring the Knowledge Graph for better answers.
      – The KG is a loosely connected network of a vast number of entities spread across a very wide range of topics.
    • Marketing Knowledge Graph
      – Used for online marketing & advertising of products and services, by data mining and analysis systems as well as advertising campaign management systems.
      – The MKG is a tightly coupled semantic network, limited to products, brands, consumer intent, retailers, and various advertising channels & publishers. Advertising performance is a major component of the MKG, and it changes rapidly with consumer behavior and the eCommerce competitive landscape.
  6. Our R&D Focus: Natural Language Processing & Machine Learning (slide 9)
    • Named Entity Recognition
      – Conditional Random Fields
      – Support Vector Machine
    • Supervised Category Recognition
      – Support Vector Machine
      – Max Entropy
      – Naïve Bayes
      – Ensemble Methods
    • Unsupervised Linguistic Category Derivation
    • Topic Modeling, Explicit Semantic Analysis
      – Semantic Relatedness
      – Semantic “Area Density”, Semantic Distance
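
Explicit Semantic Analysis represents a text as a weighted vector over named concepts and measures relatedness as the cosine between two such vectors. The sketch below shows only that last step. The concept names and weights are hand-made assumptions for illustration, and defining semantic distance as one minus the cosine is a common choice rather than something the slide states.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_distance(a, b):
    """Assumed definition: 1 - cosine similarity (0 means same direction)."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical concept vectors (concept -> weight), not real ESA output.
headphones = {"Audio equipment": 0.9, "Consumer electronics": 0.6}
earbuds = {"Audio equipment": 0.8, "Consumer electronics": 0.5, "Sports": 0.2}
sneakers = {"Footwear": 0.9, "Sports": 0.5}

related = cosine_similarity(headphones, earbuds)     # shares two concepts
unrelated = cosine_similarity(headphones, sneakers)  # shares none
```

In a real ESA setup the weights would come from a concept index built over a large corpus; here they are typed in by hand so the arithmetic stays visible.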
  7. Comparison of Statistical Methods (slide 10)
    • Max Entropy
      – Bag of words and features, Minimize un-observed assumptions
      – Maximize Information Entropy: Closest to uniformity
    • Advantages
      – Good for large set of categories
      – Fairly Fast predictor
      – Works better with mildly noisy data
      – Does not assume independence of words in the bag of words
      – Worked well with 3 M training set
      – Confidence score has almost linear relationship with F1
    • Disadvantages
      – Slower training
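
A Max Entropy text classifier is commonly implemented as multinomial logistic regression over bag-of-words features. The sketch below assumes that formulation and trains it with plain stochastic gradient ascent. The toy examples, function names, and hyperparameters are illustrative assumptions; the deck does not say how DataPop trained its models.

```python
import math
from collections import Counter


def softmax(scores):
    """Turn per-class scores into probabilities (shifted for stability)."""
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def predict_proba(weights, classes, tokens):
    counts = Counter(tokens)
    scores = {c: sum(weights.get((c, w), 0.0) * n for w, n in counts.items())
              for c in classes}
    return softmax(scores)


def train_maxent(examples, epochs=200, learning_rate=0.2):
    """Gradient ascent on the log-likelihood; one weight per (class, word)."""
    classes = sorted({label for _, label in examples})
    weights = {}
    for _ in range(epochs):
        for tokens, label in examples:
            probs = predict_proba(weights, classes, tokens)
            counts = Counter(tokens)
            for c in classes:
                # Gradient term: observed indicator minus predicted probability.
                error = (1.0 if c == label else 0.0) - probs[c]
                for w, n in counts.items():
                    key = (c, w)
                    weights[key] = weights.get(key, 0.0) + learning_rate * error * n
    return classes, weights


# Hypothetical toy training set.
MAXENT_EXAMPLES = [
    ("red over ear headphones".split(), "audio"),
    ("wireless bluetooth headphones".split(), "audio"),
    ("leather running shoes".split(), "footwear"),
    ("red running sneakers".split(), "footwear"),
    ("stainless steel blender".split(), "kitchen"),
]

maxent_classes, maxent_weights = train_maxent(MAXENT_EXAMPLES)


def maxent_classify(tokens):
    probs = predict_proba(maxent_weights, maxent_classes, tokens)
    return max(probs, key=probs.get)
```

With no evidence (all weights zero) this model predicts a uniform distribution over classes, which matches the slide's "closest to uniformity" remark. The iterative fitting loop is also a fair picture of why training is slower than simple counting.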
  8. Comparison of Statistical Methods (slide 12)
    • Support Vector Machines
      – Vector based solutions, using Cosine Similarity
      – Establish support vectors that isolate each category hyper-plane
    • Advantages
      – Strong learner (usually best in Vector based)
      – Fast training and prediction
      – F1 similar to Max Ent (slightly below)
      – Good for text, image, bioinformatics
    • Disadvantages
      – Complex training, Optimization
      – Scores not as useful
      – Sensitive to noise
      – Better with longer documents, more signal
      – Better with smaller number of classes
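
As a rough illustration of the vector-based approach on this slide, the sketch below trains a binary linear SVM on unit-length bag-of-words vectors, so that dot products between documents are cosine similarities. It uses stochastic subgradient descent on the regularized hinge loss, which is an assumption made for brevity; the deck does not name a solver, and a multi-class setup would need something like one-vs-rest on top. All data and names are hypothetical.

```python
import math
from collections import Counter


def unit_vector(tokens):
    """Bag-of-words counts scaled to length 1 (dot product = cosine)."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(n * n for n in counts.values()))
    return {w: n / norm for w, n in counts.items()}


def dot(weights, x):
    return sum(weights.get(w, 0.0) * v for w, v in x.items())


def train_linear_svm(examples, epochs=100, learning_rate=0.1, reg=0.01):
    """examples: list of (tokens, label) with label in {-1, +1}."""
    weights, bias = {}, 0.0
    data = [(unit_vector(tokens), label) for tokens, label in examples]
    for _ in range(epochs):
        for x, y in data:
            margin = y * (dot(weights, x) + bias)
            # L2 regularization: shrink every weight a little each step.
            for w in weights:
                weights[w] *= 1.0 - learning_rate * reg
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                for w, v in x.items():
                    weights[w] = weights.get(w, 0.0) + learning_rate * y * v
                bias += learning_rate * y
    return weights, bias


def svm_predict(model, tokens):
    weights, bias = model
    return 1 if dot(weights, unit_vector(tokens)) + bias >= 0.0 else -1


# Hypothetical toy data: +1 = audio, -1 = footwear.
SVM_EXAMPLES = [
    ("red over ear headphones".split(), 1),
    ("wireless bluetooth headphones".split(), 1),
    ("leather running shoes".split(), -1),
    ("red running sneakers".split(), -1),
]

svm_model = train_linear_svm(SVM_EXAMPLES)
```

The raw output here is a signed distance from the hyperplane rather than a probability, which is one way to read the slide's "scores not as useful" remark.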
  9. Comparison of Statistical Methods (slide 13)
    • Naïve Bayes
      – Bag of words, Statistical counting
    • Disadvantages
      – Weaker learner at 3 M samples
      – Assumes words are independent in the bag of words (not true)
    • Advantages
      – Adaptive
    • Recent Research on Training Data Size
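
The "statistical counting" on this slide can be sketched as a multinomial Naïve Bayes classifier with add-one smoothing. The training examples and function names below are hypothetical, and the smoothing choice is an assumption rather than something the slide specifies.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, tokens):
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior plus a sum of per-word log likelihoods: the sum is
        # exactly where the word-independence assumption enters.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            smoothed = word_counts[label][token] + 1  # add-one smoothing
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
NB_EXAMPLES = [
    ("red over ear headphones".split(), "audio"),
    ("wireless bluetooth headphones".split(), "audio"),
    ("leather running shoes".split(), "footwear"),
    ("red running sneakers".split(), "footwear"),
]

nb_model = train_naive_bayes(NB_EXAMPLES)
```

Because the model is nothing more than counts, a new labeled example can be folded in by incrementing those counts without retraining, which is one plausible reading of the "Adaptive" advantage listed above.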
  10. Named Entity Recognition & Categorization (slide 14)
    What Works
    ✓ “Good Data” is the King
    ✓ Unsupervised
    ✓ Non-Human Supervised
    ✓ Max Entropy
    ✓ SVM
    ✓ Ensemble
    ✓ CRF
    ✓ Mix of Rules & ML
    What Doesn’t Work
    Ⓧ Naïve Bayes
    Ⓧ Human Gen Supervised
    Ⓧ Standalone methods
    Ⓧ Just use ML methods
    Ⓧ Biased Data
    Ⓧ Noisy Data
  11. Statement of problem: Lobbyist on the Senate Floor (slide 16)
    • Lobbyist Objective
      – Lobbyist gets five minutes on the senate floor to address senators
      – Lobbyist is given 5 different subjects that he can pitch to the senators
      – He can only pitch one of the subjects
      – He does not know who is in the room ahead of time; senators are scattered around the room and are busy talking to each other in groups
    • He is given a few seconds and a fast laptop
      – Either select a group of senators and join them to present his case (any of the 5). He has to choose the most influential group that is also most interested in one of the subjects.
      – Or go to the podium and address all senators, taking the risk that he may lose the whole audience quickly if the majority of the room is not interested.
    • What he knows about each senator
      – Level of influence, subcommittee membership, past voting history
      – What subjects the senator is interested in (from 1000’s of possible subjects)
      – What each group is talking about right now (conversation subject)
      – He can also see who is in each group (floor map)
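
The choice the lobbyist faces can be sketched as scoring every (venue, subject) pair and taking the best one. The scoring rule below is an assumption made purely for illustration: a group's value for a subject is the sum of influence times interest over its members, and the podium is only considered when more than half the room has some interest in the subject. It ignores voting history, subcommittees, and the current conversation subject, and none of it is taken from the deck.

```python
def audience_score(senators, subject):
    """Assumed value of pitching `subject` to these senators."""
    return sum(s["influence"] * s["interest"].get(subject, 0.0)
               for s in senators)


def best_pitch(groups, subjects, majority=0.5):
    """Return the best (score, venue, subject) over groups and the podium."""
    everyone = [s for members in groups.values() for s in members]
    options = []
    for subject in subjects:
        for name, members in groups.items():
            options.append((audience_score(members, subject), name, subject))
        # Podium is only worth the risk if a majority has some interest.
        interested = sum(1 for s in everyone
                         if s["interest"].get(subject, 0.0) > 0.0)
        if interested > majority * len(everyone):
            options.append((audience_score(everyone, subject), "podium", subject))
    return max(options)


# Hypothetical floor map: group name -> senators in that group.
floor = {
    "A": [
        {"influence": 0.9, "interest": {"energy": 1.0}},
        {"influence": 0.8, "interest": {"energy": 0.9}},
    ],
    "B": [
        {"influence": 0.3, "interest": {"health": 0.5}},
        {"influence": 0.3, "interest": {"health": 0.5}},
        {"influence": 0.3, "interest": {"health": 0.5}},
    ],
}

score, venue, subject = best_pitch(floor, ["energy", "health"])
```

In this made-up room the small but influential group A wins on "energy", even though "health" has majority interest and so qualifies for the podium.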