Business Intelligence Seminar

Pacmann AI
August 10, 2019

A brief introduction to Business Intelligence, which enables you to access and analyze information so that you can improve and optimize your business decisions and performance.

Transcript

  1. WHAT: A brief introduction to Business Intelligence, which enables you to access and analyze information so that you can improve and optimize your business decisions and performance.
  2. 01 The number of possible solutions is so large that it precludes a complete search for the best answer.
    02 The problem exists in a time-changing environment.
    03 The problem is heavily constrained.
    04 There are many (possibly conflicting) objectives.
  3. 1 Incomplete Information: The necessary data were not recorded.
    2 Noisy Data: The data contain rounded figures and estimates.
    3 Uncertainty: The data are not reliable.
  4. Academic Research (Social Science): Theory, Hypothesis, Data Exploration, Metrics, Insight
    Business Intelligence: Data Exploration, Insight, Hypothesis, Metrics, Test, Decision
  5. Motivation: “People in both fields operate with beliefs and biases. To the extent you can eliminate both and replace them with data, you gain a clear advantage.” Michael Lewis, Moneyball: The Art of Winning an Unfair Game
  6. Moneyball Case: Intelligence Part 1, Collecting Data
    Variables:
    • Team
    • League
    • Year
    • Runs Scored (RS)
    • Runs Allowed (RA)
    • Wins (W)
    • On-Base Percentage (OBP)
    • Slugging Percentage (SLG)
    • Batting Average (BA)
    • Playoffs (binary)
    • RankSeason
    • RankPlayoffs
    • Games Played (G)
    • Opponent On-Base Percentage (OOBP)
    • Opponent Slugging Percentage (OSLG)
  7. Moneyball Case: Intelligence Part 2, Information Processing
    How does a team make it to the playoffs? To be exact, how many games does a team need to win to make the playoffs? Target Wins: 95
  8. Moneyball Case: Intelligence Part 3, Gaining More Knowledge
    How many more runs do we need to score than we allow in order to win 95 games in the regular season?
    Run Differential (RD) = Runs Scored (RS) - Runs Allowed (RA)
  9. Moneyball Case: Intelligence Part 4, Build a Metric
    To achieve the goal, we need to measure: how much [metrics] do we need for Runs Scored, Runs Allowed, and Run Differential?
  10. Moneyball Case: Intelligence Part 4, Build a Metric
    W = 80.8814 + 0.1058(RD)
    RD = (W - 80.8814) / 0.1058
    Replace W with 95:
    RD = (95 - 80.8814) / 0.1058
    RD ≈ 133
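
The rearrangement on this slide can be checked numerically. The following is a minimal sketch that uses only the two coefficients shown on the slide; the function names are illustrative and do not come from the deck.

```python
# Linear model from the slide: W = 80.8814 + 0.1058 * RD
INTERCEPT = 80.8814
SLOPE = 0.1058


def predicted_wins(run_differential: float) -> float:
    """Wins predicted by the slide's linear model for a given run differential."""
    return INTERCEPT + SLOPE * run_differential


def required_run_differential(target_wins: float) -> float:
    """Invert the model: run differential needed to reach target_wins."""
    return (target_wins - INTERCEPT) / SLOPE


rd_for_95 = required_run_differential(95)
print(f"RD needed for 95 wins: {rd_for_95:.1f}")
print(f"Wins predicted at that RD: {predicted_wins(rd_for_95):.1f}")
```

Keeping the forward model and its inverse as separate functions makes it easy to confirm that one undoes the other.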
  11. Moneyball Case: Intelligence Part 5, Optimal Decision Support
    What OOBP and OSLG do we need to achieve a run differential of +133?
    The OOBP and OSLG for the A's in 2001 were:
    • OOBP = 0.315
    • OSLG = 0.384
    So the estimated RA ≈ 662. The actual value of runs allowed in the 2002 season was 654.
  12. Moneyball Case: Intelligence Part 5, Optimal Decision Support
    What OBP and SLG do we need to achieve a run differential of +133?
    The OBP and SLG for the A's in 2001 were:
    • OBP = 0.339
    • SLG = 0.432
    So the estimated RS ≈ 808. The actual value of runs scored in the 2002 season was 800.
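
The slides report the two estimates (about 808 runs scored and about 662 runs allowed) but not the regression equations behind them. The sketch below assumes the linear coefficients commonly quoted for this Moneyball regression exercise. They are an assumption, not taken from the deck, although they do reproduce both estimates from the 2001 inputs shown on the slides.

```python
# ASSUMPTION: these coefficients do not appear on the slides. They are the
# values commonly quoted for this exercise and are used here for illustration.
RS_COEF = (-804.63, 2737.77, 1584.91)  # intercept, OBP weight, SLG weight
RA_COEF = (-837.38, 2913.60, 1514.29)  # intercept, OOBP weight, OSLG weight


def linear_estimate(coef, on_base: float, slugging: float) -> float:
    """Evaluate intercept + b1 * on_base + b2 * slugging."""
    intercept, b_on_base, b_slugging = coef
    return intercept + b_on_base * on_base + b_slugging * slugging


# 2001 inputs taken from the slides.
runs_scored = linear_estimate(RS_COEF, on_base=0.339, slugging=0.432)
runs_allowed = linear_estimate(RA_COEF, on_base=0.315, slugging=0.384)

print(f"Estimated RS: {runs_scored:.0f}")
print(f"Estimated RA: {runs_allowed:.0f}")
```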
  13. Moneyball Case: Intelligence Part 5, Optimal Decision Support
    Actual RD = 800 - 654 = 146
    W = 80.8814 + 0.1058(RD)
    W = 80.8814 + 0.1058(146)
    W ≈ 96
    The actual number of games won in the 2002 season was 103.
  14. Data-Driven for Decision Making: The ability to use existing data in a new way, or to obtain new data, to make decisions with confidence that create meaningful change. (Diagram labels: Problem, Decision, PROBLEM SOLVING)
  15. What is Business Intelligence? Business intelligence (BI) is a set of theories, methodologies, architectures, and technologies that transform raw data into meaningful and useful information for business purposes.
    01 Lots and lots of data
    02 Processing and Aggregation
    03 Insight & Visualization
  16. Analytics & Company Maturity
    • Descriptive Analytics: What happened?
    • Diagnostic Analytics: Why did it happen?
    • Predictive Analytics: What will happen?
    • Prescriptive Analytics: How can we make it happen?
    (Figure labels: Information, Optimization; Stage of company: Early, Seed, Growth, Beyond; Amount of data)
  17. Customer Development Cycle: A company is a sequence of hypothesis tests. https://xweb.stanford.edu/group/e145/cgi-bin/winter/drupal/upload/handouts/Four_Steps.pdf
  18. Getting Your First Data: Survey
    Surveys can provide timely data using specific survey questions. (“Designing and Conducting Business Surveys”, Ger Snijkers, Gustav Haraldsen, Jacqui Jones, 2013)
  19. Getting Your First Data: Survey
    General Flow (Business Intelligence):
    • Design: scope, variables
    • Build: sample, questions
    • Test
    • Launch: processing, get insight
  20. Getting Your First Data: FGD (Focus Group Discussion)
    To identify a range of perspectives on some topic/issue, and to gain an understanding of the topic/issue from the perspective of the participants themselves.
  21. Getting Your First Data: FGD
    1 Explore, 2 Group Process, 3 Gain Diversity, 4 Explain, 5 Evaluate, 6 Design
  22. Getting Your First Data: FGD
    The market research approach typically uses focus group discussions to gain consumer views on new products or marketing campaigns (Kroll et al., 2007; Bloor et al., 2001).
  23. Getting Your First Data: Case
    Problems
    • Your opinion/belief is only a hypothesis.
    • The things that seem obvious based on your opinion might be a fallacy.
    • The person you can most easily fool is yourself.
    • To falsify your opinion or belief, you can gather data and compare your belief against it as a source of “truth”.
  24. Getting Your First Data: Case
    Case I: Motorola. Motorola’s Iridium satellite-based phone system was an engineering triumph, built to support a customer base of millions. No one asked the customers if they wanted it. Cost: $5 billion. Yes, billion. Satellites are awfully expensive.
  25. Getting Your First Data: Case
    Case II: Smokeless Cigarettes. R.J. Reynolds’ Premier and Eclipse smokeless cigarettes. The company understood what the general public (nonsmokers) wanted, but did not understand that its customers didn’t care. Cost: $450 million.
  26. Getting Your First Data: Case
    Case III: Toothpaste. A toothpaste company wants to increase its sales.
    Q: But how? Let’s do an FGD (Focus Group Discussion).
    Person 1: I like the aluminium tube. The toothpaste smells nice.
    Person 2: The price is affordable. I can easily squeeze the aluminium tube.
    Person 3: It tastes good. My child might eat it.
  27. Getting Your First Data: Case
    Case III: Toothpaste. Design a new tube, made of plastic, so users can’t squeeze out all the toothpaste and have to buy more.
  28. Getting Your First Data: Conclusion
    You need to gather your first data, whether from an FGD, a survey, or a secondary source, and use it to falsify your opinion. Data triumphs over intuition!
  29. Data Exploration and Visualization
    Problems
    • You already have data from a survey/FGD.
    • You want to understand your users' behavior and gain insight from your data.
    • You want to find patterns in your data.
    • These patterns can be used to form hypotheses on how to optimize your objectives.
  30. Case: NYC Taxi Trip
    In this competition, Kaggle is challenging you to build a model that predicts the total ride duration of taxi trips in New York City. Your primary dataset is one released by the NYC Taxi and Limousine Commission, which includes pickup time, geo-coordinates, number of passengers, and several other variables.
  31. Data Exploration and Visualization
    How
    • I will follow George Polya’s “How to Solve It” heuristics: Understand your problem.
    • Ask questions!
      ◦ What are you asked to find or show?
      ◦ Can you think of a diagram that might help you understand the problem?
      ◦ Is there enough information to enable you to find a solution?
      ◦ Do you understand all the data used in this problem?
    • Do visualizations!
  32. Data Exploration and Visualization: Conclusion
    The understanding we gain from data exploration and visualization lets us know the behavior of our users. This understanding can be turned into hypotheses, and we can test those hypotheses to optimize our decisions/objectives.
  33. Hypothesis: A hypothesis is a tentative, provisional, or unconfirmed statement derived from theory/intuition that can be either verified or falsified.
  34. Hypothesis
    • The questions we ask are more important than the things we measure.
    • I will follow Judea Pearl's hypothesis-making mechanism.
    • We will use the “How to Solve It” heuristics to find a good set of hypotheses.
  35. Case: NYC Taxi Trip
    In this competition, Kaggle is challenging you to build a model that predicts the total ride duration of taxi trips in New York City. Your primary dataset is one released by the NYC Taxi and Limousine Commission, which includes pickup time, geo-coordinates, number of passengers, and several other variables.
  36. Introduction to Business Metrics
    Measurement is the act of determining a quantitative indication of the extent, amount, dimension, capacity, or size of some attribute of a product or process (Pressman, 2000).
    A metric is a quantitative measure of the degree to which a system, component, or process possesses a given attribute (IEEE, 1990).
  37. Business Metrics: User Metrics
    Why:
    - To determine if one methodology produces a faster result than another
    - Identify hot leads
    - Improve marketing campaign effectiveness
    - Determine which marketing campaigns lead to the most profitable customers
    - Discover which features are getting the most/least use
    - Reveal technical problems which are hindering your service
  38. Business Metrics: User Activity Metrics
    Metrics Examples:
    - Churn Rate
    - DAU/MAU Ratio
    - Adoption Rate
    - Lifetime Value
    - Screen Flow
    - Completion Rate
    - Time Based Efficiency
    - Overall Relative Efficiency
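
Two of the metrics listed above can be computed from a simple activity log. The sketch below assumes common definitions that the deck itself does not spell out: churn rate as the share of period-start customers lost by the end of the period, and the DAU/MAU ratio as average daily active users divided by monthly active users. The function names and the sample data are hypothetical.

```python
import calendar
from datetime import date
from typing import Iterable, Set, Tuple


def churn_rate(customers_at_start: Set[str], customers_at_end: Set[str]) -> float:
    """Share of the customers present at the start who are gone at the end."""
    if not customers_at_start:
        raise ValueError("need at least one customer at the start of the period")
    lost = customers_at_start - customers_at_end
    return len(lost) / len(customers_at_start)


def dau_mau_ratio(events: Iterable[Tuple[str, date]], year: int, month: int) -> float:
    """Average daily active users divided by monthly active users."""
    daily_users = {}
    for user, day in events:
        if (day.year, day.month) == (year, month):
            daily_users.setdefault(day, set()).add(user)
    if not daily_users:
        return 0.0
    monthly_users = set().union(*daily_users.values())
    days_in_month = calendar.monthrange(year, month)[1]
    # Days with no activity count as zero active users.
    average_dau = sum(len(users) for users in daily_users.values()) / days_in_month
    return average_dau / len(monthly_users)


# Hypothetical sample data.
sample_events = [
    ("a", date(2019, 8, 1)), ("b", date(2019, 8, 1)),
    ("a", date(2019, 8, 2)), ("a", date(2019, 8, 3)),
]
sample_churn = churn_rate({"a", "b", "c", "d"}, {"a", "b", "e"})
sample_ratio = dau_mau_ratio(sample_events, 2019, 8)
print(f"Churn rate: {sample_churn:.2f}")
print(f"DAU/MAU ratio: {sample_ratio:.3f}")
```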
  39. Business Metrics: User Activity Metrics
    The single most important user activity metric:
    - Retention
    (Figure: % active (monthly) vs. days from acquisition)
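
A retention curve like the one on this slide can be built from acquisition dates and an activity log. The sketch below assumes one particular definition, namely that a user counts as retained at day n if they were active exactly n days after acquisition. Other definitions exist, and the names and data here are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set


def retention_curve(
    acquired: Dict[str, date],
    activity: Dict[str, Set[date]],
    offsets: Iterable[int],
) -> Dict[int, float]:
    """Fraction of acquired users who are active n days after acquisition."""
    if not acquired:
        raise ValueError("need at least one acquired user")
    curve = {}
    for n in offsets:
        retained = sum(
            1
            for user, start in acquired.items()
            if start + timedelta(days=n) in activity.get(user, set())
        )
        curve[n] = retained / len(acquired)
    return curve


# Hypothetical cohort: four users acquired on the same day.
day0 = date(2019, 8, 1)
cohort = {user: day0 for user in ("a", "b", "c", "d")}
log = {
    "a": {day0 + timedelta(days=d) for d in (1, 7, 30)},
    "b": {day0 + timedelta(days=d) for d in (1, 7)},
    "c": {day0 + timedelta(days=1)},
}
curve = retention_curve(cohort, log, offsets=[1, 7, 30])
for n, share in curve.items():
    print(f"day {n}: {share:.0%} active")
```

A curve that flattens at some positive share, rather than decaying toward zero, is the pattern the following slides discuss.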
  40. Business Metrics: User Activity Metrics
    When this happens:
    - You probably don’t have product-market fit.
    - Revise and build a better product.
    - Your churn rate is a big problem.
    - You have a loyal customer.
    - A retention curve that runs parallel to the x-axis shows that it converges to a number.
    - Don’t bias yourself: 20% retention in the airline market in Indonesia, measured on daily churn, might be a big number. People probably fly only about twice a year on average.
  41. Business Metrics: User Activity Metrics
    When this happens:
    - You probably have product-market fit.
    - Your churn rate is not a big problem.
    - You have a loyal customer.
    - A retention curve that runs parallel to the x-axis shows that it converges to a number.
    - Don’t bias yourself: 95% retention for Nasi Goreng in NY among 100 people is still a small market.
  42. Business Metrics: User Activity Metrics
    When this happens:
    - Marketing might help you in the short run, but retention will converge to its natural rate once the marketing campaign is gone.
    - Yes, it can prolong your product life cycle (and suffering).
    - Still, the time you buy with a marketing campaign needs to be translated into a new, better product, so that users might stay.
  43. A/B Testing
    Problems
    • We want to measure the effect of our product/hypothesis/experiment on some business metrics.
    • If the effect is large enough, it can increase our metrics and achieve our business objectives.
  44. Optimize Decision Making: A/B Testing
    Case: Scurvy
    • First controlled experiment / randomized trial for medical purposes
    • Scurvy is a disease that results from vitamin C deficiency
    • It killed over 100,000 people in the 16th-18th centuries, mostly sailors
    • Lord Anson’s circumnavigation voyage from 1740 to 1744 started with 1,800 sailors and only about 200 returned; most died from scurvy
    • Dr. James Lind noticed lack of scurvy in Mediterranean ships.
    • Gave some sailors limes (treatment), others ate regular diet (control)
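
One common way to judge whether the difference between a treatment group and a control group is "large enough" is a two-proportion z-test. The deck does not name a specific test, so this is a swapped-in illustration rather than the author's method, and the conversion counts below are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conv_control: int, n_control: int, conv_treatment: int, n_treatment: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for treatment rate minus control rate."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if std_err == 0:
        raise ValueError("standard error is zero; the rates are degenerate")
    z = (p_treatment - p_control) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10% conversion in control, 13% in treatment.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the observed lift is unlikely under "no effect"; whether the lift is large enough to matter for the business is a separate judgment.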
  45. Decision Making
    Problem: Our objective is to choose, from a range of values, the one that optimizes our business metrics.
    How
    Step 1: Get data.
    Step 2: Define a loss function.
    Step 3: Build a model with a tunable parameter.
    Step 4: Optimize the loss function given a range of parameter values.
    Step 5: Choose the best parameter.
  46. Decision Making
    Case: Let’s say I am trying to decide on a price at which to list a used phone I want to sell. In this case I may denote my decision space as the entire non-negative real line, such that a ∈ [0, +∞).
  47. Decision Making
    Step 2: Define a loss function. How do we figure out the loss associated with individual decisions when we don’t even know the information we want to use to make a decision? The answer is that we turn to probability theory and instead calculate the “expected loss” we would incur if we chose a given action, given our beliefs (our probability distribution) about θ.
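
The expected-loss idea can be sketched for the used-phone example. Everything specific below is an assumption made for illustration: θ is interpreted as the highest price a buyer would pay, the belief about θ is a small discrete distribution, the loss is negative revenue, and an unsold phone is assumed to fetch a fixed fallback value.

```python
from typing import Dict, Iterable

FALLBACK_VALUE = 80.0  # hypothetical trade-in value if the phone does not sell


def loss(price: float, theta: float) -> float:
    """Negative revenue: sell at `price` if the buyer's maximum theta covers it."""
    revenue = price if price <= theta else FALLBACK_VALUE
    return -revenue


def expected_loss(price: float, belief: Dict[float, float]) -> float:
    """Average the loss over our belief (theta -> probability)."""
    return sum(prob * loss(price, theta) for theta, prob in belief.items())


def best_price(candidates: Iterable[float], belief: Dict[float, float]) -> float:
    """Choose the candidate action with the smallest expected loss."""
    return min(candidates, key=lambda price: expected_loss(price, belief))


# Hypothetical belief about the buyer's maximum price.
belief = {100.0: 0.2, 150.0: 0.5, 200.0: 0.3}
candidates = [float(p) for p in range(80, 221, 10)]

chosen = best_price(candidates, belief)
print(f"Best listing price: {chosen:.0f}")
print(f"Expected loss at that price: {expected_loss(chosen, belief):.1f}")
```

The grid of candidate prices stands in for the continuous decision space a ∈ [0, +∞); a finer grid or a numerical optimizer could replace it.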
  48. Decision Making: Conclusion
    Bayesian decision making is important for optimizing our decisions: it lets us choose the best parameter from a large set of values.
  49. Customer Development Cycle: A company is a sequence of hypothesis tests. https://xweb.stanford.edu/group/e145/cgi-bin/winter/drupal/upload/handouts/Four_Steps.pdf
  50. Customer Development Cycle
    1. Do FGD/Customer Survey
    2. Do data exploration
    3. Build a prototype
    4. Test the prototype with users
    Product Hypothesis: Do they need a checkins-photo?
    Data Exploration, Insight, Hypothesis, Metrics, Test, Decision
  51. Customer Development Cycle
    1. Do FGD/Customer Survey
    2. Do data exploration
    3. Build a prototype
    4. Test the prototype with users
    Market Hypothesis: Do we need to focus on photo only?
    Data Exploration, Insight, Hypothesis, Metrics, Test, Decision
  52. Customer Development Cycle
    1. Do FGD/Customer Survey
    2. Do data exploration
    3. Build a prototype
    4. Test the prototype with users
    Feature Question: What kind of feature would make users want to use our product?
    Data Exploration, Insight, Hypothesis, Metrics, Test, Decision
  53. Customer Development Cycle
    1. Do FGD/Customer Survey
    2. Do data exploration
    3. Build a prototype
    4. Test the prototype with users
    Marketing Question: Do we need to rebrand our app?
    Data Exploration, Insight, Hypothesis, Metrics, Test, Decision
  54. Pacmann AI Classes
    Quality: State of the Art of Machine Learning Research, Practical Skills, Theoretical Understanding
    > 50 institutions, 400++ alumni, 6 classes in the past
    [email protected] https://pacmann.ai