
Techniques (Tricks) for Data Mining Competitions

@smly
October 16, 2015


  1. Techniques (Tricks) for
    Data Mining Competitions
    Kohei Ozaki
    2015-10-15 @ Kyoto University


  2. Kohei Ozaki
    Kaggle Enthusiast
    Work Experience:
    •  Insurance Fraud Detection
    •  Predictive Modeling for Online Advertising
    •  Recommendation System for SNS
    •  etc.
    (screenshot on https://www.kaggle.com/confirm)
    2


  3. Agenda
    Data Mining Competitions
    Techniques (Tricks) for competitions
    Learning from Winning Solutions
    Trend on Kaggle
    3
    4 - 12
    13 - 28
    29 - 72
    73 - 78


  4. Agenda
    Data Mining Competitions
    Techniques (Tricks) for competitions
    Learning from Winning Solutions
    Trend on Kaggle
    4
    4 - 12
    13 - 28
    29 - 72
    73 - 78


  5. Data Mining Competitions
    Participants compete on the scores of their predictive models.
    A competition normally runs for 2 or 3 months.
    Many kinds of tasks/datasets from the real world:
    Insurance, Credit Scoring, Loan default, Medical, EEG, MEG,
    Image Classification, HealthCare, High-energy physics,
    Social Good, Marketing, Advertising, Trajectory, Telematics,
    etc…
    5


  6. Step1: Get the Data
    Download the datasets and understand the competition task
    6


  7. Step2: Make a Submission
    Create your model and make a submission
    7


  8. Step3: Check Your Rank
    After you make the submission, your models are evaluated
    immediately and ranked on the Public Leaderboard.
    8


  9. Huge Amount of Prize Pool
    Netflix Prize 2009 ($1M)
    Recommend movies
    Heritage Health Prize 2011 ($3M)
    Predict days in hospital
    GE Flight Quest Challenge
    Part 1: Predict gate/arrival time 2012 ($250k)
    Part 2: Optimize flight plan 2014 ($220k)

    (Diagram labels: "Predictive Modeling World" vs. "Yet Another World")

    DARPA Grand Challenge ($2M)
    Autonomous vehicle
    Google Lunar XPRIZE ($30M)
    Autonomous robotic spacecraft
    9


  10. Who Hosts Competitions?
    Kaggle is a platform for data prediction competitions.
    (a crowdsourcing community of 360k+ data scientists)
    In addition to prize money, many data scientists use Kaggle
    to learn and collaborate with experts.
    10


  11. Great Place to Try out Your Ideas (1/2)
    Many researchers/developers also use Kaggle.
    (XGBoost, LibFM, LibFFM, Lasagne, Keras, cxxnet, etc…)
    Our original motivation for entering the contest
    was to try out our new tree ensemble
    regularized greedy forest (RGF)
    in a competitive setting.
    Rie Johnson (RJ Research Consulting)
    a prize winner in the Heritage Health Prize

    (quote from http://www.heritagecaliforniaaco.com/?p=hpn-today&article=45)


  12. Great Place to Try out Your Ideas (2/2)
    Many researchers/developers also use Kaggle.
    (XGBoost, LibFM, LibFFM, Lasagne, Keras, cxxnet, etc…)
    My intention of participating in this competition is
    to evaluate the performance of recurrent convolutional
    neural network (RCNN) in processing time series data.
    Ming Liang (Tsinghua University)
    a prize winner in Grasp-and-Lift EEG Detection
    (quote from https://www.kaggle.com/c/grasp-and-lift-eeg-detection/forums/t/16617/team-daheimao-solution)


  13. Agenda
    Data Mining Competitions
    Techniques (Tricks) for competitions
    Learning from Winning Solutions
    Trend on Kaggle
    13
    4 - 12
    13 - 28
    29 - 72
    73 - 78


  14. Two Main Factors
    The quality of the individual models & the ensemble idea.
    Without sophisticated individual models, there is no victory.
    Both the individual models & the ensemble idea are key.
    14
    Individual Model
    Ensemble Model


  15. Hyper Parameter Tuning &
    Feature Engineering
    Read Owen Zhang’s slide (textbook) carefully :-)
    http://www.slideshare.net/OwenZhang2/tips-for-data-science-competitions
    15


  16. Greedy Forward Selection (GFS)
    Greedy Forward Selection (GFS) is simple and works well
    for feature selection and model selection in ensembles.
    1: Initialize the feature set F_k = ∅ at k = 0.
    2: Iterate:
    3:   Find the best feature j ∉ F_k to add to F_k
         with the most significant cost reduction.
    4:   Set k = k + 1 and F_k = F_{k−1} ∪ {j}.
    16
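The selection loop above can be sketched in a few lines. `cost` stands in for whatever evaluation routine you plug in (e.g. a CV loss of a model trained on the candidate set); the toy cost below is purely illustrative, not anything from the deck:

```python
def greedy_forward_selection(candidates, cost):
    """Greedy Forward Selection (GFS): repeatedly add the single feature
    that reduces cost() the most; stop when no candidate improves it.
    cost(features) is any evaluation routine (e.g. a CV loss)."""
    selected = set()
    best_cost = cost(selected)
    while True:
        # Step 3: find the best feature j not in the current set.
        trials = [(cost(selected | {j}), j) for j in candidates - selected]
        if not trials:
            break
        new_cost, j = min(trials)
        if new_cost >= best_cost:  # no cost reduction -> stop
            break
        # Step 4: grow the feature set.
        selected.add(j)
        best_cost = new_cost
    return selected, best_cost

# Toy cost: the "useful" features are {"a", "b"}; cost counts mismatches.
target = {"a", "b"}
cost = lambda s: len(target ^ s)
sel, c = greedy_forward_selection({"a", "b", "c"}, cost)  # -> ({'a', 'b'}, 0)
```

The same loop works for model selection in an ensemble: each "feature" is a model's prediction vector and `cost` is the CV loss of the blend.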


  17. GBDT: RGF-L2 and XGBoost have
    L2-Regularization for Leaf Coefficients
    L2 regularization works great on noisy datasets & ensemble models.
    (Figure: an additive ensemble of regression trees (CART), with
    example leaf coefficients +8.0 and −0.2; parameter set Θ.)
    ŷ_i = f_1(x_i) + f_2(x_i) + ··· + f_K(x_i)
    Obj(Θ) = l(Θ) + Ω(Θ),  Θ = {f_1, f_2, ···, f_K}
    Objective = Loss term + Regularization term
    Ω(Θ): heuristics including L0 (# of leaves) and L2.
    17
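As a concrete illustration of why the L2 term matters: in XGBoost's second-order formulation of the objective above, the optimal weight of a leaf is w* = −G/(H + λ), where G and H are the sums of the loss gradients and Hessians over the samples in the leaf and λ is the L2 coefficient. A minimal sketch with toy numbers (not from the slides):

```python
def optimal_leaf_weight(grads, hess, lam):
    """L2-regularized optimal leaf weight in a second-order GBDT objective:
    w* = -G / (H + lam). Larger lam shrinks leaf values toward zero,
    which helps on noisy data and when the model feeds an ensemble."""
    G, H = sum(grads), sum(hess)
    return -G / (H + lam)

# Squared loss: g_i = pred_i - y_i and h_i = 1. Three samples in one leaf:
grads = [-2.0, -1.0, -3.0]
hess = [1.0, 1.0, 1.0]
w_noreg = optimal_leaf_weight(grads, hess, lam=0.0)  # 2.0 (mean residual)
w_reg = optimal_leaf_weight(grads, hess, lam=3.0)    # 1.0 (shrunk by L2)
```

In XGBoost this λ is the `lambda`/`reg_lambda` parameter; RGF-L2 applies the analogous penalty to its leaf coefficients.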


  18. Reminder: 5-Fold Cross Validation
    Use K-1 parts for training and 1 part for testing.
    18


  19. Ensemble Techniques: Stacking (1/2)
    Stacking uses different methods’ predictions as “meta-features”.
    To obtain meta-features for training of ensemble model,
    use K-1 parts for training and 1 part for making a meta-feature.
    1D Meta-feature
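The out-of-fold procedure described above can be sketched as follows. The ridge base model and all parameter values are illustrative stand-ins, not anything from the deck — the point is only that each row's meta-feature comes from a model that never saw that row:

```python
import numpy as np

def oof_meta_feature(X, y, fit_predict, k=5, seed=0):
    """Out-of-fold predictions for stacking: train on K-1 folds, predict the
    held-out fold, so every training row gets a meta-feature produced by a
    model that never saw it. fit_predict is any base learner."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    meta = np.zeros(len(y))
    for i in range(k):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        meta[valid] = fit_predict(X[train], y[train], X[valid])
    return meta

def ridge_fit_predict(Xtr, ytr, Xte, lam=1e-6):
    # Tiny ridge-regression base model (illustrative only).
    beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]), Xtr.T @ ytr)
    return Xte @ beta

X = np.random.RandomState(1).randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5])
meta = oof_meta_feature(X, y, ridge_fit_predict)  # 1-D meta-feature, len 100
```

Stacking then trains the next-stage model on `meta` (one column per base model); reusing the same fold index at every stage keeps the meta-features honest.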


  20. Ensemble Techniques: Stacking (2/2)
    You can stack more stages :-)


  21. Netflix Blending (Quiz Blending)
    [1] Andreas Töscher and Michael Jahrer,
    “The BigChaos Solution to the Netflix Grand Prize”.
    Assume that the task is regression
    and the prediction is evaluated by RMSE.
    What can we do to improve our score?
    (Figure: a stack of prediction vectors to blend.)
    21


  22. zoom-in
    22


  23. Actual setting:
    We have feedback on the quiz data (30% of the test data)!
    23


  24. Utilize Quiz feedback for blending 1/4
    Our goal is to find the linear combination of predicted results
    that best predicts y (the target variable).
    X: an N-by-p matrix of predictions, combining p individual models
    (one column per model’s prediction vector).
    y: the unobserved vector of true target values.
    24


  25. Utilize Quiz feedback for blending 2/4
    If y is known, then the best estimation by linear combination is:
    β = (XᵀX)⁻¹ Xᵀy
    X: an N-by-p matrix of predictions, combining p individual models.
    y: the unobserved vector of true target values.
    25


  26. ( )
    Utilize Quiz feedback for blending 3/4
    If y is known, then best estimation by linear combination is:
    =
    ( )




    j-th element
    =
    =




    Can be approx.
    (All zero case)
    Can be computed
    exactly.
    Can be approx. by
    using quiz feedback.
    (N times MSE) 26


  27. Utilize Quiz feedback for blending 4/4
    Our goal is to find the linear combination of predicted results
    that best predicts y (the target variable).
    Linear combination by using quiz feedback:
    ŷ = X · β
    X: an N-by-p matrix of predictions, combining p individual models.
    β: a (p × 1) vector of weight parameters.
    27


  28. OT: Amazon’s AWS for Modeling
    c4.8xlarge (36 CPU cores with 64 GB RAM / $0.3 per hour)
    My bagging GBDT model for the KDD Cup takes 6 hrs (= $1.8)
    * The above price is for a spot instance in us-west-1c in Oct 2015. The price changes dynamically.
    ≈ 280 yen (= $2.3)
    28


  29. Agenda
    Data Mining Competitions
    Techniques (Tricks) for competitions
    Learning from Winning Solutions
    Trend on Kaggle
    29
    4 - 12
    13 - 28
    29 - 72
    73 - 78


  30. Learn from Winning Solutions
    Today’s talk describes the following competitions:
    Competition Name Description
    KDD Cup 2015 Binary Classification, Access log
    GE Flight Quest 2 Optimization
    Grasp-and-Lift EEG Detection Multi-class Classification,
    BCI, EEG recordings
    30


  31. About KDD Cup 2015
    The annual and most prestigious competition in data mining.
    821 teams joined.
    Task:
    Predict the probability that a student will drop out of a course
    within 10 days. The dataset is provided by XuetangX, one of
    the largest MOOC platforms in China.
    Date
    # of access records
    Drop-out course or not
    31


  32. Winner: InterContinental Ensemble
    Jeong, Mert, Andreas, Michael, Xiaocong, Peng, Kohei, Tam, Song


  33. Dataset (1 of 3)
    Pair of for each enrollment_id.
    (1) Enrollment data
    (2) Access logs
    (3) Object attributes
    33


  34. Dataset (2 of 3)
    Application logs. Source, Event and Object ID are provided.
    (1) Enrollment data
    (2) Access logs
    (3) Object attributes
    34


  35. Dataset (3 of 3)
    Detailed information of Object ID.
    (1) Enrollment data
    (2) Access logs
    (3) Object attributes
    35


  36. Analyze User Activities
    Users who don’t access the course many times drop out of the
    course.
    # of access logs (for each enrollment_id)
    # of enrollment_id (histogram)
    36


  37. Analyze Last Access
    Obviously, users who recently accessed the course continue the
    course.
    37


  38. Initial Analysis
    Base Features
    User activity and last access make a big impact on the AUC score.
    Features  Model  5-Fold CV (AUC)
    One Hot Encoding (course_id) GBDT 0.6118
    + num_records (User Activity) GBDT 0.8485
    + num_unique_object GBDT 0.8507
    + num_unique_active_days GBDT 0.8595
    + num_unique_active_hours GBDT 0.8601
    + num_unique_problem_event GBDT 0.8621
    + first and last timestamp (Last Access) GBDT 0.8821
    38
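The first few base features in the table (num_records, num_unique_object, num_unique_active_days, first/last timestamp) can be sketched as simple per-enrollment aggregations over the access logs. The log schema below is illustrative, not the actual KDD Cup 2015 format:

```python
from collections import defaultdict
from datetime import datetime

def base_features(access_logs):
    """Aggregate per-enrollment activity features from access-log rows of
    the form (enrollment_id, timestamp_str, object_id)."""
    feats = defaultdict(lambda: {"num_records": 0, "objects": set(),
                                 "days": set(), "first": None, "last": None})
    for eid, ts, obj in access_logs:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        f = feats[eid]
        f["num_records"] += 1        # raw user activity
        f["objects"].add(obj)        # -> num_unique_object
        f["days"].add(t.date())      # -> num_unique_active_days
        f["first"] = t if f["first"] is None else min(f["first"], t)
        f["last"] = t if f["last"] is None else max(f["last"], t)
    return {eid: {"num_records": f["num_records"],
                  "num_unique_object": len(f["objects"]),
                  "num_unique_active_days": len(f["days"]),
                  "first": f["first"], "last": f["last"]}
            for eid, f in feats.items()}

logs = [(1, "2014-06-01 10:00:00", "v1"), (1, "2014-06-02 11:00:00", "v1"),
        (1, "2014-06-02 12:00:00", "v2")]
f = base_features(logs)[1]
```

The table shows why this matters: activity counts alone lift the AUC from 0.61 to 0.85, and the first/last timestamps add another large jump.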


  39. Feature Engineering (MC)
    Multiple Courses Features
    Concept: Some users enrolled in multiple courses.
    Date
    # of access records
    (by courses)
    Features  Model  5-Fold CV (AUC)
    Base GBDT 0.8821
    + (MC) first and last timestamp for each user GBDT 0.8936
    + (MC) num_unique_active_days for each user GBDT 0.8946
    + (MC) num_enrollment_courses for each user GBDT 0.8953
    39


  40. Feature Engineering (EP)
    Evaluation Period Features (a bit leaky)
    Concept: the activities after the end dates of courses.
    Features  Model  5-Fold CV (AUC)
    Base + MC GBDT 0.8953
    Base + MC + EP GBDT 0.9027
    Date
    # of access records
    (by courses)
    40


  41. Feature Engineering (PXJ)
    Features from Teammates (Peng, Xiaocong and Jeong):
    •  Max absent days
    •  Min days from first visit to next course begin
    •  Min days from 10 days after last visit to next course begin
    •  Min days from last visit to next course end
    •  Min days from next course to last visit
    •  Min days from 10 days after course end to next course begin
    •  Min days from 10 days after course end to next course end
    •  Min days from course end to next visit
    •  Active days from last visit to course end
    •  Active days in 10 days from course end
    •  Average hour per day
    •  Course drop rate
    •  Time span
    Features  Model  5-Fold CV (AUC)
    Base + MC + EP GBDT 0.9027
    Base + MC + EP + PXJ GBDT 0.9052
    41


  42. Last 48 hours: we’re in 3rd place for a long time.


  43. Feature Engineering (LD)
    Label Dependent Features (a bit leaky)
    Count the number of dropped-out courses for each day in the
    evaluation period by using target variables in the training set.
    Features  Model  5-Fold CV (AUC)
    Base + MC + EP + PXJ GBDT 0.9052
    Base + MC + EP + PXJ + LD GBDT 0.9062
    Base + MC + EP + PXJ + LD Bagging GBDT 0.9067
    43


  44. Last 27 hours: add the LD feature into the ensemble model.
    44


  45. Feature Engineering (TAM)
    Sliding window & various aggregations + GFS (Tam’s work)
    Use a sliding window to generate many features automatically.
    Features  Model  5-Fold CV (AUC)
    Base + MC + EP + PXJ + LD GBDT 0.9062
    Base + MC + EP + PXJ + LD + TAM GBDT 0.9067
    Date
    # of access records
    (by courses)
    sliding window & various aggregations (by objects, events, etc.)
    45
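The idea of window-based aggregation can be sketched as counting events per (window, event-type) pair; the window size, step, horizon, and event names below are hypothetical, not Tam's actual configuration:

```python
from collections import Counter

def sliding_window_counts(events, window=7, step=7, horizon=28):
    """Generate count features over sliding windows: events is a list of
    (day_index, event_type) pairs; returns {(window_start, event_type): count}.
    Each (window, event) count becomes one automatically generated feature."""
    feats = Counter()
    for day, ev in events:
        for start in range(0, horizon, step):
            if start <= day < start + window:
                feats[(start, ev)] += 1
    return feats

ev = [(0, "video"), (3, "video"), (8, "problem")]
f = sliding_window_counts(ev)  # {(0, 'video'): 2, (7, 'problem'): 1}
```

Combined with GFS (slide 16), the many generated features can then be pruned down to the ones that actually reduce the CV loss.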


  46. Last 8 hours: add TAM’s model into the ensemble model.
    Last 4 hours: add Tam’s single best into the ensemble model.
    46


  47. Three-Stage Ensemble
    64 single + 15 ensemble + 2 ensemble + 1 blending
    Models  5-Fold CV (AUC)
    Single Best  0.9067
    Final model (Three-Stage Ensemble)  0.9082
    47


  48. To Avoid Over-fitting
    Comparing LB and Local CV is important to avoid over-fitting.
    Warning! over-fitting
    48


  49. Team Framework/Guideline
    (1) We shared the index file of the 5-fold CV first.
    (2) Using it, we uploaded the CV predictions and the predicted
    results for the test data to Dropbox.
    (3) We updated the wiki to describe the CV score and LB score.
    Then, we could all contribute to the ensemble/blending part.
    (If we didn’t use the same 5-fold CV index,
    our ensemble model would over-fit.)
    49
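Step (1), the shared fold-index file, might look like this; the file name, format, and seed are assumptions for illustration. The point is that every teammate derives folds from the same file, so their out-of-fold predictions align row for row:

```python
import json
import random

def make_fold_index(ids, k=5, seed=42, path="folds.json"):
    """Create and save a shared K-fold assignment (id -> fold number) so
    every teammate's out-of-fold predictions use identical splits."""
    rng = random.Random(seed)
    ids = list(ids)
    rng.shuffle(ids)
    assignment = {str(x): i % k for i, x in enumerate(ids)}
    with open(path, "w") as f:
        json.dump(assignment, f)
    return assignment

def load_fold(path="folds.json"):
    # Teammates load the same file instead of re-splitting locally.
    with open(path) as f:
        return json.load(f)

folds = make_fold_index(range(100))
assert folds == load_fold()  # everyone reads the same assignment
```

Without this, each member's CV predictions would leak information about other members' validation rows, and the stacked model would over-fit exactly as the slide warns.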


  50. Summary
    Feature Engineering is one of the key points for winning.
    (Don’t give up a chance to improve your feature set.)
    People can work together internationally.
    (A well-designed guideline is important to work as a team.)
    50


  51. Grasp-and-Lift EEG Detection
    Task:
    Identify hand motions (multi-class) from
    time-series EEG records.
    51
    (Pic. is from https://www.kaggle.com/c/grasp-and-lift-eeg-detection/data)


  52. Dataset: EEG records
    32 channels EEG data
    6 events to detect (HandStart, FirstDigitTouch, LiftOff, …)
    52
    (Fig. is from https://www.kaggle.com/c/grasp-and-lift-eeg-detection/data and
    https://www.kaggle.com/acshock/grasp-and-lift-eeg-detection/how-noisy-are-these-eegs)


  53. Winners’ Approaches
    1st place: Alexandre Barachant & Rafał Cycoń
    (experts in EEG & signal processing)
    •  Feature Extraction: Filter bank, Neural Oscillation, ERP
    •  Single Models: LR, LDA, RNN, CNN
    2nd place: Ming Liang (expert in image processing)
    •  Feature Extraction: nothing
    •  Single Models: CNN, Recurrent CNN
    •  Model Selection: Greedy Forward Selection
    It seems the single best model in this contest is the Recurrent CNN.
    CNNs can perform as well as the traditional paradigm.
    53


  54. Classifying EEG signals with a
    Convolutional Neural Network
    An input sample is treated as a height-1 image.
    The input sample at time t is composed of the n-dimensional
    data at times t − n + 1, t − n + 2, ..., t.
    54
    time t
    n-dimensional data
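The windowing described above might be implemented like this. The exact tensor layout the winners used is not specified in the deck, so the (1, n, C) "height-1 image" shape and the zero-padding at the start are assumptions:

```python
import numpy as np

def make_windows(eeg, n):
    """Turn a (T, C) multichannel EEG series into per-timestep CNN inputs:
    the sample at time t is the slice over times [t-n+1, ..., t],
    zero-padded at the start, shaped (1, n, C) -- a height-1 image
    whose C channels play the role of color channels."""
    T, C = eeg.shape
    padded = np.vstack([np.zeros((n - 1, C)), eeg])
    return np.stack([padded[t:t + n][None, :, :] for t in range(T)])

eeg = np.arange(12, dtype=float).reshape(6, 2)  # T=6 timesteps, C=2 channels
X = make_windows(eeg, n=4)                       # shape (6, 1, 4, 2)
```

Each timestep then gets an independent prediction for the six event classes, which matches the per-sample labeling of the competition.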


  55. Recurrent CNN
    The current state-of-the-art algorithm on image classification tasks.
    55
    [4] Ming Liang and Xiaolin Hu, “Recurrent Convolutional
    Neural Network for Object Recognition”, CVPR’15.
    RCL (Recurrent Convolution Layer) is a natural integration of
    RNN and CNN. The feed-forward (blue line) and recurrent
    computation (red line) both take the form of convolution.
    (Fig. is from http://blog.kaggle.com/2015/09/29/grasp-and-lift-eeg-detection-winners-interview-2nd-place-daheimao/)


  56. Summary
    Convolutional Neural Networks work great on
    time-series signal records (EEG).
    Don’t fear the experts!
    •  A non-expert ML researcher might beat an expert researcher.
    •  Google Scholar is your friend.
    56


  57. GE FQ2: Flight Route Optimization
    Objective: produce a flight plan for each flight to keep
    the average cost of planes as low as possible.
    57
    (Pic. is from http://www.gequest.com/)


  58. Format of Flight Plan
    List of 4D (Latitude, Longitude, Altitude and Speed) points for
    each flight plan.
    1:Latitude
    2:Longitude
    3:Altitude
    4:Speed
    2013-10-02 12:00:00 (Cut-off time)
    1:Latitude and 2:Longitude
    3:Altitude
    4:Speed
    58
    (Fig. is from [3] Christian Kiss-Toth, Gabor Takacs, “A Dynamic Programming Approach for 4D Flight Route Optimization”)


  59. Evaluation Metric (1 of 2)
    Objective: produce a flight plan for each flight to keep
    the average cost of planes as low as possible.
    C_total = C_fuel + C_delay + C_oscillation + C_turbulence
    Oscillation (“shaking”): penalty for changing altitude.
    Turbulence: a linear function of the elapsed time in
    turbulent zones.
    59


  60. Evaluation Metric (2 of 2)
    C_total = C_fuel + C_delay + C_oscillation + C_turbulence
    Evaluated by a flight simulator. A flight can take 3 kinds of steps:
    “ascending”, “descending” and “cruising”.
    Fuel consumption depends on the flight instruction.
    * Airspeed (IAS): the speed of an aircraft relative to the air.
    * Ground speed (GS): the speed of an aircraft relative to the ground.
    60


  61. Dataset (1 of 3)
    Flight Information
    List of test flights to optimize.
    •  Arrival Airport
    •  Current Location
    •  Parameters of Cost Model
    61


  62. Dataset (2 of 3)
    Airport Locations
    Produce a flight plan for each flight to keep the
    average cost of planes as low as possible.
    62


  63. Dataset (3 of 3)
    Restricted Zones
    Airspace which is reserved for
    special use (restricted from
    civilian aircraft)
    Turbulent Zones
    Airspace where flights experience
    turbulence (accrue a USD cost for the
    time spent within these zones)
    Weather (Wind data)
    Wind vectors in a 4-axis representation
    (time, altitude, easting, northing).
    63
    (Fig. is from [3] Christian Kiss-Toth, Gabor Takacs, “A Dynamic Programming Approach for 4D Flight Route Optimization”)


  64. Analyze Cost Model
    C_total = C_fuel + C_delay + C_oscillation + C_turbulence
    The burned fuel is a function of the airspeed,
    but the ground speed is the sum of the velocity relative to the
    air and the wind vector.
    → Taking advantage of the wind can significantly reduce
    the fuel cost and the delay cost.
    64
    * Airspeed (IAS): the speed of an aircraft relative to the air.
    * Ground speed (GS): the speed of an aircraft relative to the ground.


  65. Example of Wind-Optimal Path
    The blue line is the wind-optimal path
    (it reduces the total cost by 15% compared with the red line).
    65
    (Fig. is from [3] Christian Kiss-Toth, Gabor Takacs, “A Dynamic Programming Approach for 4D Flight Route Optimization”)


  66. 5th Solution (1 of 5)
    [3] Christian Kiss-Toth, Gabor Takacs, “A Dynamic
    Programming Approach for 4D Flight Route Optimization”
    Procedure:
    (1) Create an initial route
    (2) 2D optimization process (latitude and longitude)
    (3) Set the altitudes and the airspeed of the flight
    * The winning solution was not disclosed for this competition.
    Optimize the 4D parameters separately.
    66


  67. 5th Solution (2 of 5)
    Solve the shortest-path problem (Dijkstra’s algorithm).
    Vertices:
    the current position,
    the destination airport,
    the vertices of the restricted zones.
    Procedure:
    (1) Create an initial route
    (2) 2D optimization process (latitude and longitude)
    (3) Set the altitudes and the airspeed of the flight
    67
    (Fig. is from [3] Christian Kiss-Toth, Gabor Takacs, “A Dynamic Programming Approach for 4D Flight Route Optimization”)
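Step (1) can be sketched as a standard Dijkstra search over that sparse vertex graph. The graph structure and edge costs below are placeholders; in the actual solution the edges would carry great-circle distances that avoid crossing restricted zones:

```python
import heapq

def dijkstra(graph, src, dst):
    """Shortest path via Dijkstra's algorithm. graph maps each node
    (current position, destination airport, restricted-zone vertices)
    to a list of (neighbor, edge_cost) pairs."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Reconstruct the path by walking the predecessor links backwards.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[dst]

g = {"start": [("z1", 2.0), ("z2", 5.0)],
     "z1": [("dest", 2.0)], "z2": [("dest", 1.0)]}
path, cost = dijkstra(g, "start", "dest")  # (['start', 'z1', 'dest'], 4.0)
```

Using only zone vertices (rather than a dense grid) keeps the graph small enough that this initial routing is essentially free.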


  68. 5th Solution (3 of 5)
    How to find
    the wind-optimal path?
    Procedure:
    (1) Create an initial route
    (2) 2D optimization process (latitude and longitude)
    (3) Set the altitudes and the airspeed of the flight
    68
    (Fig. is from [3] Christian Kiss-Toth, Gabor Takacs, “A Dynamic Programming Approach for 4D Flight Route Optimization”)


  69. 5th Solution (4 of 5)
    Create a grid in the airspace and
    divide the initial path into N parts.
    → Optimize by Dynamic Programming.
    Perform it recursively.
    (2) 2D optimization process (latitude and longitude)
    69
    (Fig. is from [3] Christian Kiss-Toth, Gabor Takacs, “A Dynamic Programming Approach for 4D Flight Route Optimization”)
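The stage-wise refinement can be sketched as a textbook dynamic program over lateral grid offsets: at each of the N parts the route may shift to one of M grid positions, and we minimize the per-stage costs plus transition costs between neighboring stages. The cost numbers and the transition penalty are placeholders for the simulator's wind/fuel model:

```python
def dp_refine(stage_costs, trans_cost):
    """Stage-wise DP: stage_costs[s][j] is the cost of grid offset j at
    stage s; trans_cost(i, j) penalizes moving between offsets of
    neighboring stages. Returns the minimum-cost offset sequence."""
    N, M = len(stage_costs), len(stage_costs[0])
    best = list(stage_costs[0])  # best[j]: min cost ending at offset j
    back = []                    # backpointers for path recovery
    for s in range(1, N):
        prev_best, best, choice = best, [], []
        for j in range(M):
            c, i = min((prev_best[i] + trans_cost(i, j), i) for i in range(M))
            best.append(c + stage_costs[s][j])
            choice.append(i)
        back.append(choice)
    # Recover the optimal offset sequence from the backpointers.
    j = min(range(M), key=lambda j: best[j])
    path = [j]
    for choice in reversed(back):
        j = choice[j]
        path.append(j)
    return path[::-1], min(best)

# 3 stages x 3 lateral offsets; sharp lateral moves penalized by |i - j|.
costs = [[3, 1, 4], [2, 9, 1], [5, 1, 8]]
path, total = dp_refine(costs, lambda i, j: abs(i - j))  # ([1, 2, 1], 5)
```

Running this on a coarse grid, then re-gridding around the result and repeating, gives the recursive refinement the slide describes.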


  70. 5th Solution (5 of 5)
    Procedure:
    (1) Create an initial route
    (2) 2D optimization process (latitude and longitude)
    (3) Set the altitudes and the airspeed of the flight
    Optimize two variables: the descending distance and cruise speed.
    For this 1D optimization, the solution used an exhaustive search.
    70


  71. Report from GE
    (screenshot of the GE Reports article)
    71
    (quote from http://www.gereports.com/post/93139010005/underdog-scientist-cracks-code-to-reduce-flight/)


  72. Summary
    A deep understanding of the objective and the evaluation metric is
    important for solving the problem.
    (i.e., taking advantage of the wind is the key point.)
    Basic knowledge of computer science (the DP algorithm) and
    engineering effort are also helpful in this kind of competition.
    72


  73. Agenda
    Data Mining Competitions
    Techniques (Tricks) for competitions
    Learning from Winning Solutions
    Trend on Kaggle
    73
    4 - 12
    13 - 28
    29 - 72
    73 - 78


  74. Improved Kaggle Rankings (1/3)
    Kaggle users receive points for their performance in competitions.
    In May 2015, Kaggle rolled out an updated version of the ranking
    system.
    (Figure: the old and the new ranking formulas, each annotated with
    “penalty on being part of a team”, “popularity of the contest”,
    and “decay”.)
    74


  75. Improved Kaggle Rankings (2/3)
    The new formula imposes a smaller penalty on being part of a
    team.
    (Figure: the new vs. the old penalty term on being part of a team.)
    75


  76. Improved Kaggle Rankings (3/3)
    The new point system counts your achievements in past contests.
    (Figure: the new vs. the old decay term.)
    76


  77. Forming a Team Seems Active
    In the CAT competition, ranks #1–#7 are teams, with no solo players.
    Teaming up is common when ensemble models work well.
    77


  78. Take-away Messages
    Join Kaggle competitions for fun and
    learn techniques from expert data scientists around the world!
    RGF and XGBoost have L2 regularization,
    and it may work well on noisy datasets (and ensemble models).
    Ensemble/blending techniques are tricky.
    Some techniques are impractical in real-world settings.
    “Deep learning is hardly ever used”? I don’t think so. I use it.
    78


  79. References
    [1] Andreas Töscher and Michael Jahrer, “The BigChaos
    Solution to the Netflix Grand Prize”.
    [2] Rie Johnson and Tong Zhang “Learning Nonlinear
    Functions Using Regularized Greedy Forest”, TPAMI’14.
    [3] Christian Kiss-Toth and Gábor Takács, “A Dynamic
    Programming Approach for 4D Flight Route Optimization”,
    Big Data’14.
    [4] Ming Liang and Xiaolin Hu, “Recurrent Convolutional
    Neural Network for Object Recognition”, CVPR’15.
    79
