Workshop BBVA - Open Innovation by MARISOL MENENDEZ and ESTEBAN MORO EGIDO at Big Data Spain 2013

BBVA provides a dataset of dissociated card transactions, available after registration on its developer platform. The deadline for contestants is December 3rd. Big Data Spain hosts a workshop to support BBVA Open Innovation's data challenge.
Session presented at Big Data Spain 2013 Conference
7th Nov 2013
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/2013/conference/workshop-bbva-open-innovation

Big Data Spain

November 14, 2013

Transcript

  1. INNOVA CHALLENGE Workshop 30 Oc. Challenge participant “roadmap”

    Diagram labels: Maps, Activity, Infrastructures/Places, Analysis, Models, App, Content, Visualization, Analytics and Models, Data Mining, Development.
  2. Summary

    Introduction to geo-tagged data. Access to (open) geo-tagged data. Example: development of a geolocalized recommender app.
  3. Introduction to geo-tagged data

    Information: person, event, infrastructure. Geography: GPS coordinates, zone, city.
  4. Geospatial BigData

    Sources: social media, sensors, satellite images, maps, activity (transport).
  5. Geo-tagged BigData applications

    With geo-tagged data we can:
    • Measure zone/area occupation and activity
    • Identify flows of persons/money between different areas
    • …
    With those data we can build applications in:
    • Geo-social analysis
    • Geomarketing
    • Optimal allocation of resources
    • Fraud detection
    • Event detection
    • …
  6. Geo-social analysis

    Use of pervasive sensors (mobile phones, social media) to model the movement and communication of people in urban areas.
  7. Geo-social analysis

    Characterization of urban neighborhoods according to their social/commercial use.

    Geolocation study in Madrid. Location: Puerta del Sol. Total number of check-ins: 2651 (30.5 per day). Number of unique users in the area: 1231.

    [Figures: check-in counts by hour of day, by day of the week (Monday to Sunday) and by date (Apr-11 to Jun-11), each split by venue type: arts_entertainment, food, nightlife, shops.]

    Top 10 places by number of check-ins:
    Rank  Place                    Check-ins
    1     fnac                     316
    2     starbucks coffee         269
    3     mercado de san miguel    251
    4     el corte inglés          136
    5     mercado de san antón     113
    6     yelmo cines ideal 3d     87
    7     vips                     84
    8     mcdonald's               78
    9     café de oriente          77
    10    sala joy eslava          71

    Top 10 users by number of check-ins:
    Rank  User             Check-ins
    1     amazel666        121
    2     runway4          73
    3     edaindil         40
    4     maestrodarius    39
    5     ivo_campos       35
    6     despop           33
    7     edumaiza         33
    8     dalogu8          32
    9     desdealbert0     32
    10    mmetafetan       30
  8. Fraud detection

    Use the merchant location and/or the IP address of online transactions to detect fraud.
  9. Optimal resource allocation

    Identify the best placement for a new shop/branch. Optimize cash holdings in bank branches, minimizing the associated costs. (Map legend: bars, shops.)
  10. Types of data

    • Maps
    • Economic/demographic data
    • Other types of data: Google's POIs, weather forecast
    • Activity: Twitter, BBVA API
  11. Maps :: Google Maps

    Google Maps offers a number of different services/APIs, each with its own restrictions and protocols. They allow you to define maps, routes, markers, etc.
    Example: get a static map (without authentication).
    URL Base: http://maps.google.com/maps/api/staticmap
    Parameters:
    • center: 40.4153,-3.6875
    • size: 640x640
    • maptype: mobile
    • format: png32
    • sensor: true
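As a minimal sketch, the static-map request above can be assembled with the Python standard library. The base URL and parameter values are the ones listed on the slide; the function name is hypothetical, and the service may no longer accept requests in this form.

```python
from urllib.parse import urlencode

# Base URL exactly as given on the slide.
BASE_URL = "http://maps.google.com/maps/api/staticmap"


def static_map_url(lat, lon, size="640x640", maptype="mobile",
                   fmt="png32", sensor="true"):
    """Build the request URL for a static map centered on (lat, lon)."""
    params = {
        "center": f"{lat},{lon}",
        "size": size,
        "maptype": maptype,
        "format": fmt,
        "sensor": sensor,
    }
    # urlencode percent-escapes the comma in the center parameter.
    return f"{BASE_URL}?{urlencode(params)}"


url = static_map_url(40.4153, -3.6875)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.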
  12. Maps :: OpenStreetMap

    Open and collaborative project to create and distribute free maps. It offers different APIs to get information about routes, points, maps, etc. There are a number of mapping projects (applications) built on top of OSM with very different purposes.
    Example: get the route between two locations with MapQuest.
    URL Base: http://open.mapquestapi.com/guidance/v1/
    Parameters:
    • Key: authentication key
    • From: latitude and longitude of the origin, in JSON
    • To: latitude and longitude of the destination, in JSON
  13. Maps :: shapefiles

    Geospatial vector data format for geographical information. Regions, points and paths are defined as points, lines and polygons. Each of them usually has attributes that describe it: region codes, names, population, etc.
    Data: http://www.naturalearthdata.com/downloads/
    pyshp: http://code.google.com/p/pyshp/
    maptools: http://cran.r-project.org/web/packages/maptools
  14. Maps :: Spain cartography

    CartoCiudad (Ministerio de Fomento): shapefiles for each province at the municipality and postal code levels. They also include data about the urban background.
    http://www.cartociudad.es/portal/
  15. Maps :: Madrid cartography

    Nomecalles (CAM): shapefiles, POIs (museums, theaters, health services), subway (stations), etc.
    http://www.madrid.org/nomecalles/DescargaBDTCorte.icm
    Resolution level: municipalities, districts, postal codes, etc.
  16. Maps :: Barcelona province cartography

    Plan territorial metropolitano de Barcelona, Generalitat de Catalunya (link).
  17. Maps :: Barcelona city cartography

    Plan territorial metropolitano de Barcelona, Generalitat de Catalunya (link). This website also has data about mobility, economic development, population, etc. at the district level. There is nothing at this level of detail for Madrid. Solution: use other data sources to estimate them (see below).
  18. Demographic/Economic data :: Spain

    Demographic data: Instituto Nacional de Estadística (INE). Census by province / municipality / census section (link).
    Economic data: Servicio Público de Empleo Estatal (SEPE). Unemployment by municipality (link).
  19. Demographic/Economic data :: Madrid

    Madrid city: Madrid City Council database, http://www-2.munimadrid.es/CSE6/jsps/menuBancoDatos.jsp. Population by district, neighborhood, etc.
    Madrid region: Comunidad de Madrid database, http://www.madrid.org/desvan/Inicio.icm?enlace=almudena. Population by municipality. Economic data by municipality.
  20. Demographic/Economic data :: Barcelona

    Barcelona city: Departament d’Estadística, http://www.bcn.cat/estadistica/castella/. Population by district. Unemployment by district.
    Catalonia region: Idescat (Institut d’Estadística de Catalunya), http://www.idescat.cat/es/. Population by municipality. Economic data by municipality.
  21. Other data sources :: Google Points of Interest

    Example: points of interest around Puerta del Sol (Madrid).
    Service 1: Places Search. Parameters: location: 40.417, -3.703; radius: 1000.
    Service 2: Places Details. Parameters: reference: place code.
  22. Other data sources :: Weather forecast

    GFS: Global Forecast System. OPeNDAP protocol. Python implementation: pydap.
    Query format:
      SERVER = http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/
      DATE = YYYYMMDD
      HOUR = HH
      VAR = weather metric (tmp2m, ugrd10m, pressfc, …)
      LAT = latitude interval, e.g. [259:263] (0.5º steps from the South Pole)
      LON = longitude interval, e.g. [710:714] (0.5º steps from Greenwich)
      QUERY = SERVER + "gfs_hd" + DATE + "/gfs_hd_" + HOUR + "z.dods?" + VAR + "[0:0]" + LAT + LON
      dataset = open_dods(QUERY)
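A minimal sketch of how the query string above might be assembled, assuming the grid indices follow the 0.5º steps described on the slide (latitude counted from the South Pole, longitude eastwards from Greenwich). The helper names and the example date are hypothetical. The sketch only builds the string and does not contact the server.

```python
# Server prefix exactly as given on the slide.
SERVER = "http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/"
STEP = 0.5  # grid resolution in degrees, as stated on the slide


def lat_index(lat):
    """Grid row for a latitude, counting 0.5-degree steps from the South Pole."""
    return round((lat + 90.0) / STEP)


def lon_index(lon):
    """Grid column for a longitude, counting 0.5-degree steps east of Greenwich."""
    return round((lon % 360.0) / STEP)


def gfs_query(date, hour, var, lat_range, lon_range):
    """Assemble the query string for one variable over a lat/lon box."""
    lat = f"[{lat_index(lat_range[0])}:{lat_index(lat_range[1])}]"
    lon = f"[{lon_index(lon_range[0])}:{lon_index(lon_range[1])}]"
    return f"{SERVER}gfs_hd{date}/gfs_hd_{hour}z.dods?{var}[0:0]{lat}{lon}"


# Hypothetical example: 2 m temperature around Madrid for an arbitrary run.
query = gfs_query("20131030", "00", "tmp2m", (39.5, 41.5), (-5.0, -3.0))
print(query)
# On the slide, the resulting string is then passed to pydap's open_dods.
```

With these inputs the latitude and longitude intervals come out as [259:263] and [710:714], the same intervals shown on the slide.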
  23. Activity :: data from Twitter API

    Developers webpage: http://dev.twitter.com
    Credentials needed: consumer key, consumer secret, access token, access token secret.
  24. Activity :: data from Twitter API

    OAuth authentication: consumer key, consumer secret, access token, access token secret.
    REST API: several queries with parameters; the number of requests is limited.
    Stream API: only one query (with parameters); requests are not time-limited.
  25. Activity :: data from Twitter API

    Stream API example: geolocalized tweets in the Madrid region.
    API service: POST statuses/filter
    Parameters: locations: -4.59, 39.90, -3.04, 41.17
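The locations parameter is a bounding box given as the south-west corner followed by the north-east corner, each as longitude, latitude. As a small sketch (helper names are hypothetical), the same box can be serialized for the request and reused locally to check whether a point falls inside the Madrid region:

```python
# Bounding box from the slide: west lon, south lat, east lon, north lat.
MADRID_BBOX = (-4.59, 39.90, -3.04, 41.17)


def locations_param(bbox):
    """Serialize a bounding box as the comma-separated locations value."""
    return ",".join(str(v) for v in bbox)


def in_bbox(lon, lat, bbox):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


print(locations_param(MADRID_BBOX))
print(in_bbox(-3.703, 40.417, MADRID_BBOX))  # Puerta del Sol
print(in_bbox(2.17, 41.39, MADRID_BBOX))     # a point in Barcelona
```

A local check like this is useful as a second filter on the tweets received from the stream.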
  26. Activity :: data from Twitter API

    As noted earlier, there are no data for Madrid on administrative zones below the municipality level, but we can estimate some of them with Twitter (Stream API).
    Example: population by postal code
    1. Round geographical coordinates to the 3rd decimal place (square cells of approx. 100 meters on a side).
    2. Find the postal code most visited by each user and define it as his/her residence. Count the number of residents per postal code.
    3. Visualize.
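The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' code: the tweet tuples, the user names and the cell-to-postal-code lookup are made up, and in practice the lookup would come from a postal code shapefile.

```python
from collections import Counter, defaultdict


def cell(lat, lon):
    """Step 1: snap coordinates to a grid cell by rounding to 3 decimals."""
    return (round(lat, 3), round(lon, 3))


def residents_by_postal_code(tweets, cell_to_postal_code):
    """Step 2: assign each user to the most visited postal code, then count."""
    visits = defaultdict(Counter)
    for user, lat, lon in tweets:
        code = cell_to_postal_code.get(cell(lat, lon))
        if code is not None:  # skip cells with no known postal code
            visits[user][code] += 1
    residents = Counter()
    for user, counts in visits.items():
        home, _ = counts.most_common(1)[0]
        residents[home] += 1
    return residents


# Hypothetical lookup table and tweets, for illustration only.
cell_to_postal_code = {(40.417, -3.703): "28013", (40.393, -3.698): "28045"}
tweets = [
    ("ana", 40.41702, -3.70311),
    ("ana", 40.41711, -3.70289),
    ("ana", 40.39312, -3.69790),
    ("luis", 40.39296, -3.69804),
    ("eva", 40.41689, -3.70304),
]
residents = residents_by_postal_code(tweets, cell_to_postal_code)
print(dict(residents))
```

The resulting counts per postal code are what would be drawn on the map in step 3.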
  27. Activity :: data from BBVA API

    Getting the authentication data:
    1. With the APP_ID and APP_KEY, generate the authorization code by concatenating both strings with a colon (:) and encoding the result in Base64.
    2. Add this authorization code to the HTTP request header.
    Example:
      APP_ID = "iic_formacion_innovachallenge"
      APP_KEY = "0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      str_to_encode = "iic_formacion_innovachallenge:0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      auth = strToBase64(str_to_encode)
      Request = HttpRequest(SERVICE, PARAMETERS, header = {'Authorization' : auth})
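In Python, the two authentication steps might look like the sketch below. The credentials are placeholders rather than real values, the function name is hypothetical, and no request is actually sent.

```python
import base64


def authorization_code(app_id, app_key):
    """Join id and key with a colon and encode the result in Base64."""
    raw = f"{app_id}:{app_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder credentials; replace with the ones issued on registration.
headers = {"Authorization": authorization_code("my_app_id", "my_app_key")}
print(headers)
```

The headers dictionary can then be passed to whichever HTTP client is used to call the API services.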
  28. Activity :: CUSTOMER_ZIPCODES example

    Building the adjacency list.
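The code for this slide is not preserved in the transcript. As a rough sketch of what building an adjacency list could look like, assume each record pairs the customers' home postal code with the merchant's postal code and a number of cards. The record layout, the direction of the edges and the numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency_list(records):
    """Map each customer postal code to {merchant postal code: number of cards}."""
    adjacency = defaultdict(dict)
    for customer_zip, merchant_zip, num_cards in records:
        neighbours = adjacency[customer_zip]
        neighbours[merchant_zip] = neighbours.get(merchant_zip, 0) + num_cards
    return dict(adjacency)


# Made-up records: (customer postal code, merchant postal code, cards).
records = [
    ("28045", "28013", 40),
    ("28004", "28013", 25),
    ("28045", "28004", 10),
    ("28045", "28013", 5),
]
adjacency = build_adjacency_list(records)
print(adjacency)
```

Each key is a node of the graph and each inner entry is a weighted edge, which is the form most graph libraries accept as input.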
  29. Activity :: CUSTOMER_ZIPCODES example

    Building and plotting the graph.
  30. Activity :: CUSTOMER_ZIPCODES example

    Economic flows from Puerta del Sol.
    API service: customer_zipcodes
    Parameters:
    • date_min: 201304
    • date_max: 201304
    • zipcode: 28013
    • by: cards
    • group_by: month
  31. Recommender systems :: Introduction

    Objective: recommend to users which areas to visit according to their profile, residence, preferences, etc., using information about what similar users do.
    Data used:
    1. Twitter data.
    2. API Innova Challenge: CARDS_CUBE.
    3. API Innova Challenge: CUSTOMER_ZIPCODES.
  32. Recommender systems :: user language

    Use Twitter data to:
    1. Find out what people are talking about in each city area.
    2. Analyze the user's language on Twitter.
    3. Compare the user's language with each area's language and recommend the most similar areas to the user.
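The slide does not say how the languages are compared. One common choice is a bag-of-words profile with cosine similarity, sketched here as an assumption. The function names, area codes and texts are made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(texts):
    """Bag-of-words profile: lowercase word frequencies over a set of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_areas(user_texts, area_texts):
    """Order areas from most to least similar to the user's language."""
    user = word_counts(user_texts)
    scores = {area: cosine_similarity(user, word_counts(texts))
              for area, texts in area_texts.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up tweets grouped by area (postal code) and for one user.
area_texts = {
    "28001": ["shopping fashion sale", "new fashion store"],
    "28004": ["concert tonight and beers", "great concert"],
}
user_texts = ["love a good concert", "beers with friends"]
ranking = rank_areas(user_texts, area_texts)
print(ranking)
```

A real system would likely remove stop words and weight terms (for example with TF-IDF) before comparing profiles.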
  33. Recommender systems :: user demographic profile

    Use the CARDS_CUBE service from the BBVA API.
  34. Recommender systems :: user demographic profile

    • Use data from the CARDS_CUBE service.
    • For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of different credit cards with a given profile X (gender, age) that were used in postal code Y at a merchant of category Z.
    Where do people like me go shopping? Which restaurants are visited by people similar to me?
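A minimal sketch of the matrix described above, stored as nested dictionaries (category, then profile, then postal code). The record layout and the numbers are assumptions, since the actual CARDS_CUBE response format is not shown in the transcript.

```python
from collections import defaultdict


def build_profile_matrices(records):
    """matrices[category][profile][zipcode] = number of different cards."""
    matrices = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for category, profile, zipcode, num_cards in records:
        matrices[category][profile][zipcode] += num_cards
    return matrices


def top_zipcodes(matrices, category, profile, n=3):
    """Postal codes most visited by a profile for one merchant category."""
    row = matrices[category][profile]
    return sorted(row, key=row.get, reverse=True)[:n]


# Made-up records: (category, (gender, age range), postal code, cards).
records = [
    ("bars", ("M", "36-45"), "28013", 120),
    ("bars", ("M", "36-45"), "28004", 200),
    ("bars", ("F", "26-35"), "28013", 90),
    ("fashion", ("M", "36-45"), "28001", 75),
    ("bars", ("M", "36-45"), "28013", 30),
]
matrices = build_profile_matrices(records)
print(top_zipcodes(matrices, "bars", ("M", "36-45")))
```

Reading one row of the matrix answers the question "where do people like me go?" for a given category.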
  35. Recommender systems :: user demographic profile

    Example: male, age 36-45. Maps shown for two categories: fashion; bars and restaurants.
  36. Recommender systems :: user geographic profile

    Use the CUSTOMER_ZIPCODES service in the BBVA API.
  37. Recommender systems :: user geographic profile

    • Use data from the CUSTOMER_ZIPCODES service.
    • For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of different credit cards from postal code X that were used in postal code Y at a merchant of category Z.
    Where do people in my district go shopping? Which restaurants are visited by people living in my district?
  38. Recommender systems :: user geographic profile

    Example: postal code 28045. Maps shown for two categories: fashion; bars and restaurants.
  39. Recommender systems :: combination

    Example: male, age 36-45, living in postal code 28045. Maps shown for two categories: fashion; bars and restaurants.
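The transcript does not say how the demographic and geographic profiles are combined. One simple possibility, shown purely as an assumption, is to turn each profile's row into shares and multiply them, so that a destination scores high only when both profiles favor it. All names and numbers below are made up.

```python
def normalize(row):
    """Convert card counts per postal code into shares that sum to 1."""
    total = sum(row.values())
    return {k: v / total for k, v in row.items()} if total else {}


def combined_ranking(demographic_row, geographic_row):
    """Rank destinations by the product of the two normalized scores."""
    demo = normalize(demographic_row)
    geo = normalize(geographic_row)
    # Only destinations present in both profiles receive a score.
    scores = {z: demo[z] * geo[z] for z in demo.keys() & geo.keys()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up rows for one category: destination postal code -> cards.
demographic_row = {"28004": 200, "28013": 150, "28001": 50}  # male, 36-45
geographic_row = {"28013": 300, "28045": 500, "28004": 100}  # lives in 28045
ranking = combined_ranking(demographic_row, geographic_row)
print(ranking)
```

Other combinations, such as a weighted sum, would keep destinations that appear in only one of the two profiles.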
  40. From data to the app

    1. The idea.
    2. What data do I need to carry out this idea? Which services of the Challenge API do I need? Can I improve it with other information sources?
    3. Analysis: distilling the idea and assessing its viability. Extracting the hidden value of analytics and models.
    4. How can the user take advantage of this idea?
    5. Iterate steps 2, 3 and 4 until the idea and the benefit to the user become clear.
    6. Convert the value of the analysis into an application.