Securing and Personalizing Commerce Using Identity Data Mining

As society becomes increasingly reliant on mobile technology, money is becoming mobile as well. In this new realm of commerce, online identity is significantly more important.

When a payment is processed, it is important to understand not only who a person is but also what their broader interests and preferences are, so that personalized experiences, such as suggestions for new content and merchandise, can be delivered on an individual level.

Jonathan LeBlanc

April 07, 2013

Transcript

  1. Securing & Personalizing Commerce Using Identity Data Mining
    Jonathan LeBlanc, Developer Evangelist (PayPal)
    Github: http://github.com/jcleblanc
    Twitter: @jcleblanc
  2. Premise
    You can determine the personality profile of a person based on their usage habits
    Personalization == Security
  3. The Different States of Knowledge
    What a person knows
    What a person knows they don’t know
    What a person doesn’t know they don’t know
  4. Our Subject Material
    HTML content is poorly structured
    There are some pretty bad web practices on the interwebz
    You can’t trust that anything semantically valid will be present
  5. Capture Raw Page Data
    Semantic data on the web is sucktastic
    Assume 5 year olds built the sites
    Language is the key
  6. Extract Keywords
    We now have a big jumble of words. Let’s extract
    Why is “and” a top word?
    Stop words = sad panda
  7. Weight Keywords
    All content is not created equal
    Meta and headers and semantics oh my!
    This is where we leech off the work of others
  8. Questions to Keep in Mind
    Should I use regex to parse web content?
    How do users interact with page content?
    What key identifiers can be monitored to detect interest?
  9. Fetching the Data: The Request
    The Simple Way
        $html = file_get_contents('URL');
    The Controlled Way
        $c = curl_init('URL');
  10. Fetching the Data: cURL
        $req = curl_init($url);
        $options = array(
            CURLOPT_URL => $url,
            CURLOPT_HEADER => $header,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_AUTOREFERER => true,
            CURLOPT_TIMEOUT => 15,
            CURLOPT_MAXREDIRS => 10
        );
        curl_setopt_array($req, $options);
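The slide above configures the cURL handle but stops before the request is executed. A minimal sketch of the remaining step follows, assuming PHP 7.1+ with the cURL extension loaded. The function name `fetch_page` and the choice to return `null` on failure are illustrative assumptions, not part of the original deck, and `CURLOPT_HEADER` is set to `false` here because the slide's `$header` value is not shown.

```php
<?php
// Hypothetical helper (not from the slides): fetch a page body with the
// same options the slide configures, then actually execute the request.
function fetch_page(string $url, int $timeout = 15): ?string
{
    $req = curl_init();
    curl_setopt_array($req, [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,    // assumption: body only, no response headers
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_MAXREDIRS      => 10,
    ]);

    // With CURLOPT_RETURNTRANSFER set, curl_exec() returns the body as a
    // string on success and false on failure.
    $body = curl_exec($req);

    return $body === false ? null : $body;
}
```

A caller would then write something like `$page_content = fetch_page('URL');` and check for `null` before passing the result to the parsing steps on the following slides.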
  11. //list of findable / replaceable string characters
        $find = array('/\r/', '/\n/', '/\s\s+/');
        $replace = array(' ', ' ', ' ');
        //perform page content modification
        $mod_content = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $page_content);
        $mod_content = preg_replace('#<style(.*?)>(.*?)</style>#is', '', $mod_content);
        $mod_content = strip_tags($mod_content);
        $mod_content = strtolower($mod_content);
        $mod_content = preg_replace($find, $replace, $mod_content);
        $mod_content = trim($mod_content);
        $mod_content = explode(' ', $mod_content);
        natcasesort($mod_content);
  12. //set up list of stop words and the final found stopped list
        $common_words = array('a', ..., 'zero');
        $searched_words = array();
        //extract list of keywords with number of occurrences
        foreach($mod_content as $word) {
            $word = trim($word);
            //discard any token containing non-letter characters
            if (preg_match('/[^a-zA-Z]/', $word) == 1){
                $word = '';
            }
            if(strlen($word) > 2 && !in_array($word, $common_words)){
                //initialize the counter on first sight to avoid an undefined index notice
                $searched_words[$word] = isset($searched_words[$word]) ? $searched_words[$word] + 1 : 1;
            }
        }
        arsort($searched_words, SORT_NUMERIC);
  13. Scraping Site Meta Data
        //load scraped page data as a valid DOM document
        $dom = new DOMDocument();
        @$dom->loadHTML($page_content);
        //scrape title
        $title = $dom->getElementsByTagName("title");
        $title = $title->item(0)->nodeValue;
  14. //loop through all found meta tags
        $metas = $dom->getElementsByTagName("meta");
        for ($i = 0; $i < $metas->length; $i++){
            $meta = $metas->item($i);
            if($meta->getAttribute("property")){
                if ($meta->getAttribute("property") == "og:description"){
                    $dataReturn["description"] = $meta->getAttribute("content");
                }
            } else {
                if($meta->getAttribute("name") == "description"){
                    $dataReturn["description"] = $meta->getAttribute("content");
                } else if($meta->getAttribute("name") == "keywords"){
                    $dataReturn["keywords"] = $meta->getAttribute("content");
                }
            }
        }
  15. Weighting Important Data
    Tags you should care about: meta (include OG), title, description, h1+, header
    Bonus points for adding in content location modifiers
  16. Weighting Important Tags
        //our keyword weights
        $weights = array("keywords" => "3.0",
                         "meta" => "2.0",
                         "header1" => "1.5",
                         "header2" => "1.2");
        //add modifier here
        if(strlen($word) > 2 && !in_array($word, $common_words)){
            $searched_words[$word]++;
        }
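The slide marks where a weight modifier would go but does not show it. One way it might be applied is sketched below, assuming PHP 7+ and that words have already been grouped by the tag they came from. The function name `weigh_keywords`, the `$wordsBySource` structure, and the default weight of 1.0 for unlisted sources such as body text are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch (not from the slides): add each word's source weight
// to its score instead of incrementing by one.
function weigh_keywords(array $wordsBySource, array $weights, array $commonWords): array
{
    $scores = [];
    foreach ($wordsBySource as $source => $words) {
        // The slide stores weights as strings ("3.0"), so cast to float.
        // Assumption: sources without a listed weight count as 1.0.
        $weight = (float) ($weights[$source] ?? 1.0);
        foreach ($words as $word) {
            $word = strtolower(trim($word));
            // Same filter as the slide: longer than 2 chars, not a stop word.
            if (strlen($word) > 2 && !in_array($word, $commonWords, true)) {
                $scores[$word] = ($scores[$word] ?? 0.0) + $weight;
            }
        }
    }
    arsort($scores, SORT_NUMERIC); // highest score first
    return $scores;
}

// Example input using the weights from the slide.
$weights = ["keywords" => "3.0", "meta" => "2.0", "header1" => "1.5", "header2" => "1.2"];
$scores = weigh_keywords(
    [
        "keywords" => ["commerce", "identity"],
        "header1"  => ["commerce", "the"],
        "body"     => ["commerce", "identity", "and"],
    ],
    $weights,
    ["the", "and"]
);
```

With this input, "commerce" appears in the keywords meta tag, a first-level header, and the body, so it outranks "identity", which appears only in the keywords tag and the body.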
  17. Expanding to Phrases
    2-3 adjacent words, making up a direct relevant callout
    Seems easy right? Just like single words
    Language gets wonky without stop words
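The phrase idea above can be sketched as a sliding window over the ordered word list. This is only one possible approach, assuming PHP 7+: stop words are kept inside a phrase so the language stays readable, and a phrase is skipped only when it starts or ends with a stop word. The function name `extract_phrases` and that filtering rule are assumptions, not something the deck specifies.

```php
<?php
// Hypothetical sketch (not from the slides): count 2-3 word phrases built
// from adjacent tokens, keeping stop words in the middle of a phrase.
function extract_phrases(array $tokens, array $commonWords, int $min = 2, int $max = 3): array
{
    $tokens = array_values($tokens); // ensure 0-based sequential keys
    $count = count($tokens);
    $phrases = [];

    for ($size = $min; $size <= $max; $size++) {
        for ($i = 0; $i + $size <= $count; $i++) {
            $window = array_slice($tokens, $i, $size);
            // Assumption: a phrase that begins or ends with a stop word is noise.
            if (in_array($window[0], $commonWords, true)
                || in_array($window[$size - 1], $commonWords, true)) {
                continue;
            }
            $phrase = implode(' ', $window);
            $phrases[$phrase] = ($phrases[$phrase] ?? 0) + 1;
        }
    }

    arsort($phrases, SORT_NUMERIC); // most frequent phrase first
    return $phrases;
}

$phrases = extract_phrases(
    explode(' ', 'terms of service and terms of service'),
    ['of', 'and']
);
```

Note that the tokens must stay in page order for this to work, so the window should run before any sorting step such as `natcasesort`.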
  18. Working with Unknown Users
    The majority of users won’t be immediately targetable
    Use HTML5 LocalStorage & Cookie backup
  19. Adding in Time Interactions
    Interaction with a site does not necessarily mean interest in it
    Time needs to also include an interaction component
    Gift buying seasons see interest variations
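The deck does not say how time and interaction should be combined. One possible reading, sketched below under stated assumptions, is to count time on a page as "engaged" only while it falls within a short idle window after an interaction such as a scroll or click. The function name `engaged_seconds`, the 30-second window, and the input format (interaction timestamps in whole seconds since page load) are all hypothetical. The sketch assumes PHP 7+.

```php
<?php
// Hypothetical sketch (not from the slides): time on page counts as engaged
// only within $idleWindow seconds after a recorded interaction.
function engaged_seconds(array $eventTimes, int $timeOnPage, int $idleWindow = 30): int
{
    sort($eventTimes, SORT_NUMERIC);
    $engaged = 0;
    $coveredUntil = 0; // end of the last interval already counted

    foreach ($eventTimes as $t) {
        if ($t >= $timeOnPage) {
            break; // interaction recorded after the visit ended
        }
        // Clip the interval so overlapping windows are not counted twice
        // and nothing is counted past the end of the visit.
        $start = max($t, $coveredUntil);
        $end = min($t + $idleWindow, $timeOnPage);
        if ($end > $start) {
            $engaged += $end - $start;
            $coveredUntil = $end;
        }
    }
    return $engaged;
}

// A 300-second visit with interactions at 0s, 10s and 200s.
$engaged = engaged_seconds([0, 10, 200], 300);
```

Under this reading, a tab left open for five minutes with only a few interactions contributes far less to an interest score than its raw duration would suggest.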