BigData PHP

Mariusz Gil

October 27, 2013

Transcript

  1. bigdata.php Mariusz Gil

  2. PHP / Scalability and performance / Big Data

  3. PHP and memcached, advanced use-cases / PHPCon PL 2010

    Aspect oriented programming in PHP / PHPCon PL 2010
  4. 3V: Volume, Velocity, Variety

  5. None
  6. None
  7. 95% of knowledge you already have

  8. 2004

  9. None
  10. Map step: (K1, V1) → list(K2, V2)

    Reduce step: (K2, list(V2)) → list(K3, V3)
  11. Input lines: "php is good", "php is simply", "php is popular"

    Map output: (php, 1) (is, 1) (good, 1) (php, 1) (is, 1) (simply, 1) (php, 1) (is, 1) (popular, 1)

    Grouped by key: (good, 1) (is, 1) (is, 1) (is, 1) (php, 1) (php, 1) (php, 1) (simply, 1) (popular, 1)

    Reduce output: (php, 3) (is, 3) (good, 1) (simply, 1) (popular, 1)
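The word-count example above can be traced in plain PHP without any cluster. The following is a minimal single-process sketch, not how Hadoop executes a job; the function names `mapStep`, `groupByKey`, `reduceStep` and `wordCount` are hypothetical and chosen only to mirror the map and reduce signatures from the previous slide.

```php
<?php
// Map step: (K1, V1) -> list(K2, V2).
// K1 is the line number (unused here), V1 is the line text.
function mapStep(int $lineNo, string $line): array
{
    $pairs = [];
    foreach (preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $pairs[] = [$word, 1];
    }
    return $pairs;
}

// Shuffle: collect every emitted value under its key, giving (K2, list(V2)).
function groupByKey(array $pairs): array
{
    $groups = [];
    foreach ($pairs as [$key, $value]) {
        $groups[$key][] = $value;
    }
    return $groups;
}

// Reduce step: (K2, list(V2)) -> list(K3, V3). Here it sums the ones per word.
function reduceStep(string $word, array $counts): array
{
    return [[$word, array_sum($counts)]];
}

// Run map, shuffle and reduce in sequence over an array of lines.
function wordCount(array $lines): array
{
    $mapped = [];
    foreach (array_values($lines) as $lineNo => $line) {
        foreach (mapStep($lineNo, $line) as $pair) {
            $mapped[] = $pair;
        }
    }

    $result = [];
    foreach (groupByKey($mapped) as $word => $counts) {
        foreach (reduceStep((string) $word, $counts) as [$key, $total]) {
            $result[$key] = $total;
        }
    }
    return $result;
}

$counts = wordCount(['php is good', 'php is simply', 'php is popular']);
foreach ($counts as $word => $count) {
    echo $word, ', ', $count, PHP_EOL;
}
```

On a real cluster the three phases run on different nodes and the grouping is done by the framework; only the map and reduce functions are user code.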
  12. HDFS + YARN + MapReduce

  13. Java oriented, but with support for external programs through the Streaming API
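  14. NodeManager, YARNChild, MapTask, ReduceTask (diagram labels: node manager, node)

        <?php
        // Streaming mapper: read lines from STDIN and emit one "word<TAB>1" pair per word.
        // Hadoop Streaming splits key and value on the first tab by default.
        while (($line = fgets(STDIN)) !== false) {
            $words = explode(' ', trim($line));
            foreach ($words as $word) {
                if ($word === '') {
                    continue; // skip empty tokens from blank lines or repeated spaces
                }
                echo $word . "\t" . 1 . PHP_EOL;
            }
        }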
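The slide shows only the mapper. A matching reducer could look like the sketch below; it is not taken from the talk. The function names `reduceWordCounts` and `readLines` are hypothetical, and the sketch assumes the mapper emits one word and one count per line, separated by a tab or a space. It keeps all totals in memory instead of relying on the sorted input that Hadoop Streaming delivers, which is simpler but uses more memory per reducer.

```php
<?php
// Sum the counts per word from mapper output lines of the form "word<TAB>count".
function reduceWordCounts(iterable $lines): array
{
    $totals = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // Accept a tab or a space between word and count.
        $parts = preg_split('/\s+/', $line);
        if (count($parts) !== 2 || !preg_match('/^\d+$/', $parts[1])) {
            continue; // skip malformed lines
        }
        [$word, $count] = $parts;
        $totals[$word] = ($totals[$word] ?? 0) + (int) $count;
    }
    return $totals;
}

// Yield lines from an open stream such as STDIN.
function readLines($stream): Generator
{
    while (($line = fgets($stream)) !== false) {
        yield $line;
    }
}

// Demo on in-memory data. In a streaming job the input would be readLines(STDIN).
$demo = ["good\t1", "is\t1", "is\t1", "is\t1", "php\t1", "php\t1", "php\t1"];
foreach (reduceWordCounts($demo) as $word => $total) {
    echo $word, "\t", $total, PHP_EOL;
}
```

To use it as a streaming reducer, replace `$demo` with `readLines(STDIN)` and pass the script to the job alongside the mapper.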
  15. MongoDB

  16. MongoDB map-reduce from PHP:

        $mongo = new MongoClient();
        $app['mongo'] = $mongo->selectDB('db');

        // The map and reduce functions are JavaScript, executed by the server.
        $map = new MongoCode('function() { emit(this.key, this.value); }');
        $reduce = new MongoCode('function(key, values) { return Array.sum(values); }');

        $result = $app['mongo']->command(array(
            'mapreduce' => 'collection',
            'map' => $map,
            'reduce' => $reduce,
            'out' => array(
                'inline' => 1, // return the results in the response instead of a collection
            ),
        ));
  17. Apache Zookeeper, Apache HBase, Apache Hive, Apache Oozie, Apache Pig,

    Apache Avro, Apache Ambari, Apache Chukwa, Apache Flume, Apache Scribe, Apache Whirr, Apache Mahout, Apache Sqoop
  18. None
  19. region servers HDFS nodes php

  20. HBase over Thrift from PHP:

        $socket = new TSocket('localhost', 9090);
        $socket->setSendTimeout(2000);
        $socket->setRecvTimeout(4000);

        $transport = new TBufferedTransport($socket);
        $protocol = new TBinaryProtocol($transport);
        $client = new HbaseClient($protocol);

        $transport->open();

        $table = 'test';
        $descriptors = $client->getColumnDescriptors($table);
        $result = $client->getRow($table, "php");

        // List the column families of the table.
        foreach ($descriptors as $col) {
            echo "Column: {$col->name}, maxVer: {$col->maxVersions}" . PHP_EOL;
        }

        $transport->close();
  21. None
  22. Hive queries:

        CREATE TABLE page_views (
            user_id INT,
            page_id INT,   -- type is missing on the slide; INT is assumed here
            date DATE,
            user_agent STRING
        ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

        SELECT page_views.*
        FROM page_views
        WHERE page_views.date >= '2008-03-01'
          AND page_views.date <= '2008-03-31';

        SELECT page_views.*
        FROM page_views
        JOIN dim_users ON (page_views.user_id = dim_users.id)
        WHERE page_views.date >= '2008-03-01'
          AND page_views.date <= '2008-03-31';

        SELECT col1
        FROM t1
        GROUP BY col1
        HAVING SUM(col2) > 10;
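  23. Hive with a PHP script as a TRANSFORM step:

        -- Raw table: one unparsed log line per row.
        CREATE TABLE www_logs_raw (
            line STRING
        ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

        -- Parsed table.
        CREATE TABLE www_logs (
            ip STRING,
            method STRING,
            url STRING,
            http_code SMALLINT,
            referrer STRING
        ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

        add FILE www_logs_mapper.php;

        -- The slide has the two table names swapped in this statement;
        -- the raw lines are read from www_logs_raw and parsed into www_logs.
        INSERT OVERWRITE TABLE www_logs
        SELECT TRANSFORM (line)
        USING 'php www_logs_mapper.php'
        AS (ip, method, url, http_code, referrer)
        FROM www_logs_raw;

        -- As on the slide; www_logs above defines no user_agent column.
        SELECT user_agent, COUNT(*)
        FROM www_logs
        GROUP BY user_agent;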
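The transcript does not include `www_logs_mapper.php` itself. Hive's TRANSFORM passes each input row to the script on standard input as tab-separated text and reads tab-separated columns back from standard output. The sketch below shows what such a script might contain. The raw log format is not given in the talk, so the Apache combined log format is assumed here, and the function names `parseLogLine` and `toHiveRow` are hypothetical.

```php
<?php
// Parse one raw log line into [ip, method, url, http_code, referrer].
// ASSUMPTION: lines follow the Apache combined log format, for example
// 127.0.0.1 - - [27/Oct/2013:10:00:00 +0100] "GET /index.php HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"
function parseLogLine(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // line does not match the assumed format
    }
    return [$m[1], $m[2], $m[3], $m[4], $m[5]];
}

// Join the fields with tabs, the column separator Hive expects from the script.
function toHiveRow(array $fields): string
{
    return implode("\t", $fields);
}

// Demo on an in-memory sample. Inside Hive the lines would come from STDIN:
//   while (($line = fgets(STDIN)) !== false) { ... }
$sample = [
    '127.0.0.1 - - [27/Oct/2013:10:00:00 +0100] "GET /index.php HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
    'not a log line',
];
foreach ($sample as $line) {
    $fields = parseLogLine($line);
    if ($fields !== null) {
        echo toHiveRow($fields), PHP_EOL;
    }
}
```

Lines that do not match are skipped silently in this sketch; a production script would more likely count or log them.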
  24. mapinka-reducinka.pl

  25. None
  26. None
  27. Storm: free and open source, distributed system, realtime processing, language agnostic
  28. TextSpout → [sentence] → SplitSentenceBolt → [word] → WordCountBolt → [word, count]

    TextSpout → [sentence] → SplitSentenceBolt, xyzBolt (components implemented in php)
  29. A Storm spout in PHP:

        class RandomSentenceSpout extends ShellSpout
        {
            protected $sentences = array(
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
            );

            // Emit one randomly chosen sentence per call.
            protected function nextTuple()
            {
                usleep(100000); // 0.1 s; sleep() only accepts whole seconds
                $sentence = $this->sentences[rand(0, count($this->sentences) - 1)];
                $this->emit(array($sentence));
            }

            protected function ack($tuple_id)
            {
                return;
            }

            protected function fail($tuple_id)
            {
                return;
            }
        }

        $SentenceSpout = new RandomSentenceSpout();
        $SentenceSpout->run();
  30. A Storm bolt in PHP:

        class SplitSentenceBolt extends BasicBolt
        {
            // Split the incoming sentence and emit one tuple per word.
            public function process(Tuple $tuple)
            {
                $words = explode(" ", $tuple->values[0]);
                foreach ($words as $word) {
                    $this->emit(array($word));
                }
            }
        }

        $splitsentence = new SplitSentenceBolt();
        $splitsentence->run();
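The topology slide names a WordCountBolt that emits [word, count], but the transcript shows no code for it. The sketch below covers only the counting logic such a bolt would need and deliberately has no dependency on Storm. The class name `WordCounter` and its `add` method are hypothetical; wiring it into a `BasicBolt` subclass, by calling `add` inside `process` and emitting the returned pair, is an assumption about how it would be used.

```php
<?php
// Keeps a running count per word and returns the [word, count] pair to emit.
final class WordCounter
{
    /** @var array<string, int> */
    private array $counts = [];

    public function add(string $word): array
    {
        $this->counts[$word] = ($this->counts[$word] ?? 0) + 1;
        return [$word, $this->counts[$word]];
    }
}

// Demo: feed the words of one sentence through the counter.
$counter = new WordCounter();
foreach (explode(' ', 'the cow jumped over the moon') as $word) {
    [$seen, $count] = $counter->add($word);
    echo $seen, ' ', $count, PHP_EOL;
}
```

The counts live in the memory of one process, so correct totals depend on every occurrence of a word reaching the same bolt instance.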
  31. None
  32. None
  33. Links:

    http://hadoop.apache.org/
    http://hive.apache.org/
    http://hbase.apache.org/
    http://mahout.apache.org/
    http://zookeeper.apache.org/
    http://www.mongodb.org/
    http://storm-project.net/
    http://incubator.apache.org/drill/
    http://www.bigdatafestival.co/

  34. None
  35. None
  36. THANKS! joind.in/9775 @mariuszgil