Slide 1

Slide 1 text

bigdata.php Mariusz Gil

Slide 2

Slide 2 text

PHP / Scalability and performance / Big Data

Slide 3

Slide 3 text

PHP and memcached, advanced use-cases / PHPCon PL 2010
Aspect oriented programming in PHP / PHPCon PL 2010

Slide 4

Slide 4 text

3V: Volume, Velocity, Variety

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

95% of knowledge you already have

Slide 8

Slide 8 text

2004

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

map step: (K1, V1) → list(K2, V2)
reduce step: (K2, list(V2)) → list(K3, V3)

Slide 11

Slide 11 text

Input: php is good / php is simply / php is popular
Map: (php, 1) (is, 1) (good, 1) / (php, 1) (is, 1) (simply, 1) / (php, 1) (is, 1) (popular, 1)
Shuffle: (php, 1) (php, 1) (php, 1) / (is, 1) (is, 1) (is, 1) / (good, 1) / (simply, 1) / (popular, 1)
Reduce: (php, 3) (is, 3) (good, 1) (simply, 1) (popular, 1)
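The word-count flow above can be sketched in plain PHP, in a single process and without Hadoop. This is only an illustration of the map, shuffle and reduce steps; the function names (mapLine, groupByKey, reduceGroup, wordCount) are hypothetical and not taken from the talk.

```php
<?php
// Map step: one input line -> list of (word, 1) pairs.
function mapLine(string $line): array
{
    $pairs = [];
    foreach (preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $pairs[] = [$word, 1];
    }
    return $pairs;
}

// Shuffle step: group all emitted values by their key.
function groupByKey(array $pairs): array
{
    $groups = [];
    foreach ($pairs as [$key, $value]) {
        $groups[$key][] = $value;
    }
    return $groups;
}

// Reduce step: (word, list of counts) -> (word, total).
function reduceGroup(string $key, array $values): array
{
    return [$key, array_sum($values)];
}

// Runs the three steps in sequence over an array of lines.
function wordCount(array $lines): array
{
    $pairs = [];
    foreach ($lines as $line) {
        $pairs = array_merge($pairs, mapLine($line));
    }

    $counts = [];
    foreach (groupByKey($pairs) as $key => $values) {
        // PHP turns numeric string keys into integers, hence the cast.
        [$word, $count] = reduceGroup((string) $key, $values);
        $counts[$word] = $count;
    }
    return $counts;
}

$counts = wordCount(['php is good', 'php is simply', 'php is popular']);
```

On a real cluster the map and reduce calls run on different nodes and the framework performs the shuffle; here everything lives in one array for readability.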

Slide 12

Slide 12 text

HDFS + YARN + MapReduce

Slide 13

Slide 13 text

Java oriented, but with support for external programs through the Streaming API
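Hadoop Streaming feeds input lines to an external program on standard input and reads tab-separated key/value lines from its standard output, so a PHP mapper and reducer can be quite small. The sketch below assumes a word-count job; the function names streamMap and streamReduce are hypothetical, and the streams are passed in as arguments so the logic can be tried without a cluster.

```php
<?php
// Mapper: read text lines, write "word<TAB>1" for every word.
function streamMap($in, $out): void
{
    while (($line = fgets($in)) !== false) {
        foreach (preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            fwrite($out, $word . "\t1\n");
        }
    }
}

// Reducer: the framework sorts mapper output by key before the reducer
// sees it, so all lines for one word arrive next to each other.
function streamReduce($in, $out): void
{
    $current = null;
    $sum = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue;
        }
        $parts = explode("\t", $line, 2);
        $word = $parts[0];
        $count = (int) ($parts[1] ?? 0);

        if ($word !== $current) {
            // Key changed: flush the total for the previous word.
            if ($current !== null) {
                fwrite($out, $current . "\t" . $sum . "\n");
            }
            $current = $word;
            $sum = 0;
        }
        $sum += $count;
    }
    if ($current !== null) {
        fwrite($out, $current . "\t" . $sum . "\n");
    }
}

// In a real job each script would end with a single call such as:
//   streamMap(STDIN, STDOUT);      // mapper script
//   streamReduce(STDIN, STDOUT);   // reducer script
```

The two scripts would then be handed to the streaming jar through its -mapper and -reducer options; the jar location depends on the Hadoop installation.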

Slide 14

Slide 14 text

Diagram of a node manager node: NodeManager, YARNChild, MapTask, ReduceTask

Slide 15

Slide 15 text

MongoDB

Slide 16

Slide 16 text

$mongo = new MongoClient();
$app['mongo'] = $mongo->selectDB('db');

$map = new MongoCode('function() { emit(this.key, this.value); }');
$reduce = new MongoCode('function(key, values) { return Array.sum(values); }');

$result = $app['mongo']->command(array(
    'mapreduce' => 'collection',
    'map' => $map,
    'reduce' => $reduce,
    'out' => array(
        'inline' => 1,
    ),
));

Slide 17

Slide 17 text

Apache Zookeeper, Apache HBase, Apache Hive, Apache Oozie, Apache Pig, Apache Avro, Apache Ambari, Apache Chukwa, Apache Flume, Apache Scribe, Apache Whirr, Apache Mahout, Apache Sqoop

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Diagram: php, region servers, HDFS nodes

Slide 20

Slide 20 text

// Connect to the HBase Thrift gateway
$socket = new TSocket('localhost', 9090);
$socket->setSendTimeout(2000);
$socket->setRecvTimeout(4000);

$transport = new TBufferedTransport($socket);
$protocol = new TBinaryProtocol($transport);
$client = new HbaseClient($protocol);

$transport->open();

$table = 'test';
$descriptors = $client->getColumnDescriptors($table);
$result = $client->getRow($table, "php");

// List the column families of the table
foreach ($descriptors as $col) {
    echo ("Column: {$col->name}, maxVer: {$col->maxVersions}" . PHP_EOL);
}

$transport->close();

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

CREATE TABLE page_views (
    user_id INT,
    page_id INT, -- type missing on the slide, INT assumed
    date DATE,
    user_agent STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

SELECT page_views.*
FROM page_views
WHERE page_views.date >= '2008-03-01'
  AND page_views.date <= '2008-03-31';

SELECT page_views.*
FROM page_views
JOIN dim_users ON (page_views.user_id = dim_users.id)
WHERE page_views.date >= '2008-03-01'
  AND page_views.date <= '2008-03-31';

SELECT col1
FROM t1
GROUP BY col1
HAVING SUM(col2) > 10;

Slide 23

Slide 23 text

CREATE TABLE www_logs_raw (
    line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE TABLE www_logs (
    ip STRING,
    method STRING,
    url STRING,
    http_code SMALLINT,
    referrer STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

add FILE www_logs_mapper.php;

INSERT OVERWRITE TABLE www_logs
SELECT TRANSFORM (line)
USING 'php www_logs_mapper.php'
AS (ip, method, url, http_code, referrer)
FROM www_logs_raw;

SELECT user_agent, COUNT(*)
FROM www_logs
GROUP BY user_agent;
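The slides do not show www_logs_mapper.php itself. Hive's TRANSFORM passes each row to the script on standard input and expects tab-separated columns back on standard output, so a mapper could look roughly like the sketch below. It assumes the raw line is in the Apache combined log format, which the talk does not state; the function names parseLogLine and transformStream are hypothetical.

```php
<?php
// Assumed input, one raw access-log line per row (Apache combined format):
// 127.0.0.1 - - [10/Oct/2013:13:55:36 +0200] "GET /index.php HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"

// Extracts ip, method, url, http_code and referrer; returns null
// when the line does not match the assumed format.
function parseLogLine(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [$m[1], $m[2], $m[3], $m[4], $m[5]];
}

// Reads rows from $in and writes one tab-separated record per parsed row.
function transformStream($in, $out): void
{
    while (($line = fgets($in)) !== false) {
        $fields = parseLogLine(rtrim($line, "\r\n"));
        if ($fields !== null) {
            fwrite($out, implode("\t", $fields) . "\n");
        }
    }
}

// In the actual script the last line would be:
//   transformStream(STDIN, STDOUT);
```

Lines that do not match are silently skipped in this sketch; a production mapper might count or log them instead.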

Slide 24

Slide 24 text

mapinka-reducinka.pl

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Storm
Free and open source
Distributed system
Realtime processing
Language agnostic

Slide 28

Slide 28 text

TextSpout → [sentence] → SplitSentenceBolt → [word] → WordCountBolt → [word, count]
TextSpout → [sentence] → SplitSentenceBolt, xyzBolt
Each component is labelled "php" on the diagram.

Slide 29

Slide 29 text

class RandomSentenceSpout extends ShellSpout
{
    protected $sentences = array(
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
    );

    protected function nextTuple()
    {
        // Pause 100 ms between tuples; sleep() takes whole seconds,
        // so sleep(.1) would not wait at all.
        usleep(100000);

        $sentence = $this->sentences[rand(0, count($this->sentences) - 1)];
        $this->emit(array($sentence));
    }

    protected function ack($tuple_id)
    {
        return;
    }

    protected function fail($tuple_id)
    {
        return;
    }
}

$SentenceSpout = new RandomSentenceSpout();
$SentenceSpout->run();

Slide 30

Slide 30 text

class SplitSentenceBolt extends BasicBolt
{
    public function process(Tuple $tuple)
    {
        $words = explode(" ", $tuple->values[0]);
        foreach ($words as $word) {
            $this->emit(array($word));
        }
    }
}

$splitsentence = new SplitSentenceBolt();
$splitsentence->run();

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

http://hadoop.apache.org/
http://hive.apache.org/
http://hbase.apache.org/
http://mahout.apache.org/
http://zookeeper.apache.org/
http://www.mongodb.org/
http://storm-project.net/
http://incubator.apache.org/drill/
http://www.bigdatafestival.co/

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

THANKS! joind.in/9775 @mariuszgil