Supporting Large-Scale Architecture with PHP

These slides are from the talk given at PHP Conference Kansai 2017, with a Presto section and a simple demo added.

yuuki takezawa

August 05, 2017

Transcript

  1. Supporting Large-Scale Architecture with PHP, ver1.1 takezawa yuuki <ytake> builderscon tokyo 2017

  2. Notice: What follows is an approach to so-called big data. It does not apply to typical applications.

  3. Lambda/Kappa Architecture

  4. A typical web service
    • PHP
    • MySQL, PostgreSQL, Oracle, SQL Server
    • Apache, Nginx, etc.
  5. Hm? Search is getting slow. The indexes are properly in place, though...

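  6. A web service that has grown large
    • LIKE clauses issued from the front end against tens of millions of records
    • Full-text search and similar tools are added to improve performance, covering for what the RDBMS handles poorly
    • Logs written at several hundred rows per minute for page-view aggregation
    • Problems that never mattered in a small service start to surface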
  7. Pattern 1
    • The web application inserts into the database, then inserts into Elasticsearch or a similar engine
    • The application stays in control, but the application code bloats

  8. Application Database Elasticsearch

  9. /**
      * @Transactional
      *
      * @param ProductEntity $entity
      */
     public function register(ProductEntity $entity)
     {
         $this->productRepository->insert($entity);
         $this->elasticRepository->index([$entity]);
     }
  10. Part 2

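  11. Pattern 2
    • The web application inserts into the database, and a periodically executed batch inserts into the search engine
    • The web application only inserts into the database
    • The batch checks how far it has already built the index and creates only the missing entries
    • It is not real time, however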
  12. Application Database Elasticsearch Batch
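
The batch in Pattern 2 can be sketched as a checkpointed loop. This is a minimal illustration, not the speaker's implementation: `indexNewRows`, the row shape, and the `$indexRow` callback are hypothetical stand-ins for a database query and a search-engine client.

```php
<?php
declare(strict_types=1);

/**
 * Index only the rows created since the previous run.
 *
 * @param array    $rows      rows sorted by ascending id (stand-in for a database table)
 * @param int      $lastId    highest id indexed by the previous run (the checkpoint)
 * @param callable $indexRow  stand-in for the search-engine client
 * @param int      $chunkSize rows fetched per iteration, kept small to avoid long-held locks
 * @return int the new checkpoint
 */
function indexNewRows(array $rows, int $lastId, callable $indexRow, int $chunkSize = 2): int
{
    while (true) {
        // Plays the role of: SELECT ... WHERE id > :lastId ORDER BY id LIMIT :chunkSize
        $pending = array_filter($rows, function (array $row) use ($lastId): bool {
            return $row['id'] > $lastId;
        });
        $chunk = array_slice($pending, 0, $chunkSize);
        if ($chunk === []) {
            return $lastId;
        }
        foreach ($chunk as $row) {
            $indexRow($row);
            $lastId = $row['id']; // advance the checkpoint only after the index call returns
        }
    }
}

$indexed = [];
$rows = [
    ['id' => 1, 'name' => 'a'],
    ['id' => 2, 'name' => 'b'],
    ['id' => 3, 'name' => 'c'],
];
$checkpoint = indexNewRows($rows, 1, function (array $row) use (&$indexed): void {
    $indexed[] = $row['id'];
});
// $indexed is [2, 3] and $checkpoint is 3
```

Fetching in small chunks is one way to soften the table-lock and bulk-load problems listed on the next slide, at the cost of more round trips.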

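  13. Problems that arise
    • Fetching a large number of records causes table locks
    • In a monolithic system, load concentrated on one database leads to cascading failures
    • Bulk-loading many records causes replication lag and then outages
    • Disks fill up and cause outages
    • etc.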
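  14. Pattern 3
    • The web application inserts into the database, then publishes to a message queue through a producer
    • The web application only inserts into the database
    • A consumer reacts and builds the Elasticsearch index
    • Close to real time, as long as no message is lost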
  15. Application Database Elasticsearch Message Queue Consumer Producer
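
The flow in Pattern 3 can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: `SplQueue` replaces the real broker, plain arrays replace the database and Elasticsearch, and the message shape is invented.

```php
<?php
declare(strict_types=1);

// In-memory stand-ins; a real deployment would use a database, a broker, and Elasticsearch.
$database = [];
$queue = new SplQueue();
$searchIndex = [];

// Web application side: write to the database, then publish a small event.
function register(array $product, array &$database, SplQueue $queue): void
{
    $database[$product['id']] = $product;
    // The message carries only the identifier, not fields specific to one consuming service.
    $queue->enqueue(json_encode(['type' => 'product.created', 'id' => $product['id']]));
}

// Consumer side: react to each message and build the search document.
function consume(SplQueue $queue, array $database, array &$searchIndex): int
{
    $handled = 0;
    while (!$queue->isEmpty()) {
        $message = json_decode($queue->dequeue(), true);
        $product = $database[$message['id']] ?? null;
        if ($product === null) {
            continue; // row missing; a real consumer would retry or park the message
        }
        $searchIndex[$product['id']] = ['name' => $product['name']];
        $handled++;
    }
    return $handled;
}

register(['id' => 10, 'name' => 'keyboard'], $database, $queue);
register(['id' => 11, 'name' => 'mouse'], $database, $queue);
$handled = consume($queue, $database, $searchIndex);
// $handled is 2 and $searchIndex holds documents for ids 10 and 11
```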

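  16. Problems we ran into
    • Values needed only by one particular service were put into the message definition, and the queue clogged when that data was missing
    • Memory leaks in the consumer
    • Database connections were not closed, ending in "connection is gone"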
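
One common way to contain the leak and connection problems above is to bound each consumer run and release resources per message. The following is a sketch under assumptions: `runConsumer`, its limits, and the `ArrayObject` used as a fake connection are hypothetical, and a process supervisor is assumed to restart the worker after it returns.

```php
<?php
declare(strict_types=1);

/**
 * Process messages until a limit is hit, then return so a supervisor can restart the process.
 *
 * @return string why the loop stopped: "drained", "max_messages" or "max_memory"
 */
function runConsumer(iterable $messages, callable $handler, int $maxMessages, int $maxMemoryBytes): string
{
    $processed = 0;
    foreach ($messages as $message) {
        $handler($message);
        $processed++;
        if ($processed >= $maxMessages) {
            return 'max_messages';
        }
        if (memory_get_usage(true) >= $maxMemoryBytes) {
            return 'max_memory';
        }
    }
    return 'drained';
}

$closed = 0;
$handler = function (string $message) use (&$closed): void {
    $connection = new ArrayObject(); // hypothetical stand-in for a database connection
    try {
        // ... use $connection to load the row and index it ...
    } finally {
        $connection = null; // release the handle even when the work above throws
        $closed++;
    }
};

$reason = runConsumer(['a', 'b', 'c'], $handler, 2, PHP_INT_MAX);
// $reason is "max_messages" and $closed is 2
```
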
  17. Toward a larger application

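  18. Toward a larger application driven by business growth
    • We want to analyze user behavior
    • We want to use the search terms many users enter to power suggestions
    • We want to show content based on user behavior
    • We want to aggregate data from distributed services and offer new content
    • Toward BigData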
  19. Big Data + Fast Data

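  20. Application challenges that come with BigData
    • The batch jobs that each application used to run no longer finish
    • Record counts reach hundreds of millions within a short period, and I/O becomes a harder limit than database indexing
    • Guaranteeing acceptable replication lag is difficult
    • Real-time analysis across tens of millions of users is hard
    • Normalized data absolutely belongs in an RDBMS
    • How do we cope with distributed databases?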
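  21. Approaches to BigData
    • Aggregate the data itself
    • Prepare in advance any data that only needs to be aggregated up to the previous day
    • Message processing for data that arrives in real time, plus data storage that can be distributed
    • Rebuild the earlier combination of database and Elasticsearch at a larger scale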
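  22. Adopting distributed storage
    • MongoDB and Couchbase were also considered; not adopted because of instability around the drivers, among other reasons (though still in use at present)
    • Hadoop
      HDFS as the distributed file system
      MapReduce for distributed processing
      Mature enough, with plenty of adoption cases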

  23. None
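  24. Lambda architecture
    • Consists of a batch layer, a serving layer, and a speed layer
    • The batch layer handles aggregation of large data sets and analysis of high-volume data -> Hadoop (MapReduce), Spark
    • The serving layer serves the aggregated results of the batch layer: Hive, HBase, ElephantDB, Splout SQL, pipelineDB…
    • The speed layer serves the results of real-time processing: Spark, Storm, Kafka, Cassandra, etc.
    • Values from both the serving layer and the speed layer are merged and returned; this is hard to get right...
    -> Toward the Kappa Architecture, which consolidates everything on Kafka or similar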
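
The merge step in the last bullet can be illustrated with a small function. This is a sketch assuming both layers expose page-view counts keyed by URL; `mergeViews` and the sample data are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Merge a batch view with a real-time view of the same metric.
 *
 * @param array $batchView counts up to the last batch run (serving layer)
 * @param array $speedView counts for events since that run (speed layer)
 * @return array combined counts
 */
function mergeViews(array $batchView, array $speedView): array
{
    $merged = $batchView;
    foreach ($speedView as $key => $count) {
        $merged[$key] = ($merged[$key] ?? 0) + $count;
    }
    return $merged;
}

$batchView = ['/products/1' => 1200, '/products/2' => 340];
$speedView = ['/products/2' => 5, '/products/3' => 1];
$pageViews = mergeViews($batchView, $speedView);
// ['/products/1' => 1200, '/products/2' => 345, '/products/3' => 1]
```

The difficulty the slide mentions sits outside this function: both views must agree on where the batch run ended, or events are counted twice or not at all.
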
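  25. Stream processing
    • The goal of stream data processing is to handle large volumes of data in real time
    • An approach to input that has no end and keeps arriving indefinitely
    • Data is processed in memory and then discarded
    • Often used for monitoring workloads
    • Also for applications that use sensors, and the like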
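
The "process in memory, then discard" idea can be sketched with a PHP generator that counts events per fixed time window. This is an illustration only; `tumblingCounts` and the event shape are hypothetical, and events are assumed to arrive in timestamp order.

```php
<?php
declare(strict_types=1);

/**
 * Count events per fixed-size time window, keeping only the current window in memory.
 *
 * @param iterable $events        events shaped like ['ts' => int, 'key' => string], in timestamp order
 * @param int      $windowSeconds window length in seconds
 * @return Generator window start => counts per key
 */
function tumblingCounts(iterable $events, int $windowSeconds): Generator
{
    $windowStart = null;
    $counts = [];
    foreach ($events as $event) {
        $start = intdiv($event['ts'], $windowSeconds) * $windowSeconds;
        if ($windowStart !== null && $start !== $windowStart) {
            yield $windowStart => $counts; // emit the finished window
            $counts = [];                  // and discard its state
        }
        $windowStart = $start;
        $counts[$event['key']] = ($counts[$event['key']] ?? 0) + 1;
    }
    if ($windowStart !== null) {
        yield $windowStart => $counts; // only reached when the input is finite
    }
}

$events = [
    ['ts' => 0, 'key' => 'login'],
    ['ts' => 30, 'key' => 'login'],
    ['ts' => 61, 'key' => 'search'],
];
$windows = iterator_to_array(tumblingCounts($events, 60));
// [0 => ['login' => 2], 60 => ['search' => 1]]
```
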
  26. Spark
    • A distributed processing framework
    • Works with immutable collections called RDDs
    • Spark SQL
    • Spark Streaming
    • Spark MLlib
  27. Kappa architecture

  28. Kappa architecture

  29. Built with OSS

  30. Building this mainly in PHP is difficult...

  31. Demo https://github.com/ytake/builderscon-example

  32. Kappa Architecture (small): PHP ConsoleApp, Kafka, Spark Streaming, PHP Consumer, Cassandra, PHP WebApp
  33. Apache Cassandra

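  34. Apache Cassandra
    • Chosen for ease of scaling, because we knew record counts would grow massively and planned to use the data for analysis
    • Usable from PHP (ext-cassandra)
    • Handles high-volume writes
    • Simple transaction support
    • Clusters can span data centers
    • Availability and Partition Tolerance
    • SQL-like interface (CQL)
    • No single point of failure???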
  35. Apache Cassandra Architecture

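  36. Things to watch out for
    • It does not work well if you treat it like an RDBMS
    • Design carefully around the partition key
    • Sort order cannot be chosen freely per query condition
    • Use materialized views alongside the base tables
    • Analyze the logs properly when a failure occurs
    • Fight with compaction (plan for twice the capacity you actually use)
    • Articles that turn up in searches mostly cover old versions, which differ completely from the current one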
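  37. Table design
    • The primary key is the identifying key and, at the same time, the partition key that decides which node stores the row
    • Searching across rows that live on different nodes is a poor fit; if that need arises, revisit the table design
    • The key must be included in every update and delete
    • Secondary indexes are as far as the available options go
    • JOIN and LIKE do not exist, so handle complex queries in Spark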
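
How a partition key decides the storing node can be pictured as a lookup on a token ring. This is a heavily simplified sketch: `nodeFor` and the three-node ring are hypothetical, `crc32` is used only as a stand-in hash (Cassandra's default partitioner is Murmur3-based), and replication and virtual nodes are ignored. A 64-bit PHP build is assumed so that `crc32` is non-negative.

```php
<?php
declare(strict_types=1);

/**
 * Pick the node that owns a partition key on a simplified token ring.
 *
 * @param string $partitionKey value of the partition key column
 * @param array  $ring         node name => token, the upper bound of the range that node owns
 */
function nodeFor(string $partitionKey, array $ring): string
{
    asort($ring); // walk the ring in token order
    $token = crc32($partitionKey) % 1000; // stand-in hash, not Cassandra's partitioner
    foreach ($ring as $node => $nodeToken) {
        if ($token <= $nodeToken) {
            return (string) $node;
        }
    }
    reset($ring);
    return (string) key($ring); // tokens past the last bound wrap around to the first node
}

$ring = ['node-a' => 250, 'node-b' => 500, 'node-c' => 750];
$owner = nodeFor('user:42', $ring);
// The same key always maps to the same node, which is why queries should supply the partition key.
```
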
  38. Table design
      CREATE TABLE timeline.user_timeline (
        uuid uuid,
        user_id int,
        reference map<text, text>,
        body text,
        is_read tinyint,
        published_at timestamp,
        PRIMARY KEY (user_id)
      );
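  39. Determining record order
      CREATE TABLE timeline.user_timeline (
        uuid uuid,
        user_id int,
        reference map<text, text>,
        body text,
        is_read tinyint,
        published_at timestamp,
        -- published_at is added as a clustering column here, because
        -- CLUSTERING ORDER BY only accepts clustering columns
        PRIMARY KEY (user_id, published_at)
      ) WITH CLUSTERING ORDER BY (published_at DESC);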
  40. MATERIALIZED VIEW
      CREATE MATERIALIZED VIEW timeline.desc_user_timeline AS
        SELECT uuid, user_id, published_at, reference, body
        FROM timeline.user_timeline
        WHERE user_id IS NOT NULL
          AND published_at IS NOT NULL
          AND uuid IS NOT NULL
        PRIMARY KEY (user_id, published_at, uuid)
        WITH CLUSTERING ORDER BY (published_at DESC);
  41. From PHP
      $cluster = Cassandra::cluster()
          ->withContactPoints('10.0.1.24', 'localhost')
          ->withPort(9042)
          ->build();
      // Session creation is not shown on the slide; the keyspace name is a placeholder.
      $session = $cluster->connect('example_keyspace');
      $statement = $session->prepare(
          "UPDATE users SET age = ? WHERE user_name = ?"
      );
      $futures = array();
      // execute all statements in background
      foreach ($data as $arguments) {
          $futures[] = $session->executeAsync(
              $statement,
              ['arguments' => $arguments]
          );
      }
  42. PHP extension
    • An interface optimized for batches: Batch Statement
    • Can be used in parallel
    • Pagination is provided (using Generators)
    • Most Cassandra features are available, so it can be put to good use without going through Java
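
The pagination-with-generators idea can be sketched independently of the extension. This is not the extension's API: `eachRow` and the `$fetchPage` callback are hypothetical, and a fake two-page result set stands in for a real query.

```php
<?php
declare(strict_types=1);

/**
 * Yield rows one at a time while fetching them page by page.
 *
 * @param callable $fetchPage hypothetical loader: takes a paging token (or null for the first page)
 *                            and returns ['rows' => array, 'next' => ?string]
 */
function eachRow(callable $fetchPage): Generator
{
    $token = null;
    do {
        $page = $fetchPage($token);
        foreach ($page['rows'] as $row) {
            yield $row;
        }
        $token = $page['next'];
    } while ($token !== null);
}

// Fake two-page result set standing in for a real query.
$pages = [
    'start' => ['rows' => [1, 2], 'next' => 'p2'],
    'p2'    => ['rows' => [3], 'next' => null],
];
$fetchPage = function (?string $token) use ($pages): array {
    return $pages[$token ?? 'start'];
};

$rows = iterator_to_array(eachRow($fetchPage), false);
// $rows is [1, 2, 3]
```

The caller iterates as if over one result set, while only a single page is held in memory at a time.
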
  43. Apache Kafka

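  44. Apache Kafka
    • Stream support (indispensable in a Lambda architecture)
    • Flexible clustering
    • A distributed system that works together with ZooKeeper
    • Resilient to failures; messages can be fetched again
    • Chosen because it integrates easily with Spark and Storm
    • Messages are retained for a configured period even after delivery, so other query engines can read their contents
    • Usable from PHP (rdkafka)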
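  45. Problems that occur with message queues
    • Messages can be lost while being sent from the producer to the broker
    • Cases where the broker fails to receive
    • Cases where the broker fails to send
    • Cases where a message is received more than once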
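
The duplicate-delivery case is commonly handled by making the consumer idempotent. The sketch below is one such approach under assumptions: `applyOnce` and the message shape are hypothetical, and the `$seen` set would need to live in durable storage in a real system.

```php
<?php
declare(strict_types=1);

/**
 * Apply each message at most once by remembering processed message ids.
 *
 * @param array $messages messages shaped like ['id' => string, 'amount' => int], possibly redelivered
 * @param array $seen     ids handled so far (in-memory here, durable storage in practice)
 * @return int sum of the amounts that were actually applied
 */
function applyOnce(array $messages, array &$seen): int
{
    $total = 0;
    foreach ($messages as $message) {
        if (isset($seen[$message['id']])) {
            continue; // duplicate delivery, already applied
        }
        $total += $message['amount'];
        $seen[$message['id']] = true;
    }
    return $total;
}

$seen = [];
$total = applyOnce([
    ['id' => 'm1', 'amount' => 100],
    ['id' => 'm2', 'amount' => 50],
    ['id' => 'm1', 'amount' => 100], // redelivered
], $seen);
// $total is 150
```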

  46. Kafka 0.11
    • Exactly-once delivery and transactional messaging
    • Delivered exactly once, and reliably
    • Transactions for sending and receiving messages!
    • More robust
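  47. Partition
    • Designed for parallel and distributed processing
    • A topic is split into partitions, and producers and consumers access whichever partition they need
    • Data can be subdivided and processed efficiently as needed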
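
Key-based partitioning can be sketched as a hash of the message key modulo the partition count. This is an illustration, not Kafka's partitioner: `partitionFor` and `produce` are hypothetical, `crc32` is a stand-in hash, and a 64-bit PHP build is assumed so that the hash is non-negative.

```php
<?php
declare(strict_types=1);

// crc32 is a stand-in hash for illustration; Kafka clients ship their own partitioners.
function partitionFor(string $key, int $partitionCount): int
{
    return crc32($key) % $partitionCount;
}

/**
 * Append each [key, payload] pair to the partition chosen by its key.
 *
 * @return array partition number => payloads in arrival order
 */
function produce(array $messages, int $partitionCount): array
{
    $partitions = array_fill(0, $partitionCount, []);
    foreach ($messages as [$key, $payload]) {
        $partitions[partitionFor($key, $partitionCount)][] = $payload;
    }
    return $partitions;
}

$partitions = produce([
    ['user-1', 'user-1:login'],
    ['user-2', 'user-2:login'],
    ['user-1', 'user-1:search'],
    ['user-1', 'user-1:logout'],
], 3);

$p = partitionFor('user-1', 3);
$user1 = array_values(array_filter($partitions[$p], function (string $m): bool {
    return strpos($m, 'user-1:') === 0;
}));
// $user1 is ['user-1:login', 'user-1:search', 'user-1:logout']
```

Messages with the same key land in the same partition, so their relative order survives even though partitions are consumed in parallel.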

  48. Partition

  49. BigData starts from PHP

  50. Presto

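  51. Data extraction problems at scale
    • "I want to look at the data stored in HDFS"
    • "Join it with the data stored in the RDBMS"
    • "Aggregate and extract data from a business point of view"
    • Distributed databases make all of this hard
    • "I want to use it for analysis without much effort"
    -> An unreasonable demand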
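  52. Data marts
    • Collect the required data and consolidate it in one database
    • Because it is built by batch processing, getting data immediately is difficult
    • The data mart itself needs maintenance and operation (feasible if that is someone's job)
    • Designing a data mart is difficult and the bar is high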
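  53. What is Presto?
    • A query engine built by Facebook that retrieves data interactively from large data sets
    • Connecting to HDFS from a front-end application and visualizing data on the spot is difficult
    • Hive targets batch processing (MapReduce), so returning results within seconds is not possible
    • Solves requests such as "I want to connect to the RDBMS directly!"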
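  54. What is Presto?
    • Provides an SQL interface
    • Cassandra, Hive, Kafka
    • MongoDB, MySQL, PostgreSQL, SQLServer
    • Redis, Thrift
    • Sources that are not supported can be added, to a degree, just by implementing a driver in Java
    • Supports INSERT and more besides SELECT, so it also helps with database migration and with covering for a distributed architecture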
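
A cross-source query of the kind listed above might look like the following. This is a sketch under assumptions: the catalog names `cassandra` and `mysql` depend on how the Presto cluster is configured, and the `app.users` table with its `id` and `name` columns is invented. The query is only held in a PHP string; submitting it would go through JDBC or one of the PHP clients, whose APIs are not shown here.

```php
<?php
declare(strict_types=1);

// Catalog, schema, and table names are hypothetical; Presto addresses tables as catalog.schema.table.
$sql = <<<'SQL'
SELECT u.name, count(*) AS unread
FROM cassandra.timeline.user_timeline AS t
JOIN mysql.app.users AS u ON u.id = t.user_id
WHERE t.is_read = 0
GROUP BY u.name
ORDER BY unread DESC
LIMIT 10
SQL;
```

The point of the example is the single `JOIN` across two storage systems, which would otherwise require a data mart or application-side merging.
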
  55. What is Presto?

  56. What is Presto?
    • JDBC support
    • From PHP:
      xtendsys-labs/php-presto-client
      ytake/php-presto-client

  57. None
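  58. Summary
    • Applications grow more complex, and solving the resulting challenges is the real reward of growing larger
    • Admin tools written in PHP support the business
    • A BigData + FastData architecture that starts from PHP
    • PHP can make a major contribution here too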
  59. PHP: supporting everything from web applications to BigData