BigQueryはログ等の集計をSQLで行うことができ、JSONを含むような複雑な集計にも対応していますが、protobufというシリアライゼーションフォーマットには対応していません。BigQueryにはJavaScriptでユーザー定義の関数を作る機能があり、これを使ってprotobuf形式のログを集計する機会があったため紹介しています。
©2019 Wantedly, Inc.Ͱ Λύʔεͨ͠meguro.es #23 @ Wantedly, Inc.BigQueryprotobuf2019-10-10 - Masaki Hara
View Slide
©2019 Wantedly, Inc.ࣗݾհw ݪকݾ !ROJHIZw 8BOUFEMZ1FPQMFόοΫΤϯυw ࠓόοΫΤϯυ Ͱ+4ΛͬͨΛ͠·͢
©2019 Wantedly, Inc.ఆٛ
©2019 Wantedly, Inc.(PPHMF͕ఏڙ͢ΔσʔλΣΞϋεϩάͱ͔ΛόγόγૹΔ͜ͱ͕Ͱ͖Δ42-ޓͷߏจͰരूܭఆٛBigQuery
©2019 Wantedly, Inc.(PPHMF͕։ൃͨ͠γϦΞϥΠθʔγϣϯϑΥʔϚοτͷΑ͏ͳεΩʔϚΛ͋Β͔͡Ί༻ҙ͢ΔεΩʔϚʹ͋ΘͤͯόΠφϦ͕࠷దԽ͞ΕΔ+40/ͱͷ૬ޓมنଇఆٛ͞Ε͍ͯΔఆٛProtobufProtocol Buffers
©2019 Wantedly, Inc.(PPHMF͕ఏڙ͢ΔϝοηʔδΩϡʔҟͳΔαʔϏεؒͰ௨ΛΒࢃ͘ͷ͕͓खܰʹͳΔ͙ʔ͙Δͷ͔ͪΒͬͯ͛͢ʔఆٛ ϢʔβʔσʔλΛཧ͍ͯ͠ΔυΫϩOAuthΛཧ͍ͯ͠ΔίϯηϯτCloud Pub/SubΈ͍ͨͳܗΛͨ͠ΠϧεϢʔβʔσʔλΛݕࡧͰ͖ΔϧʔΫߋ৽௨Google Cloud Pub/Sub
©2019 Wantedly, Inc.ܦҢ
©2019 Wantedly, Inc.ϝοηʔδͷܽམ ͕͋ͬͨʮͨ·ʹ௨͕࡞͞Εͳ͍ʯͱ͍͏ใࠂ͕͋Γௐࠪૹ৴ϩάͱड৴ϩάΛಥ͖߹Θͤͯௐ͍͕ࠪͨ͠ɺड৴ϩά͕͖݅ͭͩͬͨϝοηʔδͷத͕1SPUPCVGͰอଘ͞Ε͍ͯͨܦҢϓϩϑΟʔϧߋ৽Λجʹ௨Λ࡞͢ΔTγϟπϩά (ԁப)ૹ৴ϩάϩά (ԁப)ड৴ϩά
©2019 Wantedly, Inc.ϝοηʔδͷܽམ ܦҢಉ݅͡Λૹ৴ϩά͔ΒධՁ͠ͳ͓ͯ͠ಥ͖߹ΘͤΔͨͩ͠ϩά1SPUPCVGࣗͰύʔαʔΛॻ͜͏
©2019 Wantedly, Inc.໊લ௨Γɺ#JH2VFSZ্Ͱ ϢʔβʔఆٛؔΛ࡞Δ͜ͱ͕Ͱ͖Δ+BWB4DSJQU·ͨ42-ͰఆٛͰ͖ΔUDF User Defined FunctionBigQuery UDF˞(PPHMF$MPVE4UPSBHF্ͷ+BWB4DSJQUϥΠϒϥϦΛಡΈࠐΜͩΓɺ#JH2VFSZͷϓϩδΣΫτ্ʹ6%'ΛӬଓԽ͢Δ͜ͱͰ͖Δɻ ࠓճखܰʹͬͨΓॻ͖͑ͨΓͰ͖ΔͷΛ࡞Γ͔ͨͬͨͷͰɺҰ࣌6%'ͱ࣮ͯͨ͠͠ɻ
©2019 Wantedly, Inc.CREATE TEMP FUNCTION parseMessage(messageB64 BYTES)RETURNS STRINGLANGUAGE js AS """const toSextet = (ch) => {if (0x41 <= ch && ch < 0x41 + 26) {return ch - 0x41;} else if (0x61 <= ch && ch < 0x71 + 26) {return ch - (0x61 - 26);} else if (0x30 <= ch && ch < 0x30 + 10) {return ch + (52 - 0x30);} else if (ch === 0x2B) {return 62;} else if (ch === 0x2F) {return 63;} else {return 0;}};const decodeBase64 = (message) => {const numPadded =message.length < 2 ? 0 :͜Ε͕ͩ©2019 Wantedly, Inc.https://gist.github.com/qnighy/79d5eedbd4cf26a573c2cbd09a4b3956
©2019 Wantedly, Inc.} else {return 0;}};const decodeBase64 = (message) => {const numPadded =message.length < 2 ? 0 :message.charCodeAt(message.length - 2) === 0x3D ? 2 :message.charCodeAt(message.length - 1) === 0x3D ? 1 : 0;const numChunks = (message.length / 4) | 0;const ret = new Array(numChunks * 3 - numPadded);for(let i = 0; i < numChunks; ++i) {const chunkPad = i + 1 == numChunks ? numPadded : 0;const ch0 = toSextet(message.charCodeAt(i * 4));const ch1 = toSextet(message.charCodeAt(i * 4 + 1));const ch2 = toSextet(message.charCodeAt(i * 4 + 2));const ch3 = toSextet(message.charCodeAt(i * 4 + 3));ret[i * 3] = (ch0 << 2) | (ch1 >> 4);if (chunkPad < 2) ret[i * 3 + 1] = ((ch1 & 15) << 4) | (ch2 >> 2);if (chunkPad < 1) ret[i * 3 + 2] = ((ch2 & 3) << 6) | ch3;}return ret;};class Parser {constructor(bytes, index, limit) {this.bytes = bytes;͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.}return ret;};class Parser {constructor(bytes, index, limit) {this.bytes = bytes;this.index = index;this.limit = limit;}eatByte() {if (this.index >= this.limit) {throw `EOF: ${this.index} + 1 > ${this.limit}`;}return this.bytes[this.index++];}eatBytes(len) {const index = this.index;if (this.index + len > this.limit) {throw `EOF: ${this.index} + ${len} > ${this.limit}`;}this.index += len;return this.bytes.slice(index, index + len);}͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.return this.bytes.slice(index, index + len);}subParser(len) {const index = this.index;if (index + len > this.limit) {throw `EOF: ${this.index} + ${len} > ${this.limit}`;}this.index += len;return new Parser(this.bytes, index, index + len);}convertToBytes() {const bytes = this.bytes.slice(this.index, this.limit);this.index = this.limit;return bytes;}convertToString() {const bytes = this.convertToBytes();const bytesString = String.fromCharCode(...bytes);return decodeURIComponent(escape(bytesString));}eatVarint() {͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.return decodeURIComponent(escape(bytesString));}eatVarint() {let shift = 0;let ret = 0;while(true) {const byte = this.eatByte();ret |= (byte & 127) << shift;if ((byte & 128) == 0) {return ret;}shift += 7;}}eatField() {const tag = this.eatVarint();const field = tag >> 3;const wireType = tag & 7;let ret;if (wireType === 0) {ret = this.eatVarint();} else if (wireType === 1) {ret = this.eatBytes(64);͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.}eatField() {const tag = this.eatVarint();const field = tag >> 3;const wireType = tag & 7;let ret;if (wireType === 0) {ret = this.eatVarint();} else if (wireType === 1) {ret = this.eatBytes(64);} else if (wireType === 2) {ret = this.subParser(this.eatVarint());} else if (wireType === 3) {throw "TODO: wireType == 3";} else if (wireType === 4) {throw "TODO: wireType == 4";} else if (wireType === 5) {ret = this.eatBytes(32);} else {throw "Unknown wire type";}return [field, wireType, ret];}͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.}return [field, wireType, ret];}parseFieldMask() {const ret = { paths: [] };while (this.index < this.limit) {const [field, wireType, fieldData] = this.eatField();if (field === 1) {ret.paths.push(fieldData.convertToString());}}return ret;}parseWorkingHistory() {const ret = {};while (this.index < this.limit) {const [field, wireType, fieldData] = this.eatField();if (field === 1) {ret._op = ["OPERATION_UNSPECIFIED", "CREATE", "UPDATE", "DELETE"][fieldData] || fieldData;} else if (field === 2) {ret.id = fieldData;}͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.[fieldData] || fieldData;} else if (field === 2) {ret.id = fieldData;}}return ret;}parseProfile() {const ret = { working_histories: [] };while (this.index < this.limit) {const [field, wireType, fieldData] = this.eatField();if (field === 1) {ret._op = ["OPERATION_UNSPECIFIED", "CREATE", "UPDATE", "DELETE"][fieldData] || fieldData;} else if (field === 2) {ret.id = fieldData;} else if (field === 17) {ret.working_histories.push(fieldData.parseWorkingHistory());}}return ret;}parseUser() {͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.return ret;}parseUser() {const ret = {};while (this.index < this.limit) {const [field, wireType, fieldData] = this.eatField();if (field === 1) {ret._op = ["OPERATION_UNSPECIFIED", "CREATE", "UPDATE", "DELETE"][fieldData] || fieldData;} else if (field === 2) {ret.id = fieldData;} else if (field === 9) {ret.profile = fieldData.parseProfile();}}return ret;}parseMessage() {const ret = {};while (this.index < this.limit) {const [field, wireType, fieldData] = this.eatField();if (field === 1) {ret.field_mask = fieldData.parseFieldMask();͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.}return ret;}parseMessage() {const ret = {};while (this.index < this.limit) {const [field, wireType, fieldData] = this.eatField();if (field === 1) {ret.field_mask = fieldData.parseFieldMask();} else if (field === 2) {ret.user = fieldData.parseUser();}}return ret;}}const messageBytes = decodeBase64(messageB64);const parser = new Parser(messageBytes, 0, messageBytes.length);// try {return JSON.stringify(parser.parseMessage());// } catch(e) { return JSON.stringify({ error: e }); }""";CREATE TEMP FUNCTION should_perform(message_json STRING)͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.}return ret;}}const messageBytes = decodeBase64(messageB64);const parser = new Parser(messageBytes, 0, messageBytes.length);// try {return JSON.stringify(parser.parseMessage());// } catch(e) { return JSON.stringify({ error: e }); }""";CREATE TEMP FUNCTION should_perform(message_json STRING)RETURNS BOOLLANGUAGE js AS """const fetchCreatedWorkingHistories = (user, field_mask) => {if(!field_mask.paths.find((x) => x === "profile.working_histories._op"))return [];return user.profile.working_histories.filter((w) => w._op === "CREATE");};const { user, field_mask } = JSON.parse(message_json);return user.id && fetchCreatedWorkingHistories(user, field_mask).length > 0;""";͜Ε͕ͩ©2019 Wantedly, Inc.
©2019 Wantedly, Inc.ύʔαʔ
©2019 Wantedly, Inc.όΠφϦͷड͚औΓ#JH2VFSZʹ#:5&4ܕ͕͋Δ͕ɺ+BWB4DSJQUʹ͞ΕΔͱ͖#BTFΤϯίʔυ͞ΕͨTUSJOHͱͯ͠දݱ͞ΕΔɻ͡Ίʹ#BTFͷύʔαʔΛॻ͍ͨɻBase64const toSextet = (ch) => {if (0x41 <= ch && ch < 0x41 + 26) {return ch - 0x41;} else if (0x61 <= ch && ch < 0x71 + 26) {return ch - (0x61 - 26);} else if (0x30 <= ch && ch < 0x30 + 10) {return ch + (52 - 0x30);} else if (ch === 0x2B) {return 62;} else if (ch === 0x2F) {return 63;} else {return 0;}};
©2019 Wantedly, Inc.όΠφϦͷड͚औΓ#JH2VFSZʹ#:5&4ܕ͕͋Δ͕ɺ+BWB4DSJQUʹ͞ΕΔͱ͖#BTFΤϯίʔυ͞ΕͨTUSJOHͱͯ͠දݱ͞ΕΔɻ͡Ίʹ#BTFͷύʔαʔΛॻ͍ͨɻBase64const decodeBase64 = (message) => {const numPadded =message.length < 2 ? 0 :message.charCodeAt(message.length - 2) === 0x3D ? 2 :message.charCodeAt(message.length - 1) === 0x3D ? 1 : 0;const numChunks = (message.length / 4) | 0;const ret = new Array(numChunks * 3 - numPadded);for(let i = 0; i < numChunks; ++i) {const chunkPad = i + 1 == numChunks ? numPadded : 0;const ch0 = toSextet(message.charCodeAt(i * 4));const ch1 = toSextet(message.charCodeAt(i * 4 + 1));const ch2 = toSextet(message.charCodeAt(i * 4 + 2));const ch3 = toSextet(message.charCodeAt(i * 4 + 3));ret[i * 3] = (ch0 << 2) | (ch1 >> 4);if (chunkPad < 2) ret[i * 3 + 1] = ((ch1 & 15) << 4) | (ch2 >> 2);if (chunkPad < 1) ret[i * 3 + 2] = ((ch2 & 3) << 6) | ch3;}
©2019 Wantedly, Inc.όΠφϦͷड͚औΓ͋ͱͰௐͨͱ͜Ζɺ6JOU"SSBZར༻Մೳͩͬͨɻͪ͜Βͷ΄͏͕ڥʹ༏ͦ͠͏Base64
©2019 Wantedly, Inc.1SPUPCVGϑΥʔϚοτͷ͏ͪɺεΩʔϚඇґଘͷ෦Λz1SPUPCVG8JSF'PSNBUzͱ͍͏εΩʔϚ͕ͳ͍߹ɺߏମΛஈ֊͚ͩల։Ͱ͖ΔProtobufProtobuf Wire Format
©2019 Wantedly, Inc.8JSF'PSNBUɺʮϑΟʔϧυ൪߸༰ʯͷྻΛදݱ͢Δʮ༰ʯόΠτྻɺCJUɺCJUɺՄมඇෛ WBSJOUͷ͍ͣΕ͔ProtobufProtobuf Wire Format aY&aYaY aY&aYaYaYaY
©2019 Wantedly, Inc.Protobuf aY&aYaY aY&aYaYaYaY08 A4 13 22 03 E3 81 8222 03 E3 81 84 2D 00 0000 00 98 03 02 08 00
©2019 Wantedly, Inc.Protobuf aY&aYaY aY&aYaYaYaYωετͨ͠ϝοηʔδωετͨ͠ϝοηʔδɺಘΒΕͨόΠτྻΛ࠶ؼతʹ8JSF 'PSNBUͱͯ͠ύʔε͢Δ͜ͱͰಘΒΕΔ
©2019 Wantedly, Inc.Protobufωετͨ͠ϝοηʔδclass Parser {constructor(bytes, index, limit) {this.bytes = bytes;this.index = index;this.limit = limit;}eatByte() {if (this.index >= this.limit) {throw `EOF: ${this.index} + 1 > ${this.limit}`;}return this.bytes[this.index++];}}όΠτྻͷ෦ྻΛঢ়ଶ͖ͭͰύʔε͍ͨ͠ͷͰɺCZUFT JOEFY MJNJUͷͭͰύʔαʔͷঢ়ଶΛදݱͨ͠ɻ#JH2VFSZͰΫϥε͕͑Δ͜ͱ͕Θ͔ͬͨɻࠓճͷίʔυͷߏతʹɺߦॲཧ͢Δ͝ͱʹΫϥε͕ճੜ͞Ε͍ͯΔՄೳੑ͕͋Δɻ
©2019 Wantedly, Inc.Protobufωετͨ͠ϝοηʔδclass Parser {subParser(len) {const index = this.index;if (index + len > this.limit) {throw `EOF: ${this.index} + ${len} > ${this.limit}`;}this.index += len;return new Parser(this.bytes, index, index + len);}}͜ͷߏʹ͓ͯ͘͠ͱɺωετͨ͠ϝοηʔδͷͨΊͷύʔαʔ͕؆୯ʹऔΓग़ͤΔɻαϒύʔαʔΛΘͳ͍ͱ͖ͦͷ··ࣺͯΕίετͰଓߦͰ͖Δ
©2019 Wantedly, Inc.Protobuf7BSJOUܗࣜclass Parser {eatVarint() {let shift = 0;let ret = 0;while(true) {const byte = this.eatByte();ret |= (byte & 127) << shift;if ((byte & 128) == 0) {return ret;}shift += 7;}}}QSPUPCVGʹग़ͯ͘Δͷ΄ͱΜͲ͜ͷܗͰΤϯίʔυ͞Ε͍ͯΔɻCJU͝ͱʹ۠ΓϦτϧΤϯσΟΞϯͰฒ͚ͨͩͷ؆୯ͳܗࣜͰɺޙଓͷόΠτ͕͋Δͱ͖CJUΛཱͯΔɻ
©2019 Wantedly, Inc.Protobuf7BSJOUܗࣜ͋ͱͰௐͨΒɺ#JH2VFSZ#JH*OU OͷΑ͏ͳαϙʔτ͍ͯͨ͠ɻࠓճेʹখ͍͔͞͠ͳ͔ͬͨͷͰͲͪΒͰ0,ͩͬͨɻ
©2019 Wantedly, Inc.ProtobufϑΟʔϧυܗࣜ'JFMEº'JFMEº 7BMVF CJU7BMVF WBSJOU'JFMEº 7BMVF CJU'JFMEº 7BMVF CZUFT-FOHUI͜͜WBSJOUܗࣜ
©2019 Wantedly, Inc.εΩʔϚґଘ෦ඞཁͳϝιουΛ1BSTFSΫϥεʹ࣮ͯ͠͏ํࣜʹͨ͠ɻݟΔϑΟʔϧυඞཁͳͷ͚ͩɻXIJMFϧʔϓͰpFME൪߸ʹԠͯ͡ॲཧΛ͢ΔɻProtobufclass Parser {parseProfile() {const ret = { working_histories: [] };while (this.index < this.limit) {const [field, wireType, fieldData] = this.eatField();if (field === 1) {ret._op = ["OPERATION_UNSPECIFIED", "CREATE", "UPDATE", "DELETE"][fieldData] || fieldData;} else if (field === 2) {ret.id = fieldData;} else if (field === 17) {ret.working_histories.push(fieldData.parseWorkingHistory());}}return ret;}}
©2019 Wantedly, Inc. +BWB4DSJQUࠐΈεΫϦϓτͱͯ͠ศར όϕΒͳͯ͘࠷৽ͷ+4͕͑Δڥָ͍͠ 1SPUPCVGύʔαʔׂͱ؆୯ʹॻ͚Δ·ͱΊ͓·͚ʹ#JH2VFSZ42-͕มɾ܁Γฦ͠ɾϝλϨϕϧͷذΛ#FUB൛ͱͯ͠αϙʔτͨ͠ͷͰɺ͜ΕΛͬͨύʔαʔॻ͚Δ͔͠Ε·ͤΜ