Slide 1

Slide 1 text

Nested JSON into Google BigQuery by Fluentd Takeshi Yako

Slide 2

Slide 2 text

{ "age": 22, "children": [ { "age": "6", "gender": "Female", "name": "Jane" }, { "age": "15", "gender": "Male", "name": "John" } ], "citiesLived": [ { "place": "Seattle", "yearsLived": [ "1995" ] }, { "place": "Stockholm", "yearsLived": [ "2005" ] } ], "fullName": "John Doe", "gender": "Male", "kind": "person", "phoneNumber": { "areaCode": "206", "number": "1234567" } } Google BigQuery

Slide 3

Slide 3 text

Schema ------------------------------------ |- kind: string |- fullName: string (required) |- age: integer |- gender: string +- phoneNumber: record | |- areaCode: integer | |- number: integer +- children: record (repeated) | |- name: string | |- gender: string | |- age: integer +- citiesLived: record (repeated) | |- place: string | |- yearsLived: integer (repeated)

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

FLATTEN SELECT fullName, age FROM [my_dataset.my_table] WHERE (citiesLived.yearsLived > 1995 ) AND (children.age > 3) BigQuery error in query operation: Error processing job 'xxxx:bqjob_xxxxx': Cannot query the cross product of repeated fields children.age and citiesLived.yearsLived.

Slide 6

Slide 6 text

FLATTEN SELECT fullName, age, gender, citiesLived.place FROM (FLATTEN([my_dataset.my_table], children)) WHERE (citiesLived.yearsLived > 1995) AND (children.age > 3) GROUP BY fullName, age, gender, citiesLived.place +------------+-----+--------+-------------------+ | fullName | age | gender | citiesLived_place | +------------+-----+--------+-------------------+ | John Doe | 22 | Male | Stockholm | | Mike Jones | 35 | Male | Los Angeles | | Mike Jones | 35 | Male | Washington DC | | Mike Jones | 35 | Male | Portland | | Mike Jones | 35 | Male | Austin | +------------+-----+--------+-------------------+

Slide 7

Slide 7 text

WITHIN RECORD SELECT fullName, COUNT(children.name) WITHIN RECORD AS numberOfChildren FROM [my_dataset.my_table] +---------------+------------------+ | fullName | numberOfChildren | +---------------+------------------+ | Anna Karenina | 0 | | John Doe | 2 | | Mike Jones | 3 | +---------------+------------------+

Slide 8

Slide 8 text

WITHIN node_name SELECT fullName, COUNT(citiesLived.place) WITHIN RECORD AS numberOfPlacesLived, citiesLived.place, COUNT(citiesLived.yearsLived) WITHIN citiesLived AS numberOfTimesInEachCity, FROM [my_dataset.my_table] +---------------+---------------------+-------------------+-------------------------+ | fullName | numberOfPlacesLived | citiesLived_place | numberOfTimesInEachCity | +---------------+---------------------+-------------------+-------------------------+ | Anna Karenina | 3 | Stockholm | 4 | | Anna Karenina | 3 | Russia | 3 | | Anna Karenina | 3 | Austin | 2 | | John Doe | 2 | Seattle | 1 | | John Doe | 2 | Stockholm | 1 | | Mike Jones | 4 | Los Angeles | 4 | | Mike Jones | 4 | Washington DC | 4 | | Mike Jones | 4 | Portland | 4 | | Mike Jones | 4 | Austin | 4 | +---------------+---------------------+-------------------+-------------------------+