Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Nested JSON into Google BigQuery by Fluentd

Nested JSON into Google BigQuery by Fluentd

Aef6993d8467fa70b4d6ae5152e37bde?s=128

Takeshi Yako

March 16, 2015
Tweet

Transcript

  1. Nested JSON into Google BigQuery by Fluentd Takeshi Yako

  2. { "age": 22, "children": [ { "age": "6", "gender": "Female",

    "name": "Jane" }, { "age": "15", "gender": "Male", "name": "John" } ], "citiesLived": [ { "place": "Seattle", "yearsLived": [ "1995" ] }, { "place": "Stockholm", "yearsLived": [ "2005" ] } ], "fullName": "John Doe", "gender": "Male", "kind": "person", "phoneNumber": { "areaCode": "206", "number": "1234567" } } Google BigQuery
  3. Schema ------------------------------------ |- kind: string |- fullName: string (required) |-

    age: integer |- gender: string +- phoneNumber: record | |- areaCode: integer | |- number: integer +- children: record (repeated) | |- name: string | |- gender: string | |- age: integer +- citiesLived: record (repeated) | |- place: string | |- yearsLived: integer (repeated)
  4. None
  5. FLATTEN SELECT fullName, age FROM [my_dataset.my_table] WHERE (citiesLived.yearsLived > 1995

    ) AND (children.age > 3) BigQuery error in query operation: Error processing job 'xxxx:bqjob_xxxxx': Cannot query the cross product of repeated fields children.age and citiesLived.yearsLived.
  6. FLATTEN SELECT fullName, age, gender, citiesLived.place FROM (FLATTEN([my_dataset.my_table], children)) WHERE

    (citiesLived.yearsLived > 1995) AND (children.age > 3) GROUP BY fullName, age, gender, citiesLived.place +------------+-----+--------+-------------------+ | fullName | age | gender | citiesLived_place | +------------+-----+--------+-------------------+ | John Doe | 22 | Male | Stockholm | | Mike Jones | 35 | Male | Los Angeles | | Mike Jones | 35 | Male | Washington DC | | Mike Jones | 35 | Male | Portland | | Mike Jones | 35 | Male | Austin | +------------+-----+--------+-------------------+
  7. WITHIN RECORD SELECT fullName, COUNT(children.name) WITHIN RECORD AS numberOfChildren FROM

    [my_dataset.my_table] +---------------+------------------+ | fullName | numberOfChildren | +---------------+------------------+ | Anna Karenina | 0 | | John Doe | 2 | | Mike Jones | 3 | +---------------+------------------+
  8. WITHIN node_name SELECT fullName, COUNT(citiesLived.place) WITHIN RECORD AS numberOfPlacesLived, citiesLived.place,

    COUNT(citiesLived.yearsLived) WITHIN citiesLived AS numberOfTimesInEachCity, FROM [my_dataset.my_table] +---------------+---------------------+-------------------+-------------------------+ | fullName | numberOfPlacesLived | citiesLived_place | numberOfTimesInEachCity | +---------------+---------------------+-------------------+-------------------------+ | Anna Karenina | 3 | Stockholm | 4 | | Anna Karenina | 3 | Russia | 3 | | Anna Karenina | 3 | Austin | 2 | | John Doe | 2 | Seattle | 1 | | John Doe | 2 | Stockholm | 1 | | Mike Jones | 4 | Los Angeles | 4 | | Mike Jones | 4 | Washington DC | 4 | | Mike Jones | 4 | Portland | 4 | | Mike Jones | 4 | Austin | 4 | +---------------+---------------------+-------------------+-------------------------+