Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Nested JSON into Google BigQuery by Fluentd

Nested JSON into Google BigQuery by Fluentd

Takeshi Yako

March 16, 2015
Tweet

More Decks by Takeshi Yako

Other Decks in Technology

Transcript

  1. { "age": 22, "children": [ { "age": "6", "gender": "Female",

    "name": "Jane" }, { "age": "15", "gender": "Male", "name": "John" } ], "citiesLived": [ { "place": "Seattle", "yearsLived": [ "1995" ] }, { "place": "Stockholm", "yearsLived": [ "2005" ] } ], "fullName": "John Doe", "gender": "Male", "kind": "person", "phoneNumber": { "areaCode": "206", "number": "1234567" } } Google BigQuery
  2. Schema ------------------------------------ |- kind: string |- fullName: string (required) |-

    age: integer |- gender: string +- phoneNumber: record | |- areaCode: integer | |- number: integer +- children: record (repeated) | |- name: string | |- gender: string | |- age: integer +- citiesLived: record (repeated) | |- place: string | |- yearsLived: integer (repeated)
  3. FLATTEN SELECT fullName, age FROM [my_dataset.my_table] WHERE (citiesLived.yearsLived > 1995

    ) AND (children.age > 3) BigQuery error in query operation: Error processing job 'xxxx:bqjob_xxxxx': Cannot query the cross product of repeated fields children.age and citiesLived.yearsLived.
  4. FLATTEN SELECT fullName, age, gender, citiesLived.place FROM (FLATTEN([my_dataset.my_table], children)) WHERE

    (citiesLived.yearsLived > 1995) AND (children.age > 3) GROUP BY fullName, age, gender, citiesLived.place +------------+-----+--------+-------------------+ | fullName | age | gender | citiesLived_place | +------------+-----+--------+-------------------+ | John Doe | 22 | Male | Stockholm | | Mike Jones | 35 | Male | Los Angeles | | Mike Jones | 35 | Male | Washington DC | | Mike Jones | 35 | Male | Portland | | Mike Jones | 35 | Male | Austin | +------------+-----+--------+-------------------+
  5. WITHIN RECORD SELECT fullName, COUNT(children.name) WITHIN RECORD AS numberOfChildren FROM

    [my_dataset.my_table] +---------------+------------------+ | fullName | numberOfChildren | +---------------+------------------+ | Anna Karenina | 0 | | John Doe | 2 | | Mike Jones | 3 | +---------------+------------------+
  6. WITHIN node_name SELECT fullName, COUNT(citiesLived.place) WITHIN RECORD AS numberOfPlacesLived, citiesLived.place,

    COUNT(citiesLived.yearsLived) WITHIN citiesLived AS numberOfTimesInEachCity, FROM [my_dataset.my_table] +---------------+---------------------+-------------------+-------------------------+ | fullName | numberOfPlacesLived | citiesLived_place | numberOfTimesInEachCity | +---------------+---------------------+-------------------+-------------------------+ | Anna Karenina | 3 | Stockholm | 4 | | Anna Karenina | 3 | Russia | 3 | | Anna Karenina | 3 | Austin | 2 | | John Doe | 2 | Seattle | 1 | | John Doe | 2 | Stockholm | 1 | | Mike Jones | 4 | Los Angeles | 4 | | Mike Jones | 4 | Washington DC | 4 | | Mike Jones | 4 | Portland | 4 | | Mike Jones | 4 | Austin | 4 | +---------------+---------------------+-------------------+-------------------------+