Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Nested JSON into Google BigQuery by Fluentd

Nested JSON into Google BigQuery by Fluentd

Takeshi Yako

March 16, 2015
Tweet

More Decks by Takeshi Yako

Other Decks in Technology

Transcript

  1. Nested JSON
    into Google BigQuery
    by Fluentd
    Takeshi Yako

    View Slide

  2. {
    "age": 22,
    "children": [
    {
    "age": "6",
    "gender": "Female",
    "name": "Jane"
    },
    {
    "age": "15",
    "gender": "Male",
    "name": "John"
    }
    ],
    "citiesLived": [
    {
    "place": "Seattle",
    "yearsLived": [
    "1995"
    ]
    },
    {
    "place": "Stockholm",
    "yearsLived": [
    "2005"
    ]
    }
    ],
    "fullName": "John Doe",
    "gender": "Male",
    "kind": "person",
    "phoneNumber": {
    "areaCode": "206",
    "number": "1234567"
    }
    }
    Google BigQuery

    View Slide

  3. Schema
    ------------------------------------
    |- kind: string
    |- fullName: string (required)
    |- age: integer
    |- gender: string
    +- phoneNumber: record
    | |- areaCode: integer
    | |- number: integer
    +- children: record (repeated)
    | |- name: string
    | |- gender: string
    | |- age: integer
    +- citiesLived: record (repeated)
    | |- place: string
    | |- yearsLived: integer (repeated)

    View Slide

  4. View Slide

  5. FLATTEN
    SELECT
    fullName, age
    FROM [my_dataset.my_table]
    WHERE
    (citiesLived.yearsLived > 1995 ) AND
    (children.age > 3)
    BigQuery error in query operation:
    Error processing job 'xxxx:bqjob_xxxxx':
    Cannot query the
    cross product of repeated fields children.age and citiesLived.yearsLived.

    View Slide

  6. FLATTEN
    SELECT
    fullName,
    age,
    gender,
    citiesLived.place
    FROM (FLATTEN([my_dataset.my_table], children))
    WHERE
    (citiesLived.yearsLived > 1995) AND
    (children.age > 3)
    GROUP BY fullName, age, gender, citiesLived.place
    +------------+-----+--------+-------------------+
    | fullName | age | gender | citiesLived_place |
    +------------+-----+--------+-------------------+
    | John Doe | 22 | Male | Stockholm |
    | Mike Jones | 35 | Male | Los Angeles |
    | Mike Jones | 35 | Male | Washington DC |
    | Mike Jones | 35 | Male | Portland |
    | Mike Jones | 35 | Male | Austin |
    +------------+-----+--------+-------------------+

    View Slide

  7. WITHIN RECORD
    SELECT
    fullName,
    COUNT(children.name) WITHIN RECORD AS numberOfChildren
    FROM [my_dataset.my_table]
    +---------------+------------------+
    | fullName | numberOfChildren |
    +---------------+------------------+
    | Anna Karenina | 0 |
    | John Doe | 2 |
    | Mike Jones | 3 |
    +---------------+------------------+

    View Slide

  8. WITHIN node_name
    SELECT
    fullName,
    COUNT(citiesLived.place) WITHIN RECORD AS numberOfPlacesLived,
    citiesLived.place,
    COUNT(citiesLived.yearsLived) WITHIN citiesLived AS numberOfTimesInEachCity,
    FROM [my_dataset.my_table]
    +---------------+---------------------+-------------------+-------------------------+
    | fullName | numberOfPlacesLived | citiesLived_place | numberOfTimesInEachCity |
    +---------------+---------------------+-------------------+-------------------------+
    | Anna Karenina | 3 | Stockholm | 4 |
    | Anna Karenina | 3 | Russia | 3 |
    | Anna Karenina | 3 | Austin | 2 |
    | John Doe | 2 | Seattle | 1 |
    | John Doe | 2 | Stockholm | 1 |
    | Mike Jones | 4 | Los Angeles | 4 |
    | Mike Jones | 4 | Washington DC | 4 |
    | Mike Jones | 4 | Portland | 4 |
    | Mike Jones | 4 | Austin | 4 |
    +---------------+---------------------+-------------------+-------------------------+

    View Slide