http://strataconf.com/big-data-conference-ny-2015/public/schedule/detail/46809
The flexibility and simplicity of JSON have made it one of the most common formats for data. Data engines need to be able to load, process, and query JSON and nested data types quickly and efficiently. There are multiple approaches to processing JSON data, each with trade offs.
In this session we’ll discuss the reasons and ways that developers want to use flexible schema options and the challenges that creates for processing and querying that data. We’ll dive into the approaches taken by different technologies such as Hive, Drill, BigQuery, Spark, and others, and the performance and complexity trade offs of each. The attendee will leave with an understanding of how to assess which system is best for their use case.