Using Elasticsearch to Help Generate New Insights from Census Data

Elastic Co
October 13, 2016

The Census Bureau maintains large, rich, and complex data sets that the public retrieves and uses every day. These data reveal insights into our economy and the demographic characteristics of states, and they help communities make infrastructure decisions such as where and when to plan public transportation systems and where to locate new housing. Given the criticality of Census data, a team at the Bureau has built a prototype that leverages Elasticsearch to make data more accessible and relevant to users. This talk explores our key challenges, our successes, and how we used Elasticsearch to build the prototype.

Transcript

  1. Using Elasticsearch to Help Generate New Insights from Census Data

    Jesus Jackson, Chief Data Scientist
    Daewoo Chong, Lead Data Scientist
    Booz Allen Hamilton
    October 13, 2016
  2. Agenda

    •  Introduction
    •  Search Prototype
      –  What is the prototype?
      –  What is Census data like?
    •  Indexing Strategy
      –  What are some challenges to indexing Census data?
      –  What strategies have been considered? Which is the current strategy?
    •  Prototype Features
      –  What are some key features of the prototype?
      –  What are some key features of Elasticsearch?
    •  Demo
    •  Questions?
  3. Introduction

    •  Booz Allen for over 7 years
    •  Leads federal data science practice for financial services
    •  Random fact: Has gone skydiving

    •  Booz Allen for 1.5 years
    •  Machine learning enthusiast
    •  Random fact: Trains neural nets in spare time

    Speakers: Jesus Jackson, Daewoo Chong
    Our other team members: Julia Stevens, Raj Cheekatamarla, Ram Anusuri
    Our client team members: Richie Wang, David MacCormack, Zachary Whitman
  4. What is the prototype?

    Make it simple for users to access the data that’s most relevant to them at any given time.
  5. Search Prototype

    What is Census data like?
    –  Census data is very rich and complex. Examples are:
      •  Surveys/Programs: American Community Survey, Decennial Census, Survey of Business Owners, etc.
      •  Geographies: Region, State, County, School Districts, Blocks, etc.
      •  Topics: Age, Computer and Internet Use, Poverty, Race, Sex, etc.
      •  Industries: Construction, Finance and Insurance, Manufacturing, Wholesale Trade, etc.
  6. Indexing Strategy

    What are some challenges to indexing Census data?
    –  Boolean search model
    –  Filter across topics, geographies, industries, datasets, vintages, etc.
    –  No dead ends
    –  ~100 ms response times
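The requirements on this slide map naturally onto Elasticsearch's `bool` query and `terms` aggregations. The sketch below is not the prototype's actual query: the field names (`title`, `topic`, `survey`, `industry`, `geography_level`, `vintage`) are hypothetical, and it assumes "no dead ends" means the interface only offers filter values that still return results, which facet counts from aggregations make possible. It only builds the request body, so it runs without a cluster.

```python
import json


def build_search_body(text, filters, facet_fields, facet_size=10):
    """Build a request body: full-text match, exact-value filters, and facet counts.

    All field names passed in are hypothetical; the prototype's real schema
    is not described in the slides.
    """
    # Filter clauses do not affect scoring, which also makes them cacheable.
    filter_clauses = [
        {"term": {field: value}} for field, value in sorted(filters.items())
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": filter_clauses,
            }
        },
        # One terms aggregation per facet: the bucket counts show which
        # filter values would still return results if selected next.
        "aggs": {
            field: {"terms": {"field": field, "size": facet_size}}
            for field in facet_fields
        },
        "size": 10,
    }


body = build_search_body(
    "poverty",
    {"geography_level": "County", "vintage": "2014"},
    ["topic", "survey", "industry"],
)
print(json.dumps(body, indent=2))
```

A client could hide or grey out any facet value whose bucket is missing from the response, so that every remaining choice leads to at least one result.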
  7. Indexing Strategy

    What strategies have been considered? Which is the current strategy?
    –  Denormalized: Easy to query, but size can be cost prohibitive
    –  Parent / Child: Size not as cost prohibitive, but much slower response times
    –  Semi-normalized: Size is economical and response times are fast, but no more cell values
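To make the trade-off between the three strategies concrete, here is a minimal sketch of how one table might be shaped into documents under each of them. Everything in it is an assumption for illustration: the table, its identifier, the field names, and the placeholder values are made up, and "no more cell values" is read as the semi-normalized index keeping table-level metadata and facet values while dropping the individual cells. The `parent_id` field only illustrates the shape of a parent/child layout; it is not Elasticsearch's actual parent/child mechanism.

```python
# Hypothetical table with made-up placeholder values, one per cell.
table = {
    "table_id": "T001",  # hypothetical identifier
    "survey": "American Community Survey",
    "vintage": "2014",
    "cells": [
        {"geography": "Virginia", "topic": "Poverty", "value": 1.0},
        {"geography": "Virginia", "topic": "Age", "value": 2.0},
        {"geography": "Maryland", "topic": "Poverty", "value": 3.0},
        {"geography": "Maryland", "topic": "Age", "value": 4.0},
    ],
}


def table_metadata(table):
    """Return the table-level fields, without the cells."""
    return {key: value for key, value in table.items() if key != "cells"}


def denormalized(table):
    """One document per cell, with the table metadata copied onto each one."""
    meta = table_metadata(table)
    return [dict(meta, **cell) for cell in table["cells"]]


def parent_child(table):
    """One parent document per table plus one child document per cell."""
    parent = table_metadata(table)
    children = [dict(cell, parent_id=table["table_id"]) for cell in table["cells"]]
    return parent, children


def semi_normalized(table):
    """One document per table: metadata plus distinct facet values, no cell values."""
    doc = table_metadata(table)
    doc["geographies"] = sorted({cell["geography"] for cell in table["cells"]})
    doc["topics"] = sorted({cell["topic"] for cell in table["cells"]})
    return doc


flat_docs = denormalized(table)
parent_doc, child_docs = parent_child(table)
compact_doc = semi_normalized(table)

print(len(flat_docs), "denormalized documents")
print(1 + len(child_docs), "parent/child documents")
print(1, "semi-normalized document:", compact_doc)
```

In this sketch the denormalized layout repeats the metadata once per cell, the parent/child layout stores it once but needs a join at query time, and the semi-normalized layout stays at one document per table at the cost of no longer being able to return cell values from the index.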
  8. Prototype Features

    What are some key features of the prototype?
    –  No dead ends
    –  ~100 ms response times
    –  Use Elastic stack to parse, index, and analyze logs
    –  Use Shield for authentication and role-based access control
  9. Prototype Features

    What are some key features of Elasticsearch that benefited the prototype?
    –  Query caching improves response times by ~25%
    –  Mapping API
    –  Relevancy models, e.g., TF/IDF, BM25
    –  Query API
    –  Multilingual support, e.g., JavaScript, Python, Java
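As an illustration of how the Mapping API and the relevancy models fit together, the sketch below builds an index-creation body that sets a similarity per field. The field names are hypothetical, and the body uses the typeless mapping layout of recent Elasticsearch versions; releases from around the time of this talk nested `properties` under a mapping type name and used different field type names. It only constructs the body, so it runs without a cluster.

```python
import json


def build_index_body(similarity="BM25"):
    """Build an index-creation body with a per-field similarity setting.

    Field names are hypothetical; they are not taken from the prototype.
    """
    return {
        "mappings": {
            "properties": {
                # Analyzed full-text field, scored with the chosen similarity.
                "title": {"type": "text", "similarity": similarity},
                # Exact-value fields, suited to filtering and aggregations.
                "topic": {"type": "keyword"},
                "geography_level": {"type": "keyword"},
                "vintage": {"type": "keyword"},
            }
        }
    }


index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

BM25 has been the default similarity since Elasticsearch 5.0, so naming it explicitly mainly matters when comparing it against another model on the same data.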
  13. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/

    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders.