Architecture Decisions for Tiny & Big Data

Enterprises want either to be data driven from the very beginning or to join the race for data supremacy. Being data driven requires the system to store and process every transaction and interaction the customer has with the product, which enables the business to make better decisions.

But storing, processing, and analyzing data comes with a cost. This cost is distributed across the choice of technology, infrastructure, and go-to-market strategy.

Nischal HP and Raghotham Sripadraj share their experience building data science platforms for various enterprises, with an emphasis on making the right architecture choices for things such as databases, queues, caching mechanisms, distribution of the workload, underlying technology for machine learning and predictive models, visualization, and prototyping. Nischal and Raghotham stress the importance of using distributed and fault-tolerant tools, which themselves come with the cost of managing the infrastructure (including, by implication, a dedicated team to monitor the infra). However, with small data, simple tools take you a long way.

Many things can go unnoticed in building an end-to-end data science system, like the importance of logging, building a data pipeline that sends notifications to the required medium of communication, exposing data science as a service via APIs, or A/B testing for data science-backed feature releases when required. Only when the data science solution is in production does it power the organization the right way.
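As one illustration of A/B testing a data science-backed feature, the sketch below assigns each user to a variant deterministically by hashing the user ID together with an experiment name. It is a minimal sketch and not the speakers' implementation; `assign_variant`, the experiment name, and the `treatment_share` parameter are hypothetical names chosen for this example.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    Hypothetical helper: hashing (experiment, user_id) means a user keeps
    the same variant across requests without storing any assignment state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Take the first 32 bits of the hash and scale them into [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"
```

Because the assignment depends only on the inputs, the same user always sees the same variant for a given experiment, which keeps the comparison between the model-backed feature and the baseline consistent.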

When building data science products you should live by the motto “fail fast.” Nischal and Raghotham themselves have failed fast when making these choices, but in time they came to understand that adopting the latest and the coolest technology on the planet just for the sake of it is not the right thing to do.

unnati_xyz

March 16, 2017

Transcript

  1. Learnings: Tiny data problems are tricky. Get hold of the low-hanging fruits. Beware of data sanity with NoSQL.
  2. Learnings: Tiny data problems are tricky. Get hold of the low-hanging fruits. Beware of data sanity with NoSQL. Embrace data science early.
  3. What are we solving? Predict behavior of users. Educate the team on data science. Optimize campaign costs.
  4. Learnings: Know your databases well. Data pipelines break (exception handling, logging, notifications). Monitor performance of your libraries.
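
The "data pipelines break" learning (exception handling, logging, notifications) could be sketched roughly as follows. This is an illustrative example under stated assumptions, not code from the talk: `run_pipeline` and the `notify` callback are hypothetical, and `notify` stands in for whatever channel a team uses (email, chat, pager).

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("pipeline")


def run_pipeline(
    steps: Iterable[Callable[[list], list]],
    records: list,
    notify: Callable[[str], None],
) -> list:
    """Run each step in order; on failure, log the traceback, notify, re-raise."""
    for step in steps:
        name = getattr(step, "__name__", repr(step))
        try:
            records = step(records)
            logger.info("step %s ok, %d records", name, len(records))
        except Exception as exc:
            # logger.exception records the full traceback for later debugging.
            logger.exception("step %s failed", name)
            # Hypothetical notification hook: alert a human that the run broke.
            notify(f"Pipeline step '{name}' failed: {exc}")
            raise
    return records
```

Re-raising after the notification keeps the failure visible to whatever scheduler runs the pipeline, instead of silently passing partial data downstream.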