Big data and Machine learning APIs

Sam Bessalah
December 03, 2014


Transcript

  1. A Big Data Legend … Web logs, sensors, other data sources … data-driven decisions, smart applications
  2. - Building big data infrastructures is no easy task.
     - Leveraging data for decision making requires a mix of multiple skills: system engineering, distributed computing, statistics, machine learning.
  3. Solutions …
     - Build data platforms as a service.
     - Build robust and consistent APIs to bring big data to the masses.
     - Leverage fluent APIs for fast data science.
  4. Data Sources
     - High-throughput distributed messaging platform
     - Publish-subscribe model
     - Modelled as a distributed, replicated log
     - Persists messages to disk
     - Categorizes messages into topics
     - Retains messages for a long, specified amount of time
     - Allows stream replay in case of failure
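
The log model on the Data Sources slide can be illustrated with a toy, in-memory sketch. It is only an assumption-laden illustration: `TopicLog`, `publish` and `read` are hypothetical names, not the API of any real messaging platform, and the sketch leaves out disk persistence, replication and retention limits. It shows two ideas from the slide: messages are appended to per-topic logs, and a consumer that keeps its own offset can replay after a failure.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a topic-partitioned, append-only log (hypothetical)."""

    def __init__(self):
        # topic name -> ordered list of messages; the list index is the offset
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return every retained message of a topic from `offset` onward."""
        return list(self._topics[topic][offset:])


log = TopicLog()
for line in ["GET /", "GET /about", "POST /login"]:
    log.publish("weblogs", line)
log.publish("sensors", "temp=21.5")

# Reading does not remove anything: the consumer only tracks an offset.
committed = 0
batch = log.read("weblogs", committed)
committed += 1  # pretend only the first message was processed before a crash

# After a restart, the consumer replays from its last committed offset.
replayed = log.read("weblogs", committed)
```

Keeping the offset on the consumer side is what makes replay cheap in this sketch: the log itself never needs to know who has read what.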
  5. Things to be careful with
     - Multitenancy (YARN, Mesos, Docker…)
     - Job scheduling
     - Security
     - Serialisation: ProtoBuf, Thrift, Avro
     - Storage format: optimize queries with columnar storage
     - Compression: LZO, Snappy
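
The storage-format point can be made concrete with a small sketch. The records and field names below are made up for illustration, and plain Python lists stand in for an on-disk format. The idea shown is that a query over one field only needs to scan that field's column when the data is laid out by column.

```python
# The same three (made-up) records in a row-oriented layout.
rows = [
    {"user": "a", "country": "FR", "bytes": 120},
    {"user": "b", "country": "US", "bytes": 300},
    {"user": "c", "country": "FR", "bytes": 80},
]

# Column-oriented layout: one list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every whole record is visited to sum a single field.
total_from_rows = sum(row["bytes"] for row in rows)

# Columnar layout: only the "bytes" column is scanned.
total_from_columns = sum(columns["bytes"])
```

Both totals are equal; the difference is how much data the query has to touch to compute them.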
  6. Machine Learning workflow
     - Training: text, images, etc. -> feature extraction -> learning algorithm -> predictive model
     - Prediction: new data -> feature vector -> predictive model -> prediction
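  7. Text, images, etc. -> feature extraction -> predictive model; new data -> prediction

     X = vect.fit_transform(input)       # learn the features from the training data
     clf.fit(X, y)                       # train the model
     X_new = vect.transform(new_input)   # reuse the fitted features; do not refit on new data
     y_new = clf.predict(X_new)          # predict for the new data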
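
To make the four calls on this slide runnable without extra dependencies, here is a minimal sketch with toy stand-ins. `BagOfWords` and `NearestCentroid` are hypothetical classes written for this example; they only mimic the `fit` / `transform` / `predict` convention and are not the library objects the slide had in mind. The training texts and labels are made up. The sketch also shows why new data should go through `transform` on the already fitted vectorizer: a vectorizer refitted on the new data learns a different set of columns.

```python
class BagOfWords:
    """Hypothetical minimal vectorizer following the fit/transform convention."""

    def fit(self, docs):
        # Learn the vocabulary (one column per distinct word) from training text.
        words = sorted({w for doc in docs for w in doc.lower().split()})
        self.vocab = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        # Map text onto the learned columns; unseen words are ignored.
        vectors = []
        for doc in docs:
            vec = [0] * len(self.vocab)
            for w in doc.lower().split():
                if w in self.vocab:
                    vec[self.vocab[w]] += 1
            vectors.append(vec)
        return vectors

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


class NearestCentroid:
    """Hypothetical toy classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            members = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda lab: dist(x, self.centroids[lab])) for x in X]


train_docs = ["cheap pills buy now", "buy cheap watches",
              "meeting at noon", "lunch meeting tomorrow"]
y = ["spam", "spam", "ham", "ham"]
new_docs = ["buy cheap pills", "meeting tomorrow"]

vect, clf = BagOfWords(), NearestCentroid()
X = vect.fit_transform(train_docs)   # learn the columns, then vectorize
clf.fit(X, y)                        # train on the feature vectors
X_new = vect.transform(new_docs)     # same columns as X
y_new = clf.predict(X_new)

# Refitting on the new data would produce vectors with different columns,
# which the trained model cannot interpret.
refit = BagOfWords().fit_transform(new_docs)
```

The design choice to keep `fit` and `transform` separate is what lets the same feature mapping be applied at training time and at prediction time.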
  8. - Data locality and data gravity
     - Support the full workflow
     - Verticalization of platforms
     - Scalability
     - Collaboration and interoperability
     - Black boxing of implementations