Big data and Machine learning APIs

Sam Bessalah
December 03, 2014

Transcript

  1. 2.
  2. 3.
  3. 4.
  4. 5.
  5. 12.

    A Big Data Legend …
    - Web logs, sensors, and other data sources
    - Data-driven decisions and smart applications
  6. 13.
  7. 14.

    - Building big data infrastructure is no easy task.
    - Leveraging data for decision making requires a mix of skills:
      - Systems engineering
      - Distributed computing
      - Statistics
      - Machine learning
  8. 15.

    Solutions …
    - Build data platforms as a service.
    - Build robust and consistent APIs to bring big data to the masses.
    - Leverage fluent APIs for fast data science.
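As an illustration of what a fluent API looks like in practice, here is a minimal sketch in Python. The `Query` class and its `where`, `order_by`, `select`, and `collect` methods are hypothetical names invented for this example, not part of any library mentioned in the talk; the point is only that each call returns a new object, so steps chain into a readable pipeline.

```python
class Query:
    """Hypothetical fluent wrapper over an in-memory list of records."""

    def __init__(self, rows):
        self._rows = list(rows)

    def where(self, predicate):
        # Keep only the rows for which the predicate holds.
        return Query(r for r in self._rows if predicate(r))

    def order_by(self, field):
        # Sort rows by a single field, ascending.
        return Query(sorted(self._rows, key=lambda r: r[field]))

    def select(self, *fields):
        # Project each row onto the requested fields.
        return Query({f: r[f] for f in fields} for r in self._rows)

    def collect(self):
        # Materialize the result as a plain list.
        return list(self._rows)


events = [
    {"user": "ana", "country": "FR", "clicks": 7},
    {"user": "bob", "country": "US", "clicks": 2},
    {"user": "cyd", "country": "FR", "clicks": 3},
]

result = (
    Query(events)
    .where(lambda r: r["country"] == "FR")
    .order_by("clicks")
    .select("user", "clicks")
    .collect()
)
```

Because every step returns a new `Query`, intermediate results can be reused or inspected without mutating the original data.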
  9. 16.
  10. 21.

    Data Sources
    - High-throughput distributed messaging platform
    - Publish-subscribe model
    - Modelled as a distributed, replicated log
    - Persists messages to disk
    - Categorizes messages into topics
    - Retains messages for a specified, potentially long, period of time
    - Allows stream replay in case of failure
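The log abstraction described on this slide can be sketched in a few lines of Python. This is a single-process toy, assuming an in-memory list per topic: it has no persistence, replication, or retention policy, and `TopicLog`, `publish`, and `read` are hypothetical names chosen for the example. It only shows why an append-only log with consumer-side offsets makes replay after a failure straightforward.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical single-process stand-in for a distributed, replicated log."""

    def __init__(self):
        # One append-only list of messages per topic.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return every message from `offset` onward; reading does not consume."""
        return list(self._topics[topic][offset:])


log = TopicLog()
for line in ["GET /", "GET /about", "POST /login"]:
    log.publish("weblogs", line)
log.publish("sensors", "temp=21.5")

# The consumer tracks its own position; here it has processed one message.
committed_offset = 1

# After a failure, the consumer replays from its last committed offset.
replayed = log.read("weblogs", committed_offset)
```

Since messages stay in the log after being read, several independent consumers can read the same topic at their own pace.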
  11. 24.
  12. 26.
  13. 28.

    Things to be careful with
    - Multitenancy (YARN, Mesos, Docker…)
    - Job scheduling
    - Security
    - Serialisation: ProtoBuf, Thrift, Avro
    - Storage format: optimize queries with columnar storage.
    - Compression: LZO, Snappy
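To make the storage-format and compression points concrete, here is a small sketch using only the Python standard library. It assumes a toy dataset held in memory, and it uses `zlib` and JSON purely as stand-ins: the slide names LZO and Snappy as codecs, and real columnar formats use their own binary encodings. The idea shown is that a column-oriented layout lets an aggregate touch one field only, and that a column of similar values compresses well.

```python
import json
import zlib

# Toy row-oriented dataset (illustrative values only).
rows = [
    {"user_id": i, "country": "FR" if i % 2 else "US", "clicks": i % 5}
    for i in range(1000)
]

# Row-oriented to column-oriented: one list per field.
columns = {field: [r[field] for r in rows] for field in rows[0]}

# An aggregate over one field reads only that column.
total_clicks = sum(columns["clicks"])

# Compress a single column; zlib stands in for LZO/Snappy here.
raw = json.dumps(columns["country"]).encode("utf-8")
compressed = zlib.compress(raw)
restored = json.loads(zlib.decompress(compressed))
```

In a row-oriented layout the same sum would have to deserialize every field of every record, which is the cost that columnar storage avoids.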
  14. 30.
  15. 33.
  16. 40.

    Machine Learning workflow
    - Training: text, images, etc. → feature extraction → learning algorithm → predictive model
    - Prediction: new data → feature vector → predictive model → prediction
  17. 45.

    Text, images, etc. → feature extraction → predictive model; new data → prediction

    X = vect.fit_transform(input)
    clf.fit(X, y)
    X_new = vect.transform(new_input)
    y_new = clf.predict(X_new)
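The four calls on this slide can be run end to end with a small self-contained sketch. The slide's `vect` and `clf` follow the scikit-learn naming style; to avoid any dependency, the example below uses two hypothetical stand-ins, `TinyCountVectorizer` and `TinyCentroidClassifier`, which are not scikit-learn classes and only mimic the `fit` / `transform` / `predict` interface. The toy documents and labels are made up for illustration. Note that new data goes through `transform`, not `fit_transform`, so that the vocabulary learned at training time is reused.

```python
from collections import Counter


class TinyCountVectorizer:
    """Hypothetical bag-of-words vectorizer with a fit/transform interface."""

    def fit(self, docs):
        # Learn the vocabulary from the training documents only.
        words = sorted({w for d in docs for w in d.lower().split()})
        self.vocab_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        # Count words against the learned vocabulary; unseen words are dropped.
        vectors = []
        for d in docs:
            counts = Counter(d.lower().split())
            vectors.append([counts.get(w, 0) for w in self.vocab_])
        return vectors

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


class TinyCentroidClassifier:
    """Hypothetical classifier: predicts the label of the nearest class centroid."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            members = [x for x, t in zip(X, y) if t == label]
            self.centroids_[label] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, X):
        def dist(a, b):
            # Squared Euclidean distance between two feature vectors.
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids_, key=lambda c: dist(x, self.centroids_[c])) for x in X]


train_docs = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda attached",
    "lunch meeting tomorrow",
]
y = ["spam", "spam", "ham", "ham"]

vect = TinyCountVectorizer()
clf = TinyCentroidClassifier()

X = vect.fit_transform(train_docs)   # learn the vocabulary and vectorize training data
clf.fit(X, y)                        # train the model

new_docs = ["buy cheap pills", "agenda for the meeting"]
X_new = vect.transform(new_docs)     # reuse the training vocabulary
y_new = clf.predict(X_new)           # predict labels for the new data
```

Calling `fit_transform` on the new documents instead would rebuild the vocabulary, so the columns of `X_new` would no longer line up with the features the model was trained on.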
  18. 55.

    - Data locality and data gravity
    - Support the full workflow
    - Verticalization of platforms
    - Scalability
    - Collaboration and interoperability
    - Black boxing of implementations
  19. 57.