Slide 1

What Is Apache Samoa?
● An Apache incubator project
● A machine learning framework
● A distributed, scalable system
● Deploys to existing Apache systems
  – Storm, S4, Samza, Avro
  – Deploy a Samoa algorithm to these systems
  – Samoa abstracts implementation via its API
● Designed for stream processing
● Offers a range of ML algorithms

Slide 2

Samoa Terms
Samoa terms that might be of use

PE     Processing element
PI     Processing item
EPI    Entrance processing item
Spout  A Storm term for a data source
Bolt   A Storm term for a data join element
ML     Machine learning

Slide 3

Samoa Algorithms
● Samoa supported algorithms
  – Prequential Evaluation Task
  – Vertical Hoeffding Tree Classifier
  – Adaptive Model Rules Regressor
  – Bagging and Boosting
  – Distributed Stream Clustering
  – Distributed Stream Frequent Itemset Mining
  – SAMOA for MOA users

Slide 4

Samoa Architecture

Slide 5

Samoa Architecture

Slide 6

Samoa Architecture
● The aim of Samoa is to provide implementation abstraction
● For stream processing algorithms
● Written using its API
● Against the stream processing systems that it supports
● So, for instance, write an algorithm once and
● Deploy it to both S4 and Storm
● The deployment process creates a platform jar
● That you can deploy to the specific platform

Slide 7

Samoa Topology

Slide 8

Samoa Topology
● Samoa provides a simple topology for stream processing
● This includes the elements
  – Processor
  – Content Event
  – Stream
  – Task
  – Topology Builder
  – Learner
  – Processing Item

Slide 9

Samoa Processor
● Processor is the basic logical processing unit
● All logic is written in the processor
● In Samoa, a Processor is an interface
● Users can implement this interface
  – To build their own custom class
● A processor in a Samoa topology can be
  – A processor in the topology
  – An entrance processor which sources the stream
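
The idea on this slide can be sketched in plain Java. The `Processor` and `ContentEvent` interfaces below are simplified stand-ins defined only for this example; they are not the real Samoa interfaces, whose methods may differ. `CountingProcessor` is a hypothetical custom class.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; not the real Samoa API.
interface ContentEvent {
    String getKey();
}

interface Processor {
    // Handle one event; return true if it was processed.
    boolean process(ContentEvent event);
}

// A hypothetical custom processor: all of its logic lives in process().
class CountingProcessor implements Processor {
    private int seen = 0;
    private final List<String> keys = new ArrayList<>();

    @Override
    public boolean process(ContentEvent event) {
        seen++;
        keys.add(event.getKey());
        return true;
    }

    int getSeen() { return seen; }

    List<String> getKeys() { return keys; }
}

class ProcessorDemo {
    public static void main(String[] args) {
        CountingProcessor p = new CountingProcessor();
        p.process(() -> "a");   // the lambda acts as a one-method ContentEvent
        p.process(() -> "b");
        System.out.println(p.getSeen() + " events, keys " + p.getKeys());
    }
}
```

The point of the sketch is only that the user-written logic sits behind a single interface, so the surrounding topology never needs to know what the processor does.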

Slide 10

Samoa Content Event
● A message or an event is called a Content Event in Samoa
● It is an event which contains content which
● Needs to be processed by the processors
● ContentEvent has been implemented as an interface in Samoa
● Users need to implement the ContentEvent interface
● To create their custom message classes
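
A custom message class might look like the following sketch. The `ContentEvent` interface here is a simplified stand-in written for this example, and its two methods are assumptions rather than the confirmed Samoa signatures. `SensorReadingEvent` and its fields are hypothetical.

```java
// Simplified stand-in for illustration only; the real Samoa interface may differ.
interface ContentEvent {
    String getKey();        // assumed: a key that could be used for routing
    boolean isLastEvent();  // assumed: marks the end of the stream
}

// A hypothetical custom message carrying one sensor reading.
final class SensorReadingEvent implements ContentEvent {
    private final String sensorId;
    private final double value;
    private final boolean last;

    SensorReadingEvent(String sensorId, double value, boolean last) {
        this.sensorId = sensorId;
        this.value = value;
        this.last = last;
    }

    @Override
    public String getKey() { return sensorId; }

    @Override
    public boolean isLastEvent() { return last; }

    double getValue() { return value; }
}

class ContentEventDemo {
    public static void main(String[] args) {
        ContentEvent e = new SensorReadingEvent("s-1", 21.5, false);
        System.out.println(e.getKey() + " last=" + e.isLastEvent());
    }
}
```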

Slide 11

Samoa Stream
● A stream is a physical unit of a Samoa topology
● Which connects different Processors with each other
● A stream is also created by a TopologyBuilder
  – Just like a Processor
● A stream can have a single source but many destinations
● A Processor which is the source of a stream owns the stream
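
The "single source, many destinations" rule can be modelled with a toy class. `ToyStream` and this `Processor` interface are hypothetical names invented for the sketch, not Samoa classes, and the broadcast delivery used here is just the simplest choice for an example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; not the real Samoa API.
interface Processor {
    void process(String event);
}

// A toy stream: created for exactly one source, delivers to many destinations.
class ToyStream {
    private final Processor source;   // the owner of the stream
    private final List<Processor> destinations = new ArrayList<>();

    ToyStream(Processor source) { this.source = source; }

    Processor getSource() { return source; }

    void addDestination(Processor p) { destinations.add(p); }

    // Deliver the event to every destination (broadcast, for simplicity).
    void put(String event) {
        for (Processor p : destinations) {
            p.process(event);
        }
    }
}

class StreamDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Processor source = e -> { };   // only emits; ignores any input
        ToyStream stream = new ToyStream(source);
        stream.addDestination(e -> log.add("A got " + e));
        stream.addDestination(e -> log.add("B got " + e));
        stream.put("hello");
        System.out.println(log);
    }
}
```

Because the source is fixed in the constructor and never changes, ownership of the stream is unambiguous, while destinations can be attached freely.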

Slide 12

Samoa Task
● A Task is similar to a job in Hadoop
● A Task is an execution entity
● A topology must be defined inside a task
● Samoa can only execute classes
● That implement the Task interface

Slide 13

Samoa Topology Builder
● TopologyBuilder is a builder class
● Which builds physical units of the topology
● And assembles them together
● Each topology has a name
● An example topology might have
  – An EntrancePI
  – Some PIs
  – Some streams
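
The example topology on this slide (one EntrancePI, some PIs, some streams) can be assembled with a toy builder. `ToyTopologyBuilder`, `Topology` and their methods are hypothetical and exist only in this sketch; the real Samoa TopologyBuilder works with Processor and Stream objects and its method names may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; names echo the slide, not the real Samoa API.
class Topology {
    final String name;
    final List<String> processors = new ArrayList<>();
    final List<String> streams = new ArrayList<>();

    Topology(String name) { this.name = name; }
}

class ToyTopologyBuilder {
    private final Topology topology;

    // Each topology has a name, so the builder asks for it up front.
    ToyTopologyBuilder(String name) { topology = new Topology(name); }

    ToyTopologyBuilder addEntranceProcessor(String id) {
        topology.processors.add("EntrancePI:" + id);
        return this;
    }

    ToyTopologyBuilder addProcessor(String id) {
        topology.processors.add("PI:" + id);
        return this;
    }

    // Record a stream from its single source to one destination.
    ToyTopologyBuilder connect(String sourceId, String destinationId) {
        topology.streams.add(sourceId + "->" + destinationId);
        return this;
    }

    Topology build() { return topology; }
}

class BuilderDemo {
    public static void main(String[] args) {
        Topology t = new ToyTopologyBuilder("demo")
            .addEntranceProcessor("source")
            .addProcessor("learner")
            .addProcessor("evaluator")
            .connect("source", "learner")
            .connect("learner", "evaluator")
            .build();
        System.out.println(t.name + " " + t.processors + " " + t.streams);
    }
}
```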

Slide 14

Samoa Learner
● Learners are sub-topologies
● Use the init() function to
  – Add streams
  – Add processors
  – Specify connections to the topology
● Use the getInputProcessor() function to
  – Add the processor that will manage the input stream
● Use the getResultStream() function to
  – Specify what is going to be the output stream
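
The three functions named on this slide can be illustrated with a toy sub-topology. The method names come from the slide, but their parameters and return types here are assumptions, and `ToyBuilder` and `TwoStageLearner` are hypothetical classes written only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for illustration only; not the real Samoa API.
class ToyBuilder {
    final List<String> processors = new ArrayList<>();
    final List<String> streams = new ArrayList<>();
}

interface Learner {
    void init(ToyBuilder builder);   // add processors and streams, wire them up
    String getInputProcessor();      // processor that receives the input stream
    String getResultStream();        // stream that carries the learner's output
}

// A hypothetical learner made of two processors joined by one internal stream.
class TwoStageLearner implements Learner {
    private String input;
    private String result;

    @Override
    public void init(ToyBuilder builder) {
        builder.processors.add("model");
        builder.processors.add("aggregator");
        builder.streams.add("model->aggregator");
        builder.streams.add("aggregator->out");
        input = "model";
        result = "aggregator->out";
    }

    @Override
    public String getInputProcessor() { return input; }

    @Override
    public String getResultStream() { return result; }
}

class LearnerDemo {
    public static void main(String[] args) {
        ToyBuilder builder = new ToyBuilder();
        Learner learner = new TwoStageLearner();
        learner.init(builder);
        System.out.println("input: " + learner.getInputProcessor()
            + ", result: " + learner.getResultStream());
    }
}
```

The enclosing topology only needs the input processor and the result stream; everything the learner adds in between stays an internal detail of the sub-topology.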

Slide 15

Samoa Processing Item
● A Processing Item is a hidden physical unit of the topology
● It is just a wrapper around a Processor
● It is used internally
● It is not accessible from the API
● It connects the Processor to the other processors in the topology
  – Simple Processing Item (PI)
  – Entrance Processing Item (EntrancePI)

Slide 16

Available Books
● See “Big Data Made Easy” – Apress Jan 2015
● See “Mastering Apache Spark” – Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020

Slide 17

Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration