Intro to AI and Big Data processing

www.neosperience.com | blog.neosperience.com | [email protected] Neosperience Empathy in Technology AI,
neural networks and applications Novembre, 11th 2020

Luca Bianchi Who am I? github.com/aletheia https://it.linkedin.com/in/lucabianchipavia https://speakerdeck.com/aletheia Chief Technology
Ofﬁcer @ Neosperience Chief Technology Ofﬁcer @ WizKey Serverless Meetup and ServerlessDays Italy co-organizer www.bianchiluca.com @bianchiluca

Artiﬁcial Intelligence What is all this buzz about? Why now
and not before (or after)?

Diﬀerent approaches How? Human intelligence relies on a powerful supercomputer
- our brain - much more powerful than actual HPC servers and is capable of switching between many diﬀerent “algorithms” to understand reality, each one of them is context-independent. • Filling gaps in Existing Knowledge • Understand and apply Knowledge • Semantically reduce uncertainty • Notice similarity between old/new The most powerful capability of our brain and the common denominator of all these features is the capability humans to learn from experience. Learning is the key.

Artiﬁcial Intelligence “the theory and development of computer systems able
to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

Roadmap of Deep Learning

Machine Learning What is exactly a machine that learns?

Machine Learning The importance of Experience • Machine Learning (ML)
algorithms have data as input, ‘cause data represents the Experience.  This is a focal point of Machine Learning: large amount of data is needed to achieve good performances. • The Machine Learning equivalent of program in ML world is called ML model and improves over time as soon as more data is provided, with a process called training. • Data must be prepared (or ﬁltered) to be suitable for training process. Generally input data must be collapsed into a n-dimensional array with every item representing a sample. • ML performances are measured in probabilistic terms, with metrics called accuracy or precision. An operational deﬁnition “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E”

Regression Regression analysis helps one understand how the typical value
of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held ﬁxed. Is a statistical method of data analysis. The most common algorithm least square method that provides an estimation of regression parameters. When dataset is not trivial estimation is achieved through is gradient descent.

Regression — Use cases Statistical regression is used to make
predictions about data, ﬁlling the gaps Regression, even in the most simple form of Linear Regression is a good tool to learn from data and make predictions based on data trend. Common Scenarios • Stock price value • Product Price Estimation • Age estimation • Customer satisfaction rate deﬁning variables such as response-time, resolution-ration we can forecast satisfaction level or churn • Customer Conversion rate estimation (based on click data, origin, timestamp, ...)

Classification Classification is the problem of identifying to which of
a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Most used algorithms for classification are: • Logit Regression • Decision Trees • Random Forest

Common Scenarios • Credit scoring • Human Activity Recognition •
Spam/Not Spam classification • Customer conversion prediction • Customer churn prediction • Customer personas classification Classification — Use cases Classification is used to detect the binary outcome of a variable Classification is often used to classify people into pre-defined clusters (good-payer/bad-payer, in/out target, etc.)

Clustering is the task of grouping a set of objects
in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). The diﬀerence between algorithms is due to the similarity function that is used: • Centroid based clusters • Density based cluster • Random Forest

Common Scenarios • Similar interests recognition • Shape detection •
Similarity analysis • Customer base segmentation Clustering — Use cases Clustering is used to segment data Clustering labels each sample with a name representing its belonging cluster. Labelling can be exclusive or multiple. Clusters are dynamic structures: they adapt to new sample coming into the model as soon as thy label them.

a good ML model is just a component of an
AI system

An example Process data coming from the browser

Neosperience Cloud Cloud Understand Engage Grow

events from the browser

Client JS library embedded into the webpage Browser generated events
An user navigating the webpage produces events with a flexible structure that are sent to the backend Three types of events: • low-level: in response to mouse/touch events, agnostic • mid-level: related to webpage actions, domain-specific • high-level: structured customer-specific events Constraints • response time: beacon support is strict on time • volume: millions of events within a single month • throughput: events could peak to thousands within a few seconds

a pipeline to ingest and process events

convert analyze ingest Collect and process events through pipeline stages.
From the browser to customer insights Data processing pipeline A service unable to ramp up as quickly as the events flow to the system would result in loss of data User Insight ingestion service collects data from many different customers leading to unpredictable load Events needs to be stored, then processed and consolidated into an user profile ingest events collect and send to storage store raw events extract, transform, load store baked events process events to build insights store customer profile

A difficult task with a lot of uncertainty Storing and
analyzing data The amount of data collected by database grows to millions of data points very quickly. i.e. for a single customer, ~130M events collected in just one month Data access pattern is not well defined (parameters within query) and could change whenever high level events are managed for a customer specific context Pulling data from DynamoDB with no clear access pattern means a full table scan for each query. It is not just slow, but also very expensive.

A consolidated technology with unparalleled flexibility Introducing the Data Lake
Amazon S3 - an object storage - 99.99% availability - designed from the ground up to handle traffic for any Internet application - multi AZ reliability - cost effective Amazon S3

Lambda — invocations

Extract, Transform raw events and load into a data catalog
Data pipeline extract, transform, load - Processes events and load them into AWS Glue catalog, then saves to S3 - Aggregates events based on their visit time extracting user sessions - Transforms events encoding respective types into readable and compact format - Uses Apache Spark language to build processing jobs

Data analysis stage Data pipeline - Data is loaded into
AWS Glue catalog and into Amazon S3 from previous stage - Amazon Athena queries build customer insights, leveraging external ML services through Amazon SageMaker - Resulting insights are stored into Amazon Elasticsearch

putting all together..

User Insight data pipeline

Wrap up

Leveraging an ML model means having in place a data
driven architecture Operations + Machine Learning —> MLOps • Think about streams of data ﬂowing into an application • Collect data in a reliable way • Data preparation is often more relevant than ML models • ETL jobs to aggregate and transform data • Query data to ﬁlter relevant data • Deploy ML model and think about scalability • Consolidate data back into NoSQL data stores (i.e. a data lake)

Empathy in Technology

bit.ly/nsp-assi-ai-20201111

www.neosperience.com | blog.neosperience.com | [email protected]

Intro to AI and Big Data processing

Intro to AI and Big Data processing

Aletheia

More Decks by Aletheia

Other Decks in Technology

Featured

Transcript

www.neosperience.com | blog.neosperience.com | [email protected] Neosperience Empathy in Technology AI,

Luca Bianchi Who am I? github.com/aletheia https://it.linkedin.com/in/lucabianchipavia https://speakerdeck.com/aletheia Chief Technology

Artiﬁcial Intelligence What is all this buzz about? Why now

Diﬀerent approaches How? Human intelligence relies on a powerful supercomputer

Artiﬁcial Intelligence “the theory and development of computer systems able

Roadmap of Deep Learning

Machine Learning What is exactly a machine that learns?

Machine Learning The importance of Experience • Machine Learning (ML)

Regression Regression analysis helps one understand how the typical value

Regression — Use cases Statistical regression is used to make

Classiﬁcation Classiﬁcation is the problem of identifying to which of

Common Scenarios • Credit scoring • Human Activity Recognition •

Clustering is the task of grouping a set of objects

Common Scenarios • Similar interests recognition • Shape detection •

a good ML model is just a component of an

An example Process data coming from the browser

Neosperience Cloud Cloud Understand Engage Grow

events from the browser

Client JS library embedded into the webpage Browser generated events

a pipeline to ingest and process events

convert analyze ingest Collect and process events through pipeline stages.

A diﬃcult task with a lot of uncertainty Storing and

A consolidated technology with unparalleled ﬂexibility Introducing the Data Lake

Lambda — invocations

Extract, Transform raw events and load into a data catalog

Data analysis stage Data pipeline - Data is loaded into

putting all together..

User Insight data pipeline

Wrap up

Leveraging an ML model means having in place a data

Empathy in Technology

bit.ly/nsp-assi-ai-20201111

www.neosperience.com | blog.neosperience.com | [email protected]