Slide 1

Slide 1 text

JOSE DOMINGUEZ @JOSMASFLORES A HOMESPUN DECENTRALISED DIY DATA SCIENCE RESEARCH PIPELINE FOR THE INTERNET OF *YOUR* THINGS

Slide 2

Slide 2 text

This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 JOSE DOMINGUEZ @JOSMASFLORES

Slide 3

Slide 3 text

CONTRIBUTIONS

Slide 4

Slide 4 text

CO-ORGANISER

Slide 5

Slide 5 text

AGENDA ▸ Homespun ▸ *Your* things in IoT ▸ *Your* Data ▸ Data Science ▸ Tools: End User Development ▸ Data Collection and Communication ▸ Backend and Storage ▸ Analysis and Visualisation

Slide 6

Slide 6 text

HOMESPUN

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

SIMPLE AND UNSOPHISTICATED keep it simple!

Slide 9

Slide 9 text

*YOUR* THINGS

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

PLENTY OF COMPLICATED STUFF OUT THERE… IF YOU NEED IT! Keep it simple, if you can!

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

BORIS ADRYAN ON DATA STORAGE AND VISUALISATION PLATFORMS HTTP://IOT.GHOST.IO/NODE-RED-INTEROPERABILITY-TEST/

Slide 18

Slide 18 text

HARDER THAN YOU MAY THINK LEARNING MODE

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

WHAT KIND OF STUFF CAN YOU DO WITH *YOUR* THINGS? ▸ Grab raw data from the accelerometer ▸ Zero-crossing analysis can give you a rough step count.
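
A minimal sketch of the zero-crossing idea, assuming accelerometer samples arrive as (x, y, z) tuples at a fixed rate. It removes the constant gravity offset from the acceleration magnitude and counts upward crossings through a small dead band (`threshold`, an assumed value) so that sensor noise around zero is not counted as steps. The function name and defaults are illustrative, not taken from any library.

```python
import math


def count_steps(samples, threshold=0.1):
    """Roughly count steps in a list of (x, y, z) accelerometer samples.

    The magnitude is used so the result does not depend on how the phone is
    held; subtracting its mean removes the constant pull of gravity.
    """
    if not samples:
        return 0
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    mean = sum(magnitudes) / len(magnitudes)
    centred = [m - mean for m in magnitudes]

    steps = 0
    armed = False  # becomes True once the signal dips below -threshold
    for value in centred:
        if value < -threshold:
            armed = True
        elif value > threshold and armed:
            steps += 1  # an upward zero crossing through the dead band
            armed = False
    return steps


# Simulated walk: 2 steps per second for 5 seconds, sampled at 50 Hz.
rate_hz = 50
walk = [(0.0, 0.0, 9.81 + 2.0 * math.sin(2 * math.pi * 2.0 * i / rate_hz))
        for i in range(5 * rate_hz)]
print(count_steps(walk))  # 9
```

The simulated walk contains 10 cycles but the sketch reports 9, because the first cycle has no preceding dip to arm the detector: exactly the kind of "rough" count this approach gives.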

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

BUT… BUT… BUT… WHY DO ALL THESE THINGS IF COMMERCIAL SOLUTIONS DO IT OUT OF THE BOX?

Slide 26

Slide 26 text

*YOUR* DATA

Slide 27

Slide 27 text

GOOGLE APIS. SECTION 5: CONTENT b. Submission of Content Some of our APIs allow the submission of content. Google does not acquire any ownership of any intellectual property rights in the content that you submit to our APIs through your API Client, except as expressly provided in the Terms. For the sole purpose of enabling Google to provide, secure, and improve the APIs (and the related service(s)) and only in accordance with the applicable Google privacy policies, you give Google a perpetual, irrevocable, worldwide, sublicensable, royalty-free, and non-exclusive license to Use content submitted, posted, or displayed to or from the APIs through your API Client. “Use” means use, host, store, modify, communicate, and publish. Before you submit content to our APIs through your API Client, you will ensure that you have the necessary rights (including the necessary rights from your end users) to grant us the license.

Slide 28

Slide 28 text

GOOGLE APIS. SECTION 5: CONTENT b. Submission of Content Some of our APIs allow the submission of content. Google does not acquire any ownership of any intellectual property rights in the content that you submit to our APIs through your API Client, except as expressly provided in the Terms. For the sole purpose of enabling Google to provide, secure, and improve the APIs (and the related service(s)) and only in accordance with the applicable Google privacy policies, you give Google a perpetual, irrevocable, worldwide, sublicensable, royalty-free, and non-exclusive license to Use content submitted, posted, or displayed to or from the APIs through your API Client. “Use” means use, host, store, modify, communicate, and publish. Before you submit content to our APIs through your API Client, you will ensure that you have the necessary rights (including the necessary rights from your end users) to grant us the license.

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

POSTING YOUR CONTENT ON THE FITBIT SERVICE ▸ You may post photos, exercise regimens, food logs, recipes, comments, and other content (“Your Content”) to the Fitbit Service. You retain all rights to Your Content that you post to the Fitbit Service. By making Your Content available on or through the Fitbit Service you grant to Fitbit a non-exclusive, transferable, sublicensable, worldwide, royalty-free license to use, copy, modify, publicly display, publicly perform and distribute Your Content only in connection with operating and providing the Fitbit Service.

Slide 31

Slide 31 text

WITH A SMALL NUMBER OF GEOLOCATION DATA POINTS (1 DAY’S WORTH), LOCATION CAN BE INFERRED, POTENTIALLY LEADING TO PRIVACY DISCLOSURES Liccardi et al, 2016. I Know Where You Live: Inferring Details of People's Lives by Visualizing Publicly Shared Location Data (CHI '16).
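
To see why a single day of points can be enough, here is a deliberately naive sketch: snap each point to a grid cell of roughly 100 m or less (`cell_deg`, an assumed size), keep only night-time readings (an assumed 00:00 to 06:00 window), and report the busiest cell. It illustrates the risk; it is not the method used by Liccardi et al., and the coordinates below are hypothetical.

```python
from collections import Counter


def guess_home(points, cell_deg=0.001, night=(0, 6)):
    """Guess a 'home' location from (hour_of_day, lat, lon) points.

    Coordinates are snapped to a grid of cell_deg degrees and the grid cell
    with the most night-time points is returned, or None if there are none.
    """
    start, end = night
    cells = Counter(
        (round(lat / cell_deg), round(lon / cell_deg))
        for hour, lat, lon in points
        if start <= hour < end
    )
    if not cells:
        return None
    (lat_index, lon_index), _ = cells.most_common(1)[0]
    return (lat_index * cell_deg, lon_index * cell_deg)


# Hypothetical day: one reading per hour, at one place overnight and another by day.
day = ([(hour, 53.3491, -6.2603) for hour in range(0, 8)]
       + [(hour, 53.2707, -9.0568) for hour in range(9, 18)])
print(guess_home(day))
```

Without the night-time filter the daytime location would win (9 points against 8), which is why the time of each reading matters as much as its position.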

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

AN ANONYMISED MEDICAL DATABASE WAS SUCCESSFULLY COMBINED WITH A VOTERS LIST TO EXTRACT THE HEALTH RECORD OF THE GOVERNOR OF MASSACHUSETTS Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzziness and Knowledge-Based Systems 10, 557–570 (2002)
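
k-anonymity can be checked mechanically. A table is k-anonymous if every combination of quasi-identifier values (attributes such as ZIP code, date of birth and sex, which can be linked to an outside source like a voter list) is shared by at least k records. The sketch below computes k for a small hypothetical table (using birth year for brevity), then shows how generalising ZIP codes raises it. The records and the `k_anonymity` helper are illustrative.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records sharing the same
    values for all quasi-identifiers (0 for an empty table)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


# Hypothetical "anonymised" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "02138", "birth_year": 1945, "sex": "M", "diagnosis": "A"},
    {"zip": "02139", "birth_year": 1962, "sex": "F", "diagnosis": "B"},
    {"zip": "02139", "birth_year": 1962, "sex": "F", "diagnosis": "C"},
    {"zip": "02141", "birth_year": 1945, "sex": "M", "diagnosis": "D"},
]
qi = ["zip", "birth_year", "sex"]
print(k_anonymity(records, qi))  # 1: someone is unique, so linking can identify them

# Generalise ZIP codes to their first three digits.
generalised = [{**r, "zip": r["zip"][:3] + "**"} for r in records]
print(k_anonymity(generalised, qi))  # 2
```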

Slide 34

Slide 34 text

USING PUBLICLY AVAILABLE DATA, THE TYPES OF LOCATIONS CAN BE USED TO ESTIMATE SOMEONE’S AVERAGE INCOME […], AVERAGE HOUSING COST, DEBT, […] POLITICAL VIEWS ETC. Liccardi et al, 2016. I Know Where You Live: Inferring Details of People's Lives by Visualizing Publicly Shared Location Data (CHI '16).

Slide 35

Slide 35 text

PRIVACY CONCERNS ▸ IoT and fitness devices especially.

Slide 36

Slide 36 text

HEALS AND GEORGATOS (2014), PRIVACY AND HEALTH: HOW MOBILE HEALTH ‘APPS’ FIT INTO A PRIVACY FRAMEWORK NOT LIMITED TO HIPAA ▸ Surveillance (unauthorised collection) ▸ Identification ▸ Insecurity (lack of encryption) ▸ Disclosure (of sensitive data to third parties) ▸ Aggregation (consumer profiles)

Slide 37

Slide 37 text

A FEW MORE REASONS IN CASE YOU STILL DON’T CARE Steven Spann (2016). Wearable Fitness Devices: Health Data Privacy in Washington State

Slide 38

Slide 38 text

DATA COULD BE USED TO LEGALLY OR ILLEGALLY RESTRICT AN INDIVIDUAL’S ABILITY TO ACCESS CERTAIN MARKETS Steven Spann

Slide 39

Slide 39 text

STEVEN SPANN (2016) THROUGH THE USE OF PERSONAL DATA, ENTITIES COULD DISCRIMINATE AGAINST AN INDIVIDUAL IN: ▸ Employment ▸ Health Care and Insurance ▸ Credit-based Lending and other life necessities and options

Slide 40

Slide 40 text

IT’S NOT JUST ADS http://www.newyorker.com/images/2014/07/21/cartoons/140721_cartoon_047_a18282_p465.jpg

Slide 41

Slide 41 text

TERMS AND CONDITIONS ▸ T&C concerns

Slide 42

Slide 42 text

TERMS AND CONDITIONS IN RESEARCH CONTEXTS ▸ T&C concerns can inadvertently break Ethics agreements.

Slide 43

Slide 43 text

TERMS AND CONDITIONS IN RESEARCH CONTEXTS ▸ T&C concerns can inadvertently break Ethics agreements.

Slide 44

Slide 44 text

▸ http://motherboard.vice.com/read/free-isnt-freedom-epstein-essay

Slide 45

Slide 45 text

DECENTRALISED BLOCKCHAIN

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

THE REVOLUTION WILL (NOT) BE DECENTRALISED ▸ http://commonstransition.org/the-revolution-will-not-be-decentralised-blockchains/

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

DATA SCIENCE

Slide 51

Slide 51 text

DATA SCIENCE WHAT IS DATA SCIENCE

Slide 52

Slide 52 text

DATA SCIENCE WHAT IS DATA SCIENCE

Slide 53

Slide 53 text

DATA SCIENCE WHAT IS DATA SCIENCE

Slide 54

Slide 54 text

DATA SCIENCE WHAT IS DATA SCIENCE

Slide 55

Slide 55 text

DATA SCIENCE WHAT IS DATA SCIENCE

Slide 56

Slide 56 text

DATA SCIENCE HOW TO READ THE DATA SCIENCE VENN DIAGRAM http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Slide 57

Slide 57 text

RESOURCES DATA SCIENCE

Slide 58

Slide 58 text

DATA SCIENCE

Slide 59

Slide 59 text

CS109 - HARVARD ▸ The prerequisite for this class is programming knowledge at the level of CS 50 (or above), and statistics knowledge at the level of Stat 100 (or above). ▸ Stat 110: sample spaces, naive definition of probability, counting, sampling, random variables, CDFs, PMFs, discrete vs. continuous, Hypergeometric, Poisson distribution, Poisson approximation, standard Normal, Normal normalizing constant, Markov chains, transition matrix, stationary distribution, Chi-Square, Student-t, Multivariate Normal

Slide 60

Slide 60 text

CAN YOU HACK YOUR WAY THROUGH IT?

Slide 61

Slide 61 text

DATA SCIENCE TETIANA IVANOVA: DATA SCIENCE IN 6 MONTHS

Slide 62

Slide 62 text

STATISTICS JAKE VANDERPLAS: STATISTICS FOR HACKERS

Slide 63

Slide 63 text

DATA SCIENCE BE WARY OF TERMINOLOGY ▸ Is it Machine Learning or Data Mining? Or is it Statistical Learning?

Slide 64

Slide 64 text

DATA SCIENCE WHAT ARE YOU TRYING TO DO? ▸ Figure something out ▸ Predict something

Slide 65

Slide 65 text

DATA SCIENCE TYPES OF LEARNING ▸ Supervised: labelled dataset ▸ Unsupervised: unlabelled dataset

Slide 66

Slide 66 text

DATA SCIENCE TYPES OF LEARNING ▸ Supervised ▸ Unsupervised ▸ Reinforcement

Slide 67

Slide 67 text

DATA SCIENCE TYPES OF ACTIVITY ▸ Classification (supervised) ▸ Regression (supervised) ▸ Clustering (unsupervised)
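
A minimal pure-Python sketch of two of these activities on hypothetical activity features (mean and standard deviation of acceleration magnitude per time window): k-nearest-neighbour classification, which needs labelled examples, and k-means clustering, which groups the same points without labels. The feature values, labels and starting centroids are made up for illustration, and regression is not shown. In practice a library such as scikit-learn or a tool such as Weka provides tested versions of both.

```python
import math
from collections import Counter


def knn_predict(train, point, k=3):
    """Supervised classification: label `point` by a majority vote of the k
    nearest labelled examples. `train` is a list of (features, label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kmeans(points, centroids, iterations=10):
    """Unsupervised clustering (Lloyd's algorithm): repeatedly assign each
    point to its closest centroid, then move each centroid to its cluster mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            closest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[closest].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's old centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical (mean magnitude, std of magnitude) features, in m/s^2.
train = [((9.8, 0.1), "still"), ((9.7, 0.2), "still"), ((9.9, 0.1), "still"),
         ((10.4, 2.1), "walking"), ((10.6, 1.9), "walking"), ((10.5, 2.2), "walking")]
print(knn_predict(train, (10.3, 1.8)))  # walking

# The same points without labels, grouped into two clusters.
points = [features for features, _ in train]
centroids, clusters = kmeans(points, centroids=[(9.0, 0.0), (11.0, 3.0)])
print(clusters)
```

Note that k-means only finds the two groups; naming them "still" and "walking" is still up to you.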

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

DIY

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

END USER DEVELOPMENT

Slide 72

Slide 72 text

A SET OF METHODS, TECHNIQUES AND TOOLS THAT ALLOW USERS OF SOFTWARE SYSTEMS, WHO ARE ACTING AS NON-PROFESSIONAL SOFTWARE DEVELOPERS, AT SOME POINT TO CREATE, MODIFY, OR EXTEND A SOFTWARE ARTIFACT. Lieberman et al. 2006

Slide 73

Slide 73 text

A SET OF METHODS, TECHNIQUES AND TOOLS THAT ALLOW USERS OF SOFTWARE SYSTEMS, WHO ARE ACTING AS NON-PROFESSIONAL SOFTWARE DEVELOPERS, AT SOME POINT TO CREATE, MODIFY, OR EXTEND A SOFTWARE ARTIFACT. Lieberman et al. 2006

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

No content

Slide 76

Slide 76 text

https://www.youtube.com/watch?v=0nbkaYsR94c

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

No content

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

TOOLS

Slide 81

Slide 81 text

TOOLS DATA COLLECTION AND COMMUNICATION ▸ MIT App Inventor ▸ Node-RED

Slide 82

Slide 82 text

DATA COLLECTION APP INVENTOR ‣ DEMO

Slide 83

Slide 83 text

ACTIVITY DETECTION: JUMPING
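
One simple way to detect a jump, sketched under stated assumptions: while the body is in the air, a phone carried on it is in free fall and the measured acceleration magnitude drops towards 0 g, and the landing produces a spike well above 1 g. The thresholds (`freefall_g`, `impact_g`) and the maximum gap between them are guesses to tune on your own recordings, and readings are assumed to be already converted to units of g.

```python
import math


def detect_jumps(samples, freefall_g=0.3, impact_g=2.0, max_gap=25):
    """Count jumps in (x, y, z) samples expressed in g.

    A jump is a free-fall reading (magnitude below freefall_g) followed,
    within max_gap samples, by an impact reading above impact_g.
    """
    jumps = 0
    freefall_start = None  # index where the current free-fall phase began
    for i, (x, y, z) in enumerate(samples):
        magnitude = math.sqrt(x * x + y * y + z * z)
        if freefall_start is not None and i - freefall_start > max_gap:
            freefall_start = None  # no landing in time: forget this free fall
        if magnitude < freefall_g:
            if freefall_start is None:
                freefall_start = i
        elif magnitude > impact_g and freefall_start is not None:
            jumps += 1  # landing shortly after free fall
            freefall_start = None
    return jumps


# Hypothetical recording: standing still, two jumps, standing still.
rest = [(0.0, 0.0, 1.0)] * 20
jump = [(0.0, 0.0, 0.1)] * 10 + [(0.0, 0.0, 2.5)] * 2 + [(0.0, 0.0, 1.0)] * 10
session = rest + jump + rest + jump + rest
print(detect_jumps(session))  # 2
```

Requiring the free-fall phase first is what stops a hard stomp (an impact with no flight) from being counted.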

Slide 84

Slide 84 text

TEXT DATA ABSTRACTION ▸ Data used at different levels of abstraction in App Inventor

Slide 85

Slide 85 text

COMMUNICATION DATA COMMUNICATION ‣ Use HTTP ‣ Save data to a file and send it to a server ‣ Do analysis directly on the phone; necessary if you need real-time results: see Jan Machacek's Exercise Analysis talk. ‣ You probably want to use MQTT
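
For the "save data to a file and send it to a server" option, a minimal sketch using only the Python standard library: samples are written as CSV (an in-memory buffer stands in for the file) and wrapped in an HTTP POST request. The column layout and the upload URL are hypothetical; passing the request to `urllib.request.urlopen` would actually send it.

```python
import csv
import io
import urllib.request


def build_upload_request(samples, url):
    """Serialise (timestamp_ms, x, y, z) samples as CSV and wrap them in an
    HTTP POST request. Nothing is sent until the request is passed to
    urllib.request.urlopen()."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp_ms", "x", "y", "z"])
    writer.writerows(samples)
    return urllib.request.Request(
        url,
        data=buffer.getvalue().encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


# Hypothetical endpoint on your own server.
request = build_upload_request([(0, 0.1, 0.2, 9.8), (20, 0.0, 0.3, 9.7)],
                               "http://example.com/upload")
print(request.get_method())  # POST
# urllib.request.urlopen(request) would perform the upload.
```

Batching like this is simple, but every upload is a fresh request, which is one reason a lightweight publish/subscribe protocol is attractive for streaming sensor data.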

Slide 86

Slide 86 text

COMMUNICATION MQTT PROTOCOL
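
To make the protocol less abstract, here is a sketch of what a QoS 0 PUBLISH packet looks like on the wire in MQTT 3.1.1: a fixed header byte (packet type 3 in the high nibble, flags in the low nibble), the "remaining length" as a variable-length integer, the topic name prefixed by its two-byte length, and then the raw payload. In a real pipeline a client library such as Eclipse Paho builds these packets for you; the topic used here is hypothetical.

```python
MAX_REMAINING_LENGTH = 268_435_455  # largest value four length bytes can encode


def encode_remaining_length(length):
    """Encode an MQTT variable-length integer: 7 bits per byte, with the high
    bit set when more bytes follow."""
    if not 0 <= length <= MAX_REMAINING_LENGTH:
        raise ValueError("remaining length out of range")
    encoded = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # continuation bit
        encoded.append(byte)
        if length == 0:
            return bytes(encoded)


def encode_publish(topic, payload, retain=False):
    """Build an MQTT 3.1.1 PUBLISH packet at QoS 0 (no packet identifier)."""
    topic_bytes = topic.encode("utf-8")
    body = len(topic_bytes).to_bytes(2, "big") + topic_bytes + payload
    first_byte = 0x30 | (0x01 if retain else 0x00)  # type 3 (PUBLISH), QoS 0
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


packet = encode_publish("phone/accel", b"0.1,0.2,9.8")
print(packet.hex(" "))
```

A few bytes of header per message is why MQTT suits small, frequent sensor readings better than one HTTP request per sample.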

Slide 87

Slide 87 text

No content

Slide 88

Slide 88 text

No content

Slide 89

Slide 89 text

No content

Slide 90

Slide 90 text

DATA COLLECTION AND COMMUNICATION NODE-RED

Slide 91

Slide 91 text

COMMUNICATION AND STORAGE APP INVENTOR AND NODE-RED ‣ Collect accelerometer data in App Inventor ‣ MQTT to a broker ‣ Subscribe in Node-RED: ‣ send to storage ‣ analyse ‣ visualise
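
When Node-RED subscribes to the broker it uses a topic filter, and the broker delivers every message whose topic matches that filter. A sketch of the MQTT 3.1.1 matching rules: `+` matches exactly one topic level, `#` (allowed only as the last level) matches any number of remaining levels including none, and a filter starting with a wildcard does not match topics beginning with `$`. The topic names below are hypothetical, and the function assumes the filter is already valid.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic name matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Wildcards in the first level never match system topics such as $SYS/...
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return i == len(filter_levels) - 1  # '#' must be the last level
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    return len(filter_levels) == len(topic_levels)


# Hypothetical topics: one subscription for every phone's accelerometer stream.
print(topic_matches("phone/+/accel", "phone/alice/accel"))  # True
print(topic_matches("phone/+/accel", "phone/alice/gyro"))   # False
print(topic_matches("phone/#", "phone/alice/accel"))         # True
```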

Slide 92

Slide 92 text

COMMUNICATION AND INTERACTION APP INVENTOR AND NODE-RED ‣ Send a notification from Node-RED with AeroGear Push Server ‣ Get it on the phone with App Inventor Work In Progress! Node-RED node for Notifications: https://github.com/CLDTio/node-red-contrib-aerogear-notifications App Inventor Component for AeroGear: https://github.com/josmas/app-inventor/tree/aerogear

Slide 93

Slide 93 text

TOOLS BACKEND AND STORAGE ▸ Use a commercial solution: ▸ there are plenty, and now you know enough to make an informed decision about whether to use one! ▸ iSense (read the T&Cs first) ▸ Your own backend with Parse, MIT Solid, or any other solution you like (including rolling your own)

Slide 94

Slide 94 text

BACKEND ISENSE

Slide 95

Slide 95 text

STORAGE DATABASES Time Series Real Time
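
If you pick InfluxDB as the time-series database (a Docker setup for it is linked later in the talk), points are written in its text "line protocol": the measurement name, optional comma-separated tags, the fields, and an optional nanosecond timestamp. The sketch below formats one point following the InfluxDB 1.x escaping rules; the measurement, tag and field names are hypothetical.

```python
def escape_key(text):
    """Escape commas, equals signs and spaces (tag keys/values, field keys)."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def format_field(value):
    """Render a field value: integers get an 'i' suffix, strings are quoted."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format a single point as one line of InfluxDB line protocol."""
    line = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for key in sorted(tags):  # sorted tag keys are recommended for write speed
        line += f",{escape_key(key)}={escape_key(str(tags[key]))}"
    line += " " + ",".join(f"{escape_key(k)}={format_field(v)}"
                           for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol("accel", {"device": "phone 1"},
                       {"x": 0.5, "y": -0.25, "z": 9.81}, 1465839830100400200))
# accel,device=phone\ 1 x=0.5,y=-0.25,z=9.81 1465839830100400200
```

Tags are indexed and suit values you filter by (such as the device), while fields hold the measured values themselves.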

Slide 96

Slide 96 text

GRAPHING AND MONITORING

Slide 97

Slide 97 text

DIY BACKEND

Slide 98

Slide 98 text

BACKEND SERVICES MANY SOLUTIONS… REMEMBER TO READ THE T&CS!!!

Slide 99

Slide 99 text

CONTAINERS DOCKER FOR EVERYTHING ‣ https://github.com/CLDTio/docker-influxdb ‣ https://github.com/CLDTio/appinventor-env-docker ‣ https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker ‣ Browse the Docker Hub and GitHub for many more!

Slide 100

Slide 100 text

TOOLS DATA ANALYSIS AND VISUALISATION ▸ iSense ▸ Weka ▸ Machine learning ▸ Activity recognition ▸ Visualisation tools ▸ Python notebooks, R, Julia, Spyder, and so forth.

Slide 101

Slide 101 text

VISUALISATION ISENSE

Slide 102

Slide 102 text

ACTIVITY RECOGNITION JUPYTER NOTEBOOKS
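
A common first step in an activity recognition notebook is to cut the accelerometer stream into overlapping windows and summarise each window with a few features. The sketch below uses the mean and standard deviation of the acceleration magnitude (window and step sizes are assumptions) and labels each window with a nearest-centroid rule; the class centroids are hypothetical values you would normally learn from labelled recordings.

```python
import math
import statistics


def window_features(samples, window=50, step=25):
    """Split (x, y, z) samples into overlapping windows and return the
    (mean, population standard deviation) of the magnitude for each one."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    features = []
    for start in range(0, len(magnitudes) - window + 1, step):
        chunk = magnitudes[start:start + window]
        features.append((statistics.fmean(chunk), statistics.pstdev(chunk)))
    return features


def nearest_centroid(centroids, feature):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], feature))


# Hypothetical centroids, e.g. averaged from labelled windows.
centroids = {"still": (9.81, 0.05), "walking": (10.2, 1.5)}

still = [(0.0, 0.0, 9.81)] * 100
walk = [(0.0, 0.0, 9.81 + 2.0 * math.sin(2 * math.pi * i / 25)) for i in range(100)]
print([nearest_centroid(centroids, f) for f in window_features(still)])
print([nearest_centroid(centroids, f) for f in window_features(walk)])
```

Overlapping windows (here 50 samples, advancing by 25) give a new label twice per window length, so transitions between activities are picked up sooner.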

Slide 103

Slide 103 text

No content

Slide 104

Slide 104 text

MACHINE LEARNING AND VISUALISATION WEKA AND THE IRIS DATASET
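
Weka reads data in its ARFF text format, which is also the format of the Iris sample file that ships with it. To analyse your own recordings in the Weka Explorer you can export features in the same shape. The sketch below writes numeric attributes plus a nominal class attribute, using one row per species from the Iris dataset as input. It assumes attribute names and class labels contain no spaces or commas (ARFF would need them quoted); the `to_arff` helper is illustrative.

```python
def to_arff(relation, attributes, class_name, classes, rows):
    """Render rows of numeric values plus a trailing class label as ARFF text.

    attributes: names of the numeric columns; classes: allowed class labels.
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attributes]
    lines.append(f"@ATTRIBUTE {class_name} {{{','.join(classes)}}}")
    lines += ["", "@DATA"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# One row per species from the Iris dataset (sepal/petal length and width, cm).
rows = [(5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
        (7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
        (6.3, 3.3, 6.0, 2.5, "Iris-virginica")]
arff = to_arff("iris",
               ["sepallength", "sepalwidth", "petallength", "petalwidth"],
               "class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"], rows)
print(arff)
```

Save the string to a file with an `.arff` extension and open it from the Preprocess tab of the Weka Explorer.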

Slide 105

Slide 105 text

No content

Slide 106

Slide 106 text

No content

Slide 107

Slide 107 text

No content

Slide 108

Slide 108 text

PICTURE CREDITS ▸ Tablet: http://siliconangle.com/files/2012/02/Android_Portrait_Overview.jpg ▸ Smart Watch: https://lh3.ggpht.com/ElwMg8bubiVB33euotYaD_mpKxSrr7SXTsrMwamk3_SRZx1VYkqVT8-HvkQDqXvLWw=h900 ▸ HeartRate monitor http://i00.i.aliimg.com/img/pb/355/386/400/400386355_864.jpg ▸ Arduino: https://cdn.instructables.com/FDW/WCKV/HKBG733D/FDWWCKVHKBG733D.MEDIUM.jpg ▸ Core motion axes: http://blog.denivip.ru/wp-content/uploads/2013/07/CoreMotionAxes.png ▸ Zero cross: https://c2.staticflickr.com/8/7269/7866678792_b375ae7b26.jpg ▸ Step counter: https://www.mathworks.com/matlabcentral/mlc-downloads/downloads/submissions/40876/versions/8/previews/sensorgroup/Examples/html/StepCounter_03.png ▸ Rewriting: http://www.boyter.org/wp-content/uploads/2016/04/Ce5nYp0W4AAxp5Z.jpg ▸ Machine learning NG: http://cdn.usefulstuff.io/2016/01/machine-learning-ng.jpg ▸ Researcher: http://images.clipartpanda.com/researcher-clipart-cartoons_laboratory_201533_tnb.png ▸ Ethics: http://tcrc.eu/en/wp-content/uploads/2013/06/ethics.png ▸ Developer: http://www.grapessoftware.com/wp-content/uploads/2014/02/Hire-Developer-Sprite.png ▸ Señor developer: http://startupmyway.com/wp-content/uploads/2016/02/senordeveloper.gif ▸ API integrations: http://www.surf2host.com/images/api_integrations.png ▸ Data Science: http://static1.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png ▸ DreamWeaver: http://getintopc.com/wp-content/uploads/2014/02/dreamweaver-cs6-jquery-mobile.png ▸ Spreadsheet: http://www.openoffice.us.com/cmsimages/software/calc2.gif ▸ Advanced spreadsheet: http://www.comfsm.fm/~dleeling/statistics/sc3/cover.png ▸ LabView: http://sql-lv.sourceforge.net/new_sql_LV.png ▸ E-prime2: http://www.pstnet.com/internal/kbimage/1801-1.gif ▸ Blockchain: http://www.cbronline.com/Uploads/NewsArticle/4978988/main.jpg ▸ Ethereum: https://ethereum.org/images/wallpaper-homestead.jpg ▸ Types of networks: 
https://upload.wikimedia.org/wikipedia/en/b/ba/Centralised-decentralised-distributed.png ▸ AeroGear logo: https://yt3.ggpht.com/-9bdWbHB80Og/AAAAAAAAAAI/AAAAAAAAAAA/2VbWVkS5CmY/s88-c-k-no-mo-rj-c0xffffff/photo.jpg ▸ Iris dataset: http://5047-presscdn.pagely.netdna-cdn.com/wp-content/uploads/2015/04/iris_petal_sepal.png ▸ Jupyter: http://jupyter.org/assets/main-logo.svg ▸ Pandas: http://pandas.pydata.org/_static/pandas_logo.png

Slide 109

Slide 109 text

QUESTIONS? @JOSMASFLORES