Tim McNamara: A look at NuPIC - A self-learning AI engine

Transcript

A look at NuPIC: A self-learning AI engine Tim McNamara

Disclaimers: • Personal, not professional interest • Neither neuroscience, nor

machine learning expert

Problem statement: • New Zealand has excellent environmental sensor data.

Can we build useful applications based these sources to provide hazard impact guidance? More concretely: • Can we predict ﬂoods on rivers automatically based on rain data and historic pattern? Hiring machine learning expertise to create bespoke models is very expensive. • Can we get an early hunch for the human impact of an earthquake from location & magnitude data only?

So

What is NuPIC?

A system for building AI models which has been built

on the basis of neurological foundations.

Fundamentally, here is the argument: The neocortex is doing something

right. Let's implement that.

The neocortex appears to have a very regular structure, irrespective

of the learning domain.

What are some features of that structure? - The neocortex

is heavily hierarchical, with 5 celluar and 1 non-celluar level through most of it. - Cortical regions appear to appears to use sparse, distributed representations to store information. - Temporality is important, e.g. we can distinguish two senses of the same input sounds depending on context: "I ate eight apples."

Motivations for preparing this talk

1. A system built on self-learning

There is a trade oﬀ between how much memory is

allocated to each level [of the model] and how many levels are needed. Fortunately, HTMs automatically learn the best possible representations at each level given statistics of the input and the amount of resources allocated. — Numenta White Paper (2011), p 28

Real brains are highly “plastic”, regions of the neocortex can

learn to represent entirely diﬀerent things in reaction to various changes. If part of the neocortex is damaged, other parts will adjust to represent what the damaged part used to represent. … The system is self-adjusting. — Numenta White Paper (2011), p 28

2. Streams are emerging from everywhere.

3. Numenta is being very brave.

NuPIC Concepts

Cortical Learning Algorithm (CLA) Sparse, distributed representations (SDRs) Online Prediction

Framework (OPF) Hierarchical Temporal Memory (HTM)

Sparse, distributed representations

Information encoded as a 2048 bit array of 0s and

1s. Within any array, only a small number of bits will be activated for any given input. Matching active bits means that two inputs are similiar.

0100000000000000000000010000000000000000000000000000000000000000010000..........01000 0000000010000000000000010000000000000000000000000010000000000000010000..........00000

Introducing hotgym.py

gym,address,timestamp,consumption string,string,datetime,float S,,T, Balgowlah Platinum,Shop 67 197-215 Condamine Street Balgowlah

2093,2010-07-02 00:00:00.0,5.3 Balgowlah Platinum,Shop 67 197-215 Condamine Street Balgowlah 2093,2010-07-02 00:15:00.0,5.5 Balgowlah Platinum,Shop 67 197-215 Condamine Street Balgowlah 2093,2010-07-02 00:30:00.0,5.1 Balgowlah Platinum,Shop 67 197-215 Condamine Street Balgowlah 2093,2010-07-02 00:45:00.0,5.3 Balgowlah Platinum,Shop 67 197-215 Condamine Street Balgowlah 2093,2010-07-02 01:00:00.0,5.2 ...

gym,address,timestamp,consumption string,string,datetime,float S,,T, Balgowlah Platinum,Shop 67 197-215 Condamine Street Balgowlah

model = ModelFactory.create(model_params.MODEL_PARAMS) model.enableInference({'predictedField': 'consumption'}) reader = csv.reader(open(_DATA_PATH)) headers =

reader.next() for i, record in enumerate(reader, start=1): modelInput = dict(zip(headers, record)) modelInput["consumption"] = float(modelInput["consumption"]) modelInput["timestamp"] = datetime.datetime( modelInput["timestamp"], "%m/%d/%y %H:%M") result = model.run(modelInput)

model = ModelFactory.create(model_params.MODEL_PARAMS) model.enableInference({'predictedField': 'consumption'}) reader = csv.reader(open(_DATA_PATH)) headers =

ModelResult( inferences={ 'multiStepPredictions': { 1: { 5.2825868514199987: 0.69999516634971859, 10.699999999999999: 0.07601257054965195,

22.100000000000001: 0.055294648127235196, 22.899999999999999: 0.052690624183750749, }, 5: { 38.188079999999999: 0.2275438176777452, 47.359999999999992: 0.19538808382423584, 37.399999999999999: 0.12597931862094047, 45.399999999999999: 0.099123261272031596, 37.089999999999996: 0.082913215936932752, 39.280000000000001: 0.077935781935515161, 43.629999999999995: 0.076405289164189288 } }, 'multiStepBestPredictions': { 1: 5.2825868514199987, 5: 38.188079999999999 } } ... )

MODEL_PARAMS = { # Type of model that the rest

of these parameters apply to. 'model': "CLA", # Version that specifies the format of the config. 'version': 1, # Intermediate variables used to compute fields in modelParams and also # referenced from the control section. 'aggregationInfo': { 'days': 0, 'fields': [('consumption', 'sum')], 'hours': 1, 'microseconds': 0, 'milliseconds': 0, 'minutes': 0, 'months': 0, 'seconds': 0, 'weeks': 0, 'years': 0}, 'predictAheadTime': None, # Model parameter dictionary. 'modelParams': { # The type of inference that this model will perform 'inferenceType': 'TemporalMultiStep', 'sensorParams': { # Sensor diagnostic output verbosity control; # if > 0: sensor region will print out on screen what it's sensing # at each step 0: silent; >=1: some info; >=2: more info; # >=3: even more info (see compute() in py/regions/RecordSensor.py) 'verbosity' : 0, # Example: # dsEncoderSchema = [ # DeferredDictLookup('__field_name_encoder'), # ], # # (value generated from DS_ENCODER_SCHEMA) 'encoders': { 'consumption': { 'clipInput': True, 'fieldname': u'consumption', 'n': 100, 'name': u'consumption', 'type': 'AdaptiveScalarEncoder', 'w': 21}, 'timestamp_dayOfWeek': { 'dayOfWeek': (21, 1), 'fieldname': u'timestamp', 'name': u'timestamp_dayOfWeek', 'type': 'DateEncoder'}, 'timestamp_timeOfDay': { 'fieldname': u'timestamp', 'name': u'timestamp_timeOfDay', 'timeOfDay': (21, 1), 'type': 'DateEncoder'}, 'timestamp_weekend': { 'fieldname': u'timestamp', 'name': u'timestamp_weekend', 'type': 'DateEncoder', 'weekend': 21}}, # A dictionary specifying the period for automatically-generated # resets from a RecordSensor; # # None = disable automatically-generated resets (also disabled if # all of the specified values evaluate to 0). # Valid keys is the desired combination of the following: # days, hours, minutes, seconds, milliseconds, microseconds, weeks # # Example for 1.5 days: sensorAutoReset = dict(days=1,hours=12), # # (value generated from SENSOR_AUTO_RESET) 'sensorAutoReset' : None, }, 'spEnable': True, 'spParams': { # SP diagnostic output verbosity control; # 0: silent; >=1: some info; >=2: more info; 'spVerbosity' : 0, 'globalInhibition': 1, # Number of cell columns in the cortical region (same number for # SP and TP) # (see also tpNCellsPerCol) 'columnCount': 2048, 'inputWidth': 0, # SP inhibition control (absolute value); # Maximum number of active columns in the SP region's output (when # there are more, the weaker ones are suppressed) 'numActivePerInhArea': 40, 'seed': 1956, # coincInputPoolPct # What percent of the columns's receptive field is available # for potential synapses. At initialization time, we will # choose coincInputPoolPct * (2*coincInputRadius+1)^2 'coincInputPoolPct': 0.5, # The default connected threshold. Any synapse whose # permanence value is above the connected threshold is # a "connected synapse", meaning it can contribute to the # cell's firing. Typical value is 0.10. Cells whose activity # level before inhibition falls below minDutyCycleBeforeInh # will have their own internal synPermConnectedCell # threshold set below this default value. # (This concept applies to both SP and TP and so 'cells' # is correct here as opposed to 'columns') 'synPermConnected': 0.1, 'synPermActiveInc': 0.1, 'synPermInactiveDec': 0.01, }, # Controls whether TP is enabled or disabled; # TP is necessary for making temporal predictions, such as predicting # the next inputs. Without TP, the model is only capable of # reconstructing missing sensor inputs (via SP). 'tpEnable' : True, 'tpParams': { # TP diagnostic output verbosity control; # 0: silent; [1..6]: increasing levels of verbosity # (see verbosity in nta/trunk/py/nupic/research/TP.py and TP10X*.py) 'verbosity': 0, # Number of cell columns in the cortical region (same number for # SP and TP) # (see also tpNCellsPerCol) 'columnCount': 2048, # The number of cells (i.e., states), allocated per column. 'cellsPerColumn': 32, 'inputWidth': 2048, 'seed': 1960, # Temporal Pooler implementation selector (see _getTPClass in # CLARegion.py). 'temporalImp': 'cpp', # New Synapse formation count # NOTE: If None, use spNumActivePerInhArea # # TODO: need better explanation 'newSynapseCount': 20, # Maximum number of synapses per segment # > 0 for fixed-size CLA # -1 for non-fixed-size CLA # # TODO: for Ron: once the appropriate value is placed in TP # constructor, see if we should eliminate this parameter from # description.py. 'maxSynapsesPerSegment': 32, # Maximum number of segments per cell # > 0 for fixed-size CLA # -1 for non-fixed-size CLA # # TODO: for Ron: once the appropriate value is placed in TP # constructor, see if we should eliminate this parameter from # description.py. 'maxSegmentsPerCell': 128, # Initial Permanence # TODO: need better explanation 'initialPerm': 0.21, # Permanence Increment 'permanenceInc': 0.1, # Permanence Decrement # If set to None, will automatically default to tpPermanenceInc # value. 'permanenceDec' : 0.1, 'globalDecay': 0.0, 'maxAge': 0, # Minimum number of active synapses for a segment to be considered # during search for the best-matching segments. # None=use default # Replaces: tpMinThreshold 'minThreshold': 12, # Segment activation threshold. # A segment is active if it has >= tpSegmentActivationThreshold # connected synapses that are active due to infActiveState # None=use default # Replaces: tpActivationThreshold 'activationThreshold': 16, 'outputType': 'normal', # "Pay Attention Mode" length. This tells the TP how many new # elements to append to the end of a learned sequence at a time. # Smaller values are better for datasets with short sequences, # higher values are better for datasets with long sequences. 'pamLength': 1, }, 'clParams': { 'regionName' : 'CLAClassifierRegion', # Classifier diagnostic output verbosity control; # 0: silent; [1..6]: increasing levels of verbosity 'clVerbosity' : 0, # This controls how fast the classifier learns/forgets. Higher values # make it adapt faster and forget older patterns faster. 'alpha': 0.0001, # This is set after the call to updateConfigFromSubConfig and is # computed from the aggregationInfo and predictAheadTime. 'steps': '1,5', }, 'trainSPNetOnlyIfRequested': False, }, }

sensorParams spParams (spatial pooling) tpParams (temporal pooling) clParams (cortical learning)

'spParams' { ... 'encoders': { 'consumption': { 'clipInput': True, 'fieldname':

u'consumption', 'n': 100, 'name': u'consumption', 'type': 'AdaptiveScalarEncoder', 'w': 21 }, 'timestamp_dayOfWeek': { 'dayOfWeek': (21, 1), 'fieldname': u'timestamp', 'name': u'timestamp_dayOfWeek', 'type': 'DateEncoder' }, 'timestamp_timeOfDay': { 'fieldname': u'timestamp', 'name': u'timestamp_timeOfDay', 'timeOfDay': (21, 1), 'type': 'DateEncoder' }, 'timestamp_weekend': { 'fieldname': u'timestamp', 'name': u'timestamp_weekend', 'type': 'DateEncoder', 'weekend': 21} }, ... }

'spParams' { ... 'encoders': { 'consumption': { 'clipInput': True, 'fieldname':

As you can see, lots of knobs

NuPIC treat: swarming

{ "includedFields": [ { "fieldName": "timestamp", "fieldType": "datetime" }, {

"fieldName": "consumption", "fieldType": "float"} ], "streamDef": { "info": "test", "version": 1, "streams": [ { "info": "hotGym.csv", "source": "file://extra/hotgym/hotgym.csv", "columns": [ "*" ], "last_record": 100 } ], "aggregation": { "years": 0, "months": 0, "weeks": 0, "days": 0, "hours": 1, "minutes": 0, "seconds": 0, "microseconds": 0, "milliseconds": 0, "fields": [ [ "consumption", "sum" ], [ "gym", "first" ], [ "timestamp", "first" ] ], } }, "inferenceType": "MultiStep", "inferenceArgs": { "predictionSteps": [ 1 ], "predictedField": "consumption" }, "iterationCount": -1, "swarmSize": "medium" }

Tim McNamara: A look at NuPIC - A self-learning...

Tim McNamara: A look at NuPIC - A self-learning AI engine

More Decks by New Zealand Python User Group

Other Decks in Programming

Featured

Transcript