Computing Lecture 05: Perception in Practice
Dr. Javier Gonzalez-Sanchez
[email protected]
www.javiergs.com
Building 14-227
Office Hours: By appointment
Checklist
As of today, you have done the following:
1. Created an Emotiv account
2. Installed the Emotiv software (EmotivPRO)
3. Reviewed the Cortex API: https://emotiv.gitbook.io/cortex-api/
Right?
Dataset 1
EEG signals from 20 healthy, skilled volunteers. Each volunteer was asked to repeat an experiment (triggered by a visual stimulus) 10 times at different frequencies (7 Hz, 9 Hz, 11 Hz, and 13 Hz optical stimuli, plus a resting task).
§ 20 healthy, skilled volunteers
§ 5 stimuli (4 frequencies plus a resting case)
§ ~145 seconds for each stimulated task
§ 128 Hz sampling rate
§ Of particular interest: channels O1 and O2 (over the visual cortex)
https://ieee-dataport.org/documents/ssvep-eeg-data-collection-using-emotiv-epoc
Dataset 2
§ All data comes from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds.
§ The eye state was detected via a camera during the EEG measurement and added to the file manually after analyzing the video frames: '1' indicates eyes closed, '0' indicates eyes open.
§ All values are in chronological order, with the first measured value at the top of the data.
https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State
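To get this dataset into code, the sketch below loads the ARFF file with Weka and drops the eye-state label so that the remaining channels can be clustered unsupervised. The file name is a placeholder for wherever you saved the UCI download.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class LoadEyeState {
  public static void main(String[] args) throws Exception {
    // Placeholder path: point this at your local copy of the UCI ARFF file.
    Instances data = new DataSource("EEGEyeState.arff").getDataSet();

    // The last attribute is the eye-state label ('1' closed, '0' open).
    // Clustering is unsupervised, so remove the label before clustering.
    Remove remove = new Remove();
    remove.setAttributeIndices("last");
    remove.setInputFormat(data);
    Instances features = Filter.useFilter(data, remove);

    System.out.println(features.numInstances() + " instances, "
        + features.numAttributes() + " attributes");
  }
}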
Definition
§ Unsupervised learning.
§ Clustering is the task of dividing a population (data points) into groups such that data points in the same group are more similar to each other than to points in other groups.
Algorithms
§ K-Means - groups by distance between points; minimizes a squared-error criterion.
§ DBSCAN (Density-Based Spatial Clustering of Applications with Noise) - groups by distance between nearest points.
§ Simple EM (Expectation Maximization) - estimates the likelihood (probability) of an observation belonging to each cluster; maximizes a log-likelihood criterion.
Similarity
§ One of the simplest ways to calculate the distance between two feature vectors is the Euclidean distance.
§ Other options: Minkowski distance, Manhattan distance, Hamming distance, Cosine distance, …
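A minimal sketch of two of these metrics in plain Java (class and method names are illustrative, not from any library):

public class Distance {
  // Euclidean distance: square root of the summed squared per-feature differences.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Manhattan distance: sum of absolute per-feature differences.
  static double manhattan(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += Math.abs(a[i] - b[i]);
    }
    return sum;
  }

  public static void main(String[] args) {
    double[] p = {1.0, 2.0};
    double[] q = {4.0, 6.0};
    System.out.println(euclidean(p, q)); // 5.0
    System.out.println(manhattan(p, q)); // 7.0
  }
}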
Algorithm: K-Means
§ K-Means begins with k randomly placed centroids. Centroids are the center points of the clusters.
§ Iteration:
o Assign each data point to its nearest centroid.
o Move each centroid to the average location of the points assigned to it.
§ Repeat until the assignments stop changing between consecutive iterations.
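Weka implements this loop as weka.clusterers.SimpleKMeans. A minimal sketch, assuming an unlabeled feature file features.arff (a placeholder name):

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KMeansDemo {
  public static void main(String[] args) throws Exception {
    Instances data = new DataSource("features.arff").getDataSet();

    SimpleKMeans kmeans = new SimpleKMeans();
    kmeans.setNumClusters(5);     // k must be chosen up front
    kmeans.setSeed(42);           // seed for the random initial centroids
    kmeans.buildClusterer(data);  // runs assign/update until assignments stabilize

    System.out.println(kmeans.getClusterCentroids());             // final centroids
    System.out.println(kmeans.clusterInstance(data.instance(0))); // cluster of first point
  }
}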
K-Means Problems
§ K-Means may cluster loosely related observations together: every observation eventually becomes part of some cluster, even if it is scattered far away in the vector space.
§ Clusters depend on the mean value of their elements, so each data point plays a role in forming the clusters; a slight change in the data might affect the clustering outcome.
§ Another challenge is that you need to specify the number of clusters ("k") in order to use it. Much of the time, we won't know a reasonable k value a priori.
DBSCAN
§ The algorithm proceeds by arbitrarily picking a point in the dataset.
§ If there are at least N points within a radius E of that point, all of these points are considered part of the same cluster. The cluster is then grown by applying the same check to each newly added point.
§ Repeat until all points have been visited; points that never fall within any cluster are labeled noise.
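Weka ships an implementation as weka.clusterers.DBSCAN (in newer Weka releases it is installed via the optional optics_dbScan package). A sketch with illustrative values for E and N, and a placeholder file name:

import weka.clusterers.DBSCAN;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DBSCANDemo {
  public static void main(String[] args) throws Exception {
    Instances data = new DataSource("features.arff").getDataSet();

    DBSCAN dbscan = new DBSCAN();
    dbscan.setEpsilon(0.4);  // radius E around each point (illustrative value)
    dbscan.setMinPoints(6);  // minimum N neighbors to form a cluster (illustrative)
    dbscan.buildClusterer(data);

    // Unlike K-Means, the number of clusters is discovered, not given.
    System.out.println(dbscan.numberOfClusters() + " clusters found");
  }
}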
K-Means vs. DBSCAN
§ weka.clusterers: Weka's clustering algorithms, including K-Means (SimpleKMeans), CLOPE, Cobweb, DBSCAN, hierarchical clustering, and FarthestFirst.
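One way to compare the two on the same data is Weka's ClusterEvaluation, which prints each model's cluster sizes and assignments. A sketch (the file name is a placeholder):

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.Clusterer;
import weka.clusterers.DBSCAN;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareClusterers {
  public static void main(String[] args) throws Exception {
    Instances data = new DataSource("features.arff").getDataSet();

    SimpleKMeans kmeans = new SimpleKMeans();
    kmeans.setNumClusters(2);      // K-Means: k is fixed in advance
    DBSCAN dbscan = new DBSCAN();  // DBSCAN: k emerges from the density parameters

    // Build each clusterer on the same data and print Weka's summary.
    for (Clusterer c : new Clusterer[] { kmeans, dbscan }) {
      c.buildClusterer(data);
      ClusterEvaluation eval = new ClusterEvaluation();
      eval.setClusterer(c);
      eval.evaluateClusterer(data);
      System.out.println(eval.clusterResultsToString());
    }
  }
}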
Simple EM (Expectation Maximization)
EM can decide how many clusters to create via cross-validation, or you may specify a priori how many clusters to generate. The cross-validation procedure:
1. The number of clusters is set to 1.
2. EM assigns to each instance a probability distribution indicating its probability of belonging to each of the clusters.
3. The training set is split randomly into 10 folds.
4. EM is performed 10 times using the 10 folds.
5. The log-likelihood is averaged over all 10 results.
6. If the log-likelihood has increased, the number of clusters is increased by 1, and the procedure continues at step 2.
The number of folds is fixed at 10 as long as the training set has at least 10 instances; otherwise, the number of folds is set equal to the number of instances.
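In Weka this is weka.clusterers.EM; setting the number of clusters to -1 triggers the cross-validation loop described above. A minimal sketch with a placeholder file name:

import weka.clusterers.EM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EMDemo {
  public static void main(String[] args) throws Exception {
    Instances data = new DataSource("features.arff").getDataSet();

    EM em = new EM();
    em.setNumClusters(-1);  // -1 = let EM choose k via 10-fold cross-validation
    em.buildClusterer(data);

    System.out.println("clusters found: " + em.numberOfClusters());

    // Soft assignment: probability of the first instance under each cluster.
    double[] probs = em.distributionForInstance(data.instance(0));
    for (double p : probs) {
      System.out.printf("%.3f%n", p);
    }
  }
}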
Simple EM (Expectation Maximization)
EM Demystified: An Expectation-Maximization Tutorial
https://vannevar.ece.uw.edu/techsite/papers/documents/UWEETR-2010-0002.pdf
Weka
§ Waikato Environment for Knowledge Analysis (WEKA) is a machine learning library developed at the University of Waikato, New Zealand.
§ A well-known Java library.
§ It is a general-purpose library that can solve a wide variety of machine learning tasks, such as classification, regression, and clustering.
§ It features a rich graphical user interface, a command-line interface, and a Java API.
§ http://www.cs.waikato.ac.nz/ml/weka/
Homework
• Open or Closed Eyes vs. Brain
• 5 diverse stimulation scenarios vs. Brain
• 5 diverse stimulation scenarios vs. Affect
It is not mandatory to use clustering; explore solutions to the best of your knowledge.
Due: Wednesday
Spring 2023 Copyright. These slides can only be used as study material for the class CSC308 at Cal Poly. They cannot be distributed or used for another purpose.