traditional tools to join different data sources and prepare a holistic dataset. This dataset can be automatically processed using topological data analysis and presented as a map of dependencies and correlations.
The motivation = Get answers to questions you didn’t ask yet
same object to homeomorphic spaces, that is:
Homology is a machine that converts local data about a space into global algebraic structure.
Reference: Wikipedia, 2010.
Topological invariants
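To make the "local data to global algebraic structure" idea concrete, here is a minimal sketch that computes Betti numbers of a small simplicial complex from its boundary matrices. It assumes coefficients in Z/2 (so signs can be ignored) and that the complex is given as sorted vertex tuples grouped by dimension. The function names are illustrative and do not come from the slides.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 rows."""
    rows = [r[:] for r in rows]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column in every other row (XOR is addition mod 2).
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def boundary_matrix(faces, simplices):
    """Rows index (k-1)-simplices, columns index k-simplices."""
    index = {f: i for i, f in enumerate(faces)}
    matrix = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for f in combinations(s, len(s) - 1):
            matrix[index[f]][j] = 1
    return matrix


def betti_numbers(simplices_by_dim):
    """b_k = (number of k-simplices) - rank(d_k) - rank(d_{k+1})."""
    top = len(simplices_by_dim)
    ranks = [0]  # the boundary map out of the vertices is zero
    for k in range(1, top):
        ranks.append(rank_gf2(boundary_matrix(simplices_by_dim[k - 1],
                                              simplices_by_dim[k])))
    ranks.append(0)  # there are no simplices above the top dimension
    return [len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1]
            for k in range(top)]


vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow_triangle = [vertices, edges]               # a circle
filled_triangle = [vertices, edges, [(0, 1, 2)]]  # a disk

print(betti_numbers(hollow_triangle))
print(betti_numbers(filled_triangle))
```

Each boundary matrix only records which faces belong to which simplex (local data), while the resulting Betti numbers count connected components and loops of the whole space (global structure).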
the structure of the underlying space
b. Then compute topological invariants of this structure
c. Represent these topological invariants in 2D space
Topology Data Analysis Pipeline
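The first two pipeline steps can be sketched as follows. This sketch assumes the structure is approximated by a neighbourhood graph at a fixed scale `eps` (the slides do not say which construction is used) and computes only the simplest invariant, the number of connected components. The names `neighborhood_graph` and `count_components` are illustrative.

```python
import math


def neighborhood_graph(points, eps):
    """Connect every pair of points whose Euclidean distance is at most eps."""
    return [(i, j)
            for i in range(len(points))
            for j in range(i + 1, len(points))
            if math.dist(points[i], points[j]) <= eps]


def count_components(n, edges):
    """Number of connected components (Betti-0) via union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


# Two small clusters far apart from each other.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]

for eps in (0.05, 0.5, 10.0):
    edges = neighborhood_graph(points, eps)
    print(eps, count_components(len(points), edges))
```

Running the same computation at several scales shows how the invariant changes as `eps` grows: every point is isolated at the smallest scale, the two clusters appear at the middle scale, and everything merges at the largest.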
is a discrete Morse function. Then X is homotopy equivalent to a CW-complex with exactly one cell of dimension p for each critical simplex of dimension p.
Reference: Teng Ma, Zhuangzhi Wu, Pei Luo, Lu Feng. Reeb graph computation through spectral clustering, 2011.
Morse Theory and Reeb Graph
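One standard consequence of the theorem above, written out as a worked equation. The symbols $m_p$ (number of critical simplices of dimension $p$) and $b_p$ (the $p$-th Betti number) are notation introduced here, not taken from the slides.

```latex
% Euler characteristic and weak Morse inequalities from the critical simplices
\chi(X) \;=\; \sum_{p \ge 0} (-1)^p \, m_p ,
\qquad
m_p \;\ge\; b_p(X) \quad \text{for every } p .

% Example: the hollow triangle (3 vertices, 3 edges) admits a discrete Morse
% function with one critical vertex and one critical edge, so
\chi \;=\; m_0 - m_1 \;=\; 1 - 1 \;=\; 0 \;=\; 3 - 3 .
```

In the example the complex collapses to one 0-cell and one 1-cell, which is the usual cell structure of a circle.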
activities per day:
• Number of likes
• Number of dislikes
• Number of matches
• Profiles visited
• Photos uploaded
• Number of messages sent (no content analysed)
• Number of message replies
• Interactions with different app features
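A minimal sketch of how the per-day activity counts listed above could be turned into one feature vector per user and day. The event log format `(user_id, day, event_type)` and the event type names are hypothetical; the slides do not describe the actual schema. Interactions with individual app features are left out here for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical event type names, one per activity counted per day.
FEATURES = [
    "like", "dislike", "match", "profile_visit",
    "photo_upload", "message_sent", "message_reply",
]


def daily_activity_vectors(events):
    """Map (user_id, day) to a list of counts ordered as in FEATURES."""
    counts = defaultdict(Counter)
    for user_id, day, event_type in events:
        if event_type in FEATURES:  # ignore event types we do not track
            counts[(user_id, day)][event_type] += 1
    return {key: [c[name] for name in FEATURES] for key, c in counts.items()}


events = [
    ("u1", "2015-01-01", "like"),
    ("u1", "2015-01-01", "like"),
    ("u1", "2015-01-01", "match"),
    ("u2", "2015-01-01", "message_sent"),
    ("u1", "2015-01-02", "dislike"),
]
vectors = daily_activity_vectors(events)
print(vectors[("u1", "2015-01-01")])
```

Only counts are stored, which matches the note that no message content is analysed.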
identified purely by activity
• Found a group of male users who act very similarly to female users, subject to further analysis
• Performed segmentation of users for potential product features with the product team
• Distilled a group of blocked users
• Found several interesting correlations between usage and success on the app
a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
• 18,820 documents
• From 6 to 5000 words each
• 20 newsgroups (classes)
20 Newsgroups academic dataset (semi-supervised)
must be baseball speed game margin realist chip ucdavi edu gari built villanova huckabai basebal game and shade hour that damn long don plai hour game watch game for that long butt fall asleep and watch channel surf pitch catch color
must be motorcycles ride sixteen dai had put test drive honda final saturdai rain fact clear warm and sunni and wind di week ago long cool ride hawk cycl for test ride had sold and deliv demo fifteen hour arriv and demo vfr bike lock showroom surround bike and not like move todai even bike us dirt bike us street bike car and big tent full outlandishli fat tour bike trailer squeez park lot sort fat bike convent shelli and dave run msf each time classroom and back lot usual free cookout distribut severli affect will bike perform such load cling back rest secur shift increas chanc surf collect wisdom request can afford leather pant boot and jean can make you knee protector rollerblad us bean and sell
[Figure: network architecture diagram. Recoverable labels: input x (or word counts for each document); Encoder 1, f_θ^(1); Encoder 2, f_θ^(2); layer sizes 1000, 500, 500; fine-tuning weights; labels for selected points; topological fine-tuner.]
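The encoder input is described as word counts for each document. A minimal sketch of building such count vectors from preprocessed, whitespace-separated token strings like the newsgroup samples shown earlier is given below. The function names and the `max_size` cut-off are assumptions for illustration, not the preprocessing actually used in the talk.

```python
from collections import Counter


def build_vocabulary(documents, max_size=None):
    """Most frequent tokens across all documents, most frequent first."""
    totals = Counter(token for doc in documents for token in doc.split())
    return [token for token, _ in totals.most_common(max_size)]


def count_vector(document, vocabulary):
    """Number of occurrences of each vocabulary token in one document."""
    counts = Counter(document.split())
    return [counts[token] for token in vocabulary]


documents = [
    "game hour game watch game",
    "bike ride bike test ride bike",
]
vocabulary = build_vocabulary(documents)
matrix = [count_vector(doc, vocabulary) for doc in documents]
print(vocabulary)
print(matrix)
```

Each row of `matrix` has one entry per vocabulary token, so every document becomes a fixed-length vector regardless of its word count.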
Theory and Persistent Homology (Kevin P. Knudson): http://www.math.fsu.edu/~hironaka/FSUUF/knudson.pdf
Topological Persistence and Simplification (Herbert Edelsbrunner, David Letscher, Afra Zomorodian): http://math.uchicago.edu/~shmuel/AAT-readings/Data%20Analysis%20/PersTop.pdf
Extracting and Composing Robust Features with Denoising Autoencoders (Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierre-Antoine Manzagol): http://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf