this thesis?
• The SNSD system
  – Community detection
  – Interest hierarchy
• Implementation
  – Preprocessing
  – Celery task queue
• Experiments
• Conclusions and future work
1. Posted content (plurks) from users
2. Aggregated interest information from communities, for private users
• Data to prepare
  – Relationships
  – Plurks
number within communities)
• Idea:
  – dense internal connections between the nodes within a module
  – sparse connections between different modules
• Works both as a measure of partition quality and as an objective function to optimize.
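$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$
– $A_{ij}$ = the weight of the edge between $i$ and $j$
– $k_i$ = degree of vertex $i$
– $m = \frac{1}{2} \sum_{i,j} A_{ij}$, the number of edges of the graph
– $\delta(u, v) = 1$ if $u = v$, $0$ otherwise
– $c_i$ is the community of vertex $i$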
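To make the definitions above concrete, here is a minimal sketch that evaluates modularity for a given partition. It uses the standard identity that the double sum over vertex pairs equals a per-community sum of (internal weight)/(2m) minus (total degree / 2m) squared. The function name `modularity` and the edge-list input format are assumptions made for illustration, not part of the SNSD code, and the graph is assumed undirected with no self-loops.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges:     iterable of (u, v, weight), each undirected edge listed once
    community: dict mapping every vertex to its community label
    Assumes no self-loops and at least one edge (hypothetical input format).
    """
    degree = defaultdict(float)    # k_i
    internal = defaultdict(float)  # sum of A_ij over ordered pairs inside a community
    m = 0.0                        # total edge weight
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        m += w
        if community[u] == community[v]:
            internal[community[u]] += 2 * w  # counts (u, v) and (v, u)

    total = defaultdict(float)     # sum of k_i per community
    for node, k in degree.items():
        total[community[node]] += k

    return sum(internal[c] / (2 * m) - (total[c] / (2 * m)) ** 2 for c in total)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {v: "a" for v in range(6)}
print(modularity(edges, split))   # one community per triangle
print(modularity(edges, merged))  # everything in one community
```

Putting each triangle in its own community scores higher than merging everything, which matches the idea of dense connections inside modules and sparse connections between them.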
based on modularity optimization
• The Louvain algorithm repeats two phases
  1. Look for small communities by optimizing modularity locally
  2. Aggregate the vertices of each community and build a new network whose vertices are the communities
• Repeat both phases until modularity reaches a maximum
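The two phases can be sketched in plain Python as follows. This is a simplified illustration, not the implementation used in the thesis: the names `build_adjacency`, `local_move`, `aggregate`, and `louvain` are hypothetical, vertices are visited in a fixed order rather than a random one, and the input graph is assumed undirected with at least one edge.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Symmetric adjacency dict from (u, v, weight) triples (hypothetical helper)."""
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w
    return {u: dict(nbrs) for u, nbrs in adj.items()}


def local_move(adj):
    """Phase 1: move each vertex to the neighbouring community with the best gain."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())
    community = {u: u for u in adj}  # start with one community per vertex
    tot = dict(degree)               # total degree of each community
    improved = True
    while improved:
        improved = False
        for u in adj:
            old = community[u]
            links = defaultdict(float)  # weight from u to each community
            for v, w in adj[u].items():
                if v != u:              # a self-loop does not affect the choice
                    links[community[v]] += w
            tot[old] -= degree[u]       # take u out of its community
            best = old
            best_gain = links[old] - tot[old] * degree[u] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += degree[u]
            if best != old:
                community[u] = best
                improved = True
    return community


def aggregate(adj, community):
    """Phase 2: build the network whose vertices are the communities."""
    new_adj = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        row = new_adj[community[u]]
        for v, w in nbrs.items():
            row[community[v]] += w  # internal weight becomes a self-loop
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}


def louvain(adj):
    """Repeat both phases until phase 1 no longer merges anything."""
    membership = {u: u for u in adj}  # original vertex -> current community
    while True:
        community = local_move(adj)
        if len(set(community.values())) == len(adj):
            return membership
        membership = {u: community[c] for u, c in membership.items()}
        adj = aggregate(adj, community)


# Two triangles joined by one bridge edge.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
membership = louvain(build_adjacency(edges))
print(membership)
```

The gain used in `local_move` is the usual modularity gain for inserting an isolated vertex into a community, with the constant factor dropped since only the comparison between candidate communities matters.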
for identification of the plurk
• owner
  – The owner/poster of this plurk
• content
  – The formatted and filtered content, e.g. URLs are turned into text tags and emoticons are filtered out
• content_raw
  – The raw content as the user entered it
• posted
  – The date this plurk was posted, in ISODate format
freq. interest keywords
• private: treat the plurker as private, derive their interest keywords from their communities, and take the top-64 most frequent interest keywords
• Score: len(intersect(public, private)), the number of keywords the two lists share
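The comparison above can be sketched as follows. The helper names `top_keywords` and `overlap` and the toy keyword lists are illustrative assumptions; only the top-64 cutoff and the intersection size come from the slide.

```python
from collections import Counter


def top_keywords(keywords, k=64):
    """Return the set of the k most frequent keywords in a list."""
    return {word for word, _ in Counter(keywords).most_common(k)}


def overlap(public_keywords, private_keywords, k=64):
    """len(intersect(public, private)) for the two top-k keyword sets."""
    public = top_keywords(public_keywords, k)
    private = top_keywords(private_keywords, k)
    return len(public & private)


# Toy data: keywords from the user's own plurks vs. keywords
# aggregated from the user's communities (made-up values).
own = ["music"] * 3 + ["anime"] * 2 + ["food"]
from_communities = ["anime"] * 3 + ["travel"] * 2 + ["music"]
score = overlap(own, from_communities, k=2)
print(score)
```

A higher score means the community-derived keywords recover more of the interests the user expresses in their own posts.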
to find interesting topics and relationships
• Develop a new scalable crawling framework based on ZeroMQ
• Patch the plurk-oauth library
• Build a website for visualizing interests and relationships with D3.js
by users
• Apply the SNSD system to Twitter for Western languages and to Sina Weibo for mainland China
• Employ other community detection algorithms and optimize NetworkX