Project October - annotated

Read news that you actually want to read

Raja Cherukuri

January 23, 2013

Transcript

  1. Modern News Aggregators

    [1] Reddit: http://reddit.com
    [2] Slashdot: http://slashdot.org
    [3] Digg: http://digg.com
    [4] Hacker News: http://news.ycombinator.com
    [5] Stackoverflow: http://stackoverflow.com
  2. What is Project October?

    Use technological principles to avoid[1]:
      loss of longtime members
      large exodus of excellent contributors
      influx of malicious contributors
    Improve the user experience
    Allow discourse and interesting articles from the community

    [1] Eternal September: http://www.nyupress.org/netwars/pages/chapter03/ch03_.html
  3. Project Scope

    Split between Frontend and Backend
    Communicate via API
    Frontend:
      Features common objects and actions found on sites like Reddit
      User submitted articles or other media
    Backend:
      Hybrid recommendation engine[1][2][3][4][5]
      Search Engine[6][7][8]

    [1] Hybrid tag recommendation for social annotation systems: http://doi.acm.org/10.1145/1871437.1871543
    [2] A hybrid video recommendation system using a graph-based algorithm: http://dl.acm.org/citation.cfm?id=2025816.2025858
    [3] Learning to rank for hybrid recommendation: http://doi.acm.org/10.1145/2396761.2398610
    [4] Eigentrust: http://doi.acm.org/10.1145/1120717.1120721
    [5] Cassandra: a decentralized structured storage system: http://doi.acm.org/10.1145/1773912.1773922
    [6] S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey, data mining and knowledge discovery. KDD Journal, 2(4), 345-389, 1998.
    [7] C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 121-168, 1998.
    [8] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. KDD, 1998.
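The slide names a hybrid recommendation engine but does not say how its signals are combined. As one possible reading, the sketch below blends a collaborative score and a content-based score with a fixed weight. The `Candidate` fields, the `alpha` weight, and the sample scores are all assumptions made for illustration; they are not taken from the Project October backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One article with two precomputed scores (hypothetical fields)."""
    article_id: str
    collaborative: float  # score from users with similar voting history, in [0, 1]
    content: float        # score from tag or text similarity, in [0, 1]


def hybrid_score(c: Candidate, alpha: float = 0.7) -> float:
    """Weighted blend of the two signals; alpha is an assumed tuning knob."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * c.collaborative + (1.0 - alpha) * c.content


def recommend(candidates, k=3, alpha=0.7):
    """Return the ids of the k candidates with the highest blended score."""
    ranked = sorted(candidates, key=lambda c: hybrid_score(c, alpha), reverse=True)
    return [c.article_id for c in ranked[:k]]


# Made-up sample data, for illustration only.
candidates = [
    Candidate("a1", collaborative=0.9, content=0.2),
    Candidate("a2", collaborative=0.4, content=0.9),
    Candidate("a3", collaborative=0.1, content=0.3),
]
top = recommend(candidates, k=2)
```

A weighted blend is only one way to build a hybrid recommender; the cited papers also cover graph-based and learning-to-rank approaches, which this sketch does not attempt.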
  4. Technical Details

    Frontend:
      Ruby on Rails[15]
    Backend:
      Groovy[14]
      Titan[1][2][4][10][11][12][13]
      Tinkerpop[3][5][6][7][8][9]

    [1] Titan: http://github.com/thinkaurelius/titan
    [2] Cassandra: http://cassandra.apache.org/
    [3] Tinkerpop: http://www.tinkerpop.com/
    [4] Titan: Big Graph Data with Cassandra: http://www.slideshare.net/knowfrominfo/titan-big-graph-data-with-cassandra
    [5] P. Berkhin. Survey of clustering data mining techniques, 2002.
    [6] R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. VLDB, 144-155, 1994.
    [7] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. SIGMOD, 103-114, 1996.
    [8] S. Guha, R. Rastogi, and K. Shim. Cure: an efficient clustering algorithm for large databases. SIGMOD, 73-84, 1998.
    [9] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. KDD, 226-231, 1996.
    [10] W. Wang, J. Yang, and R. Muntz. STING: a statistical information grid approach to spatial data mining. VLDB, 186-195, 1997.
    [11] Peter Haider, Luca Chiarandini: Discriminative Clustering for Market Segmentation. KDD 2012.
    [12] Jie Tang, Sen Wu, Jimeng Sun, Hang Su: Cross-domain Collaboration Recommendation. KDD 2012.
    [13] Ming Ji, Jiawei Han, Marina Danilevsky: Ranking-based classification of heterogeneous information networks. KDD 2011.
    [14] Groovy: http://groovy.codehaus.org
    [15] Ruby on Rails: http://rubyonrails.org
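The backend is built on a graph database, so recommendations can plausibly be phrased as graph traversals. The sketch below shows a two-hop "co-vote" traversal (user to article, to other voters, to their articles) in plain Python. It does not use the Titan or Tinkerpop APIs, and the `votes` edge list and function names are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical "voted" edges from users to articles, for illustration only.
votes = [
    ("alice", "a1"), ("alice", "a2"),
    ("bob", "a1"), ("bob", "a3"),
    ("carol", "a2"), ("carol", "a3"), ("carol", "a4"),
]


def build_index(edges):
    """Index edges in both directions: user -> articles and article -> users."""
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for user, article in edges:
        out_edges[user].add(article)
        in_edges[article].add(user)
    return out_edges, in_edges


def co_vote_recommendations(user, edges):
    """Count unseen articles reachable via users who share a vote with `user`."""
    out_edges, in_edges = build_index(edges)
    seen = out_edges[user]
    counts = Counter()
    for article in seen:
        for neighbour in in_edges[article]:
            if neighbour == user:
                continue
            for candidate in out_edges[neighbour]:
                if candidate not in seen:
                    counts[candidate] += 1
    return counts


suggestions = co_vote_recommendations("alice", votes)
```

Each count is the number of traversal paths that reach an article, so articles favoured by several like-minded voters rank higher. In a graph database the same walk would be expressed as a traversal query rather than nested loops.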
  5. Methodology

    Multi-Phase Agile Development[1]
      1-week iterations ending with a release[2]
      Documentation with each task, aggregated at release time
      Pivotal Tracker: http://pivotaltracker.com/projects/734155
    Source Control[3]
      Frontend: https://github.com/ted27/project-october
      Backend: https://github.com/rxc178/project-october-backend

    [1] Scaling Lean & Agile Development: http://www.amazon.com/Scaling-Lean-Agile-Development-Organizational/dp/0321480961
    [2] Progressive Elaboration: http://pmi.org
    [3] Git: http://git-scm.org