Project October - annotated

Read news that you actually want to read

Raja Cherukuri

January 23, 2013

Transcript

  1. Modern News Aggregators

    [1] Reddit: http://reddit.com
    [2] Slashdot: http://slashdot.org
    [3] Digg: http://digg.com
    [4] Hacker News: http://news.ycombinator.com
    [5] Stackoverflow: http://stackoverflow.com
  2. What is Project October?

    Use technological principles to avoid[1]:
      loss of longtime members
      large exodus of excellent contributors
      influx of malicious contributors
    Improve the user experience
    Allow discourse and interesting articles from the community

    [1] Eternal September: http://www.nyupress.org/netwars/pages/chapter03/ch03_.html
  3. Project Scope

    Split between Frontend and Backend
    Communicate via API
    Frontend:
      Features common objects and actions found on sites like Reddit
      User submitted articles or other media
    Backend:
      Hybrid recommendation engine[1][2][3][4][5]
      Search Engine[6][7][8]

    [1] Hybrid tag recommendation for social annotation systems: http://doi.acm.org/10.1145/1871437.1871543
    [2] A hybrid video recommendation system using a graph-based algorithm: http://dl.acm.org/citation.cfm?id=2025816.2025858
    [3] Learning to rank for hybrid recommendation: http://doi.acm.org/10.1145/2396761.2398610
    [4] Eigentrust: http://doi.acm.org/10.1145/1120717.1120721
    [5] Cassandra: a decentralized structured storage system: http://doi.acm.org/10.1145/1773912.1773922
    [6] S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey, data mining and knowledge discovery. KDD Journal, 2(4), 345-389, 1998.
    [7] C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 121-168, 1998.
    [8] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. KDD, 1998.
  4. Technical Details

    Frontend:
      Ruby on Rails[15]
    Backend:
      Groovy[14]
      Titan[1][2][4][10][11][12][13]
      Tinkerpop[3][5][6][7][8][9]

    [1] Titan: http://github.com/thinkaurelius/titan
    [2] Cassandra: http://cassandra.apache.org/
    [3] Tinkerpop: http://www.tinkerpop.com/
    [4] Titan: Big Graph Data with Cassandra: http://www.slideshare.net/knowfrominfo/titan-big-graph-data-with-cassandra
    [5] P. Berkhin. Survey of clustering data mining techniques, 2002.
    [6] R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. VLDB, 144-155, 1994.
    [7] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. SIGMOD, 103-114, 1996.
    [8] S. Guha, R. Rastogi, and K. Shim. Cure: an efficient clustering algorithm for large databases. SIGMOD, 73-84, 1998.
    [9] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. KDD, 226-231, 1996.
    [10] W. Wang, J. Yang, and R. Muntz. STING: a statistical information grid approach to spatial data mining. VLDB, 186-195, 1997.
    [11] Peter Haider, Luca Chiarandini: Discriminative Clustering for Market Segmentation. KDD 2012.
    [12] Jie Tang, Sen Wu, Jimeng Sun, Hang Su: Cross-domain Collaboration Recommendation. KDD 2012.
    [13] Ming Ji, Jiawei Han, Marina Danilevsky: Ranking-based classification of heterogeneous information networks. KDD 2011.
    [14] Groovy: http://groovy.codehaus.org
    [15] Ruby on Rails: http://rubyonrails.org
  5. Methodology

    Multi-Phase Agile Development[1]
    1-week iterations ending with a release[2]
    Documentation with each task, aggregated at release time
    Pivotal Tracker: http://pivotaltracker.com/projects/734155
    Source Control[3]
      Frontend: https://github.com/ted27/project-october
      Backend: https://github.com/rxc178/project-october-backend

    [1] Scaling Lean & Agile Development: http://www.amazon.com/Scaling-Lean-Agile-Development-Organizational/dp/0321480961
    [2] Progressive Elaboration: http://pmi.org
    [3] Git: http://git-scm.org
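
Slide 3 names a hybrid recommendation engine for the backend, but the deck does not say how it combines its signals. As a rough illustration only, the sketch below blends a collaborative-filtering score (votes from similar users) with a content-based score (tag overlap with articles the user upvoted), weighted by alpha. The data shapes, function names, the alpha weight, and the use of Python rather than the project's Groovy backend are all assumptions made for this example, not details taken from the deck.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def collaborative_scores(votes, user):
    """Score unseen articles by the similarity-weighted votes of other users."""
    scores = defaultdict(float)
    for other, other_votes in votes.items():
        if other == user:
            continue
        sim = cosine(votes[user], other_votes)
        if sim <= 0:
            continue  # ignore users with no overlapping taste
        for article, vote in other_votes.items():
            if article not in votes[user]:
                scores[article] += sim * vote
    return dict(scores)


def content_scores(votes, tags, user):
    """Score unseen articles by tag overlap with the user's voting history."""
    profile = defaultdict(float)
    for article, vote in votes[user].items():
        for tag in tags.get(article, ()):
            profile[tag] += vote
    return {
        article: cosine(profile, {t: 1.0 for t in article_tags})
        for article, article_tags in tags.items()
        if article not in votes[user]
    }


def normalize(scores):
    """Scale scores so the best one is 1.0 (all zeros if nothing is positive)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return dict.fromkeys(scores, 0.0)
    return {k: v / top for k, v in scores.items()}


def hybrid_recommend(votes, tags, user, alpha=0.5, k=3):
    """Blend both signals: alpha * collaborative + (1 - alpha) * content."""
    cf = normalize(collaborative_scores(votes, user))
    cb = normalize(content_scores(votes, tags, user))
    blended = {
        a: alpha * cf.get(a, 0.0) + (1 - alpha) * cb.get(a, 0.0)
        for a in cf.keys() | cb.keys()
    }
    # Highest score first; ties broken by article id for stable output.
    return sorted(blended, key=lambda a: (-blended[a], a))[:k]
```

Here votes is a hypothetical mapping of user to {article: vote} and tags a mapping of article to a list of tags. Setting alpha to 1.0 gives pure collaborative filtering and 0.0 gives pure content matching. A graph-backed version would read the same relationships from edges instead of dicts.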