• UC Berkeley
• University of Wisconsin
• University of Florida
• Industry backing from Pivotal / EMC
• Aims to become ‘CRAN’ for databases
• Currently runs on Greenplum DB / PostgreSQL
• Provides scalable analytics in SQL DBMS
Thursday, March 6, 14
in the database
• MADlib implements machine learning algorithms in SQL
• Python drivers handle complex tasks that require multiple iterations over the data, such as MCMC and gradient descent
• Calls optimized C/C++ linear algebra libraries for matrix math
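The driver pattern described on this slide can be sketched as follows. This is a minimal illustration, not MADlib's actual code: it assumes SQLite (from Python's standard library) as a stand-in for Greenplum DB / PostgreSQL, and the table `points` and function `fit_line` are hypothetical names. The Python loop only holds two coefficients; every pass over the data is a single SQL aggregate executed inside the database.

```python
import sqlite3


def fit_line(conn, step=0.5, iterations=500):
    """Fit y ~ a*x + b by gradient descent, one SQL aggregate per iteration."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # The database scans the table and returns just two numbers:
        # the mean gradient of the halved squared error w.r.t. a and b.
        grad_a, grad_b = conn.execute(
            "SELECT AVG((? * x + ? - y) * x), AVG(? * x + ? - y) FROM points",
            (a, b, a, b),
        ).fetchone()
        # The driver updates the model state and starts the next pass.
        a -= step * grad_a
        b -= step * grad_b
    return a, b


# Hypothetical toy data: points that lie exactly on y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(11)],
)
a, b = fit_line(conn)
print(f"a = {a:.4f}, b = {b:.4f}")
```

The point of the split is that the rows never leave the database; only the current coefficients and the aggregated gradient cross the connection on each iteration.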
group
• Topic : Big Data Analytics - Scalable machine learning using open-source tools
• Where : Pivotal Labs, SF
• When : 3.4.14 - 6.30pm
and window aggregates
• Common use case : Legacy Data Warehouse with no flexibility
• Supports statistical text analysis : feature extraction, string matching (n-grams), Viterbi and MCMC inference
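To make the n-gram string matching bullet concrete, here is a small sketch in plain Python. It is not MADlib's implementation: the functions `char_ngrams` and `ngram_similarity` are hypothetical names, and the choice of character trigrams with Jaccard similarity is an assumption made for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams in text (lowercased)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(left, right, n=3):
    """Jaccard similarity of the two strings' n-gram sets, from 0.0 to 1.0."""
    a, b = char_ngrams(left, n), char_ngrams(right, n)
    union = a | b
    if not union:
        # Both strings are shorter than n, so there is nothing to compare.
        return 0.0
    return len(a & b) / len(union)


print(ngram_similarity("abcd", "bcde"))
print(ngram_similarity("Greenplum", "greenplum db"))
```

Strings that share many overlapping character sequences score close to 1.0, which makes the measure tolerant of small typos and suffix differences when matching free-text fields.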