Scala @ Bizo - Speaker Deck

Slide 1

Slide 1 text

Scala @ (scala compose bizo) apply { ftw_! }

Slide 2

Slide 2 text

Who is this talk for?
● You are considering, or in the process of, adopting Scala in your organization
● You want to hear about other people's experience adopting Scala
● You want to learn about companies using Scala to build cool software solutions

Slide 3

Slide 3 text

Who am I?
Alex Boisvert (twitter: @boia01)
Software engineer/architect working at Bizo
Back-end kinda guy, interested in system scalability
● High-volume transaction-processing
● Data retrieval and big data analytics

Slide 4

Slide 4 text

About Bizo
Online ad targeting & analytics platform
● Reach 80 million business professionals online
● Help better understand composition of web audience
● Site personalization, action tracking, custom audiences, funnel analysis, …
● And more! Offer many services through web APIs.
Classify web visitors into “business demographics”
● 150+ industries (agriculture, construction, health care, government, ...)
● 100+ functional areas (finance, engineering, legal, sales, ...)
● Company size (small, medium, large, F500, …)
● Seniority (non-management, mid-management, executive, …)
● … location, education, gender, and more.

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Engineering Team
Team of 8 “dev-ops” engineers
Use a mix of Scala, Ruby, Java, JavaScript, …
All infrastructure runs on Amazon Web Services: EC2, Elastic MapReduce (Hadoop), etc.
Started using Scala in late 2009.
Scala is now used almost everywhere (analytics backend, web APIs, scripts, etc.) except:
● Large legacy Java components (will take some time)
● Web codebase using Google Web Toolkit (GWT)
● Smallish / prototyping-size web apps (Ruby + Rails)

Slide 7

Slide 7 text

TL;DR
Scala as scalable language       A+   (awesome!)
Scala → Java Interoperability    A    (just as advertised)
Java → Scala Interoperability    B-   (unintended consequence)
Binary Compatibility             C    (you will have to deal with it)
IDE Support                      C+   (trending towards B)
Standard Library                 A-   (not 100% bug-free)

Slide 8

Slide 8 text

Success Story – Scala + Big Data
Handle between 2-3 billion web requests per month
Data growing at 400% pace year-over-year
Use Hadoop + Hive to aggregate web traffic
Built “Sugarcube” – a NoSQL analytics database (OLAP)
● 100% Scala code + some Java libraries – 20K LoC
● Distributed, cloud-friendly (AWS), scale-out architecture
● Multi-dimensional indexing of billions of rows with 12+ dimensions
● Response time: < 100 ms (for typical queries)
● 6 man-months from paper to production (2 prototype iterations)
● Server cluster: 4 x m1.large EC2 instances
● Indexing cluster: 2 x m2.2xlarge EC2 instances (2-3 hours/day)

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Why Scala?
Productivity
● Succinct code clearly expresses algorithms
● “Systems Programming” without the hassle
Concurrency
● Excellent JVM primitives
● Easy to build best-fitting abstractions
● Immutable data structures
Performance
● 15X faster than equivalent Ruby prototype

Slide 12

Slide 12 text

Scala → Java Interoperability “It just works”

Slide 13

Slide 13 text

Scala → Java Interoperability
Deploy webapps on Tomcat/Jetty
● Scala is “just another jar”
● No special handling compared to 100% Java apps
Mix in many Java libraries
● Jersey (RESTful web services / JAX-RS)
● Apache Commons-*, HttpClient, Log4J/SLF4J, etc.
● Thrift, Spring, … and lots more.
Still using Ant + Ivy to build most projects!

Slide 14

Slide 14 text

Ivy dependencies

Slide 15

Slide 15 text

Example – Jersey Annotations

package com.bizo.api.web.controllers

import javax.ws.rs.{GET, Path, PathParam, QueryParam, Produces}
import javax.ws.rs.core.Response
import com.bizo.api.web.model.{Result, TaxonomyResult}
import com.bizo.util.BizographicNamingFactory.allByDimension

@Path("/v1/taxonomy.{format}")
class Taxonomy {
  @GET
  @Produces(Array("application/json", "application/xml", "text/csv"))
  def doGet(@QueryParam("callback") callback: String) = {
    new Result(Taxonomy.current, callback)
  }
}

object Taxonomy {
  val current = new TaxonomyResult("20100809", allByDimension)
}

Slide 16

Slide 16 text

Java → Scala Interoperability [ ... not a design goal ]

Slide 17

Slide 17 text

Java 1nt@r0p: The Ugly

scala.Option x = scala.None$.MODULE$;
Map m = new HashMap();
m.$plus(new Tuple2("foo", "bar"));
// don't try this at home, kids
m.map(new Function1, Tuple2>() {
  // ...
}, HashMap.$MODULE., Tuple2>canBuildFrom());

Slide 18

Slide 18 text

Grossly Simplified Architecture
Web (Java + GWT)
Data-Access Layer (Java)
[ Thrift ]
OLAP database (Scala)

Slide 19

Slide 19 text

Oh Noes! Testing in-VM!!
Web (Java + GWT)
Data-Access Layer (Java)
OLAP Database (Scala)
@#$@!% a.k.a. “leaky abstraction”

Slide 20

Slide 20 text

trait IndexingContext {
  val database: String
  val cube: String
  val dimensions: IndexedSeq[(String, String)]
  val measures: IndexedSeq[(String, String)]
  val aggregates: IndexedSeq[String]
  val hierarchicalLevels: IndexedSeq[Level]
  …
}

Slide 21

Slide 21 text

import scala.collection.mutable.ArrayBuffer

/** Java-friendly builder */
class IndexingContextBuilder {
  // Underscore-prefixed so toContext() can refer to them without
  // clashing with the IndexingContext members of the same name
  private val _dimensions = new ArrayBuffer[(String, String)]
  private val _measures = new ArrayBuffer[(String, String)]

  def addDimension(name: String, dataType: String): Unit = {
    _dimensions += ((name, dataType)) // explicit tuple, no auto-tupling
  }

  def addMeasure(name: String, dataType: String): Unit = {
    _measures += ((name, dataType))
  }
  ...
  def toContext(): IndexingContext = new IndexingContext {
    override val dimensions = _dimensions
    override val measures = _measures
    ...
  }
}

Slide 22

Slide 22 text

/**
 * A simpler Sugarcube class.
 *
 * Exists for the sole purpose of making Java testing easier
 * since dealing with abstract Scala classes with traits
 * in Java is hell.
 */
class SimpleSugarcube(val cubes: Map[String, PartitionedCube]) extends Sugarcube {
  import scala.collection.JavaConversions._

  def this(cubes: java.util.Map[String, PartitionedCube]) = this(cubes.toMap)

  protected def cube(database: String, name: String) = cubes(name)
}
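The same "Java-friendly wrapper" idea can be sketched in a self-contained way. This is only an illustration, not Bizo code: UserDirectory and JavaUserDirectory are hypothetical names, and the sketch uses the explicit scala.collection.JavaConverters (deprecated since Scala 2.13 in favor of scala.jdk.CollectionConverters) rather than the implicit JavaConversions shown on the slide. The point is that the Java-facing class exposes no Option and no Scala collections in its signatures.

```scala
import scala.collection.JavaConverters._

// Scala-idiomatic API (hypothetical example class)
class UserDirectory(private val emails: Map[String, String]) {
  def find(user: String): Option[String] = emails.get(user)
  def all: Map[String, String] = emails
}

// Java-facing wrapper: only java.util types and plain references
class JavaUserDirectory(underlying: UserDirectory) {
  // Java callers can pass a java.util.Map directly
  def this(emails: java.util.Map[String, String]) =
    this(new UserDirectory(emails.asScala.toMap))

  // Option becomes a nullable reference, which is what Java code expects
  def findOrNull(user: String): String = underlying.find(user).orNull

  // Scala Map is exposed as a java.util.Map view
  def all: java.util.Map[String, String] = underlying.all.asJava
}
```

From Java, `new JavaUserDirectory(map).findOrNull("jack")` then reads like any other Java API, at the cost of maintaining a second, thinner class.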

Slide 23

Slide 23 text

Poor Man's Parallel Collections Scalable Language Example #1

Slide 24

Slide 24 text

trait MapReduce[Input, Output] {
  val executor: ExecutorService = { /* default (n processors + 1) thread pool */ }

  def map(input: Input): Output

  def reduce(o1: Output, o2: Output): Output

  final def submit(inputs: Traversable[Input]): Future[Output] = { /* profit !!! */ }
}

Full code @ https://github.com/aboisvert/scala-samples/tree/master/mapreduce
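The slide elides both bodies, and the linked repository holds the real implementation. As a rough sketch of how they could be filled in, assuming java.util.concurrent.Future, a lazily created pool of daemon threads, and a non-empty input (all assumptions of this example, as is the SumOfSquares object):

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, Future, ThreadFactory}

trait MapReduce[Input, Output] {
  // Assumption: default pool of (n processors + 1) daemon threads, created on first use
  lazy val executor: ExecutorService = {
    val threads = Runtime.getRuntime.availableProcessors + 1
    Executors.newFixedThreadPool(threads, new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setDaemon(true) // do not keep the JVM alive
        t
      }
    })
  }

  def map(input: Input): Output

  def reduce(o1: Output, o2: Output): Output

  final def submit(inputs: Iterable[Input]): Future[Output] = {
    // Fan out: one map task per input
    val mapped: List[Future[Output]] = inputs.toList.map { in =>
      executor.submit(new Callable[Output] {
        def call(): Output = MapReduce.this.map(in)
      })
    }
    // Fan in: a final task waits for the map results and folds them pairwise.
    // It is queued after every map task, so those tasks have already been
    // picked up by the time it runs. Throws on empty input.
    executor.submit(new Callable[Output] {
      def call(): Output =
        mapped.map(_.get()).reduce[Output]((a, b) => MapReduce.this.reduce(a, b))
    })
  }
}

// Hypothetical usage: sum of squares computed in parallel
object SumOfSquares extends MapReduce[Int, Int] {
  def map(i: Int): Int = i * i
  def reduce(a: Int, b: Int): Int = a + b
}
```

Calling `SumOfSquares.submit(List(1, 2, 3, 4)).get()` blocks until the result is available. Because the reduce step runs on the same pool, calling submit from inside map could exhaust the pool; a production version would need to address that.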

Slide 25

Slide 25 text

def expandCuboids() {
  var level = 1
  var expansion = true
  // stop when no cuboid expanded within a layer or when
  // we've expanded everything (worst case)
  while (expansion && level <= dimensions.size) {
    val cuboids = for {
      set <- combinations(dimensions)
      dims <- combinations(set, level)
    } yield dims
    val job = new MapReduce[Dimensions, Boolean] {
      override val executor = indexingExecutor
      def map(cuboid: Dimensions) = expandCuboid(cuboid)
      def reduce(expanded1: Boolean, expanded2: Boolean) = expanded1 || expanded2
    }
    // Assign to the outer var: a nested `val expansion` would shadow it and
    // the loop could never stop early. get() blocks until the layer is done
    // (assumes submit returns a java.util.concurrent.Future).
    expansion = (job submit cuboids).get()
    level += 1
  }
}

Slide 26

Slide 26 text

trait PartitionedCube {
  …
  def query(
    aggregates: Set[Aggregate],
    conditions: Map[Dimension, Set[Value]],
    groupBy: Set[Dimension]
  ): QueryResult = {
    new MapReduce[Partition, QueryResult] {
      override val executor = PartitionedCube.this.executor
      def map(p: Partition) = p.query(aggregates, conditions) groupBy (groupBy)
      def reduce(r1: QueryResult, r2: QueryResult) = (r1 merge r2)
    } submit (partitions filter conditions)
  }
}

Slide 27

Slide 27 text

trait ParallelForeach {
  val executor: ExecutorService

  def foreach[T](xs: Traversable[T])(f: T => Unit) = {
    new MapReduce[T, Unit] {
      override val executor = ParallelForeach.this.executor
      override def map(t: T) = f(t)
      override def reduce(u1: Unit, u2: Unit) = ()
    } submit xs
  }
}

Slide 28

Slide 28 text

val parallel = new ParallelForeach {
  val executor = …
}

parallel.foreach(files) { f =>
  // download, unzip, etc ...
}

Slide 29

Slide 29 text

Scalable Language Example #2 Simplistic – Idiomatic SimpleDB

Slide 30

Slide 30 text

import simplistic._

val account = new SimpleDBAccount(key, secret)

// list all domains
account.domains.toList

// list all items in mydomain
account.domain("mydomain").items.toList

// create item with single attribute "bar" and value "baz"
account.domain("mydomain") item ("foo") += ("bar" -> "baz")

// query and print results
account.select("select * from mydomain") foreach { e => println(e.name) }

Full code @ https://github.com/aboisvert/simplistic

Slide 31

Slide 31 text

// type-safe attributes
import simplistic.Attributes._
import simplistic.Conversions._

object User {
  val user = attribute("user") // default to String type
  val startDate = optionalAttribute("startDate", ISO8601Date)
  val visits = attribute("visits", PositiveInt)
  val tags = multiValued("tags")
}

import User._

users.unique += (user("jack"), startDate(d1), visits(100))
users.unique += (user("jon"), startDate(d2), visits(20))
users.unique += (user("alice"), startDate(d3), visits(15))

users.find(user("jack")) += tags("male", "frequent", "premium")
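SimpleDB stores every value as a string and compares strings lexicographically, which is why typed attributes with conversions are useful. The sketch below shows one way such an attribute could work; it is not the simplistic implementation, and Conversion, PaddedInt, Attribute and AttributeExample are hypothetical names invented for this example.

```scala
// Hypothetical conversion between a Scala type and its stored string form
trait Conversion[T] {
  def encode(value: T): String
  def decode(stored: String): T
}

// Non-negative Ints, zero-padded to 10 digits so that
// lexicographic order matches numeric order
object PaddedInt extends Conversion[Int] {
  def encode(value: Int): String = {
    require(value >= 0, "only non-negative values are supported")
    "%010d".format(value)
  }
  def decode(stored: String): Int = stored.toInt
}

// A named attribute that can only be written and read as a T
case class Attribute[T](name: String, conversion: Conversion[T]) {
  // visits(100) yields the (name, encoded value) pair to store
  def apply(value: T): (String, String) = name -> conversion.encode(value)

  // Read the typed value back from an item's raw string attributes
  def read(item: Map[String, String]): Option[T] =
    item.get(name).map(conversion.decode)
}

object AttributeExample {
  val visits = Attribute("visits", PaddedInt)
  val item: Map[String, String] = Map(visits(100))
}
```

With this shape, `AttributeExample.visits("many")` does not compile, and padded values such as "0000000020" and "0000000100" sort in numeric order.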

Slide 32

Slide 32 text

// type-safe queries
import simplistic.Query._

val visitors = for (
  item <- users (visits > 1 and visits < 50 sort visits desc)
) yield user(item)

// without for-comprehension
val visitors = {
  users select (visits > 1 and visits < 50 sort visits desc)
}

visitors foreach println
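Expressions such as `visits > 1 and visits < 50` read like a query language but are ordinary Scala method calls. A minimal sketch of the technique follows, with hypothetical types (Condition, IntAttribute, QueryExample) that are not taken from simplistic and that evaluate in memory rather than against SimpleDB:

```scala
// A condition is a small expression tree that can be combined and evaluated
sealed trait Condition {
  def and(other: Condition): Condition = And(this, other)
  def matches(item: Map[String, Int]): Boolean
}

case class GreaterThan(name: String, value: Int) extends Condition {
  def matches(item: Map[String, Int]): Boolean = item.get(name).exists(_ > value)
}

case class LessThan(name: String, value: Int) extends Condition {
  def matches(item: Map[String, Int]): Boolean = item.get(name).exists(_ < value)
}

case class And(left: Condition, right: Condition) extends Condition {
  def matches(item: Map[String, Int]): Boolean =
    left.matches(item) && right.matches(item)
}

// Comparison operators on an attribute build conditions instead of Booleans
case class IntAttribute(name: String) {
  def >(value: Int): Condition = GreaterThan(name, value)
  def <(value: Int): Condition = LessThan(name, value)
}

object QueryExample {
  val visits = IntAttribute("visits")

  // Symbolic operators bind tighter than alphanumeric ones, so this parses
  // as (visits > 1) and (visits < 50)
  val query: Condition = visits > 1 and visits < 50
}
```

Because the query is a data structure, a real library can translate it into a SimpleDB select expression instead of evaluating it locally, and type errors (comparing an Int attribute to a String, say) are caught at compile time.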

Slide 33

Slide 33 text

// SimpleDB “conditional put”
import simplistic.Query._

class Task {
  …
  def updateIfUnassigned(): Boolean = try {
    setIf(assigned doesNotExist)(updateAttributes)
    true
  } catch {
    case ex: ConditionalCheckFailed => false
  }
}
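The "write only if the attribute does not exist yet" semantics can be illustrated without SimpleDB. The sketch below uses an in-memory ConcurrentHashMap as a stand-in for the remote store; TaskAssignments and its method are hypothetical names, and nothing here reflects how simplistic talks to the service.

```scala
import java.util.concurrent.ConcurrentHashMap

// In-memory stand-in for a store that supports conditional puts
class TaskAssignments {
  private val assignedTo = new ConcurrentHashMap[String, String]()

  // Atomically assign the task only if nobody holds it yet.
  // putIfAbsent returns null when there was no previous value.
  def assignIfUnassigned(taskId: String, worker: String): Boolean =
    assignedTo.putIfAbsent(taskId, worker) == null

  def owner(taskId: String): Option[String] = Option(assignedTo.get(taskId))
}
```

Two workers racing for the same task both call assignIfUnassigned; exactly one gets true, which mirrors the true/false result of updateIfUnassigned on the slide.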

Slide 34

Slide 34 text

Scalable Language Example #3 Revolute – Hadoop Query Language (layered on Cascading)

Slide 35

Slide 35 text

// Table definitions
//
// e.g. for data files stored on Hadoop File System (HDFS)
object Persons extends Table[(String, Int)]("persons") {
  def name = column[String]("name")
  def age = column[Int]("age")
  def * = name ~ age
}

object Follows extends Table[(String, String)]("follows") {
  def follower = column[String]("follower")
  def followee = column[String]("followee")
  def * = follower ~ followee
}

Slide 36

Slide 36 text

/* Predator detection */
for {
  p1 <- Persons
  p2 <- Persons where { p2 => p2.age < 16 && p1.age - p2.age > 10 }
  _ <- Follows where { f => (f.follower is p1.name) && (f.followee is p2.name) }
  _ <- Query.orderBy { (p1.age - p2.age) desc }
} yield p1.name ~ p2.name

Full code @ https://github.com/aboisvert/revolute

Slide 37

Slide 37 text

Standard Library Dealing with Wrinkles

Slide 38

Slide 38 text

Standard Library
Odds are you're going to run into some bugs :-(
● 2.8.1: NoSuchElementException in HashSet (issue with hash-code collisions)
● 2.9.0: View.groupBy() broken (StackOverflowError)
Scala releases are few and far between …
How will you deal with the situation?
● Avoid the feature
● Build your own
● Fix it yourself
● Buy support from Typesafe
We decided to maintain a patched version of the standard library.
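As an illustration of the "build your own" option, a strict replacement for a broken library method can be quite small. The helper below is a hypothetical example (groupByStrict and Workarounds are invented names), not the patch Bizo applied to the standard library; it groups any Iterable eagerly using only foldLeft and immutable maps.

```scala
object Workarounds {
  // Eager groupBy built on foldLeft; preserves element order within each group
  def groupByStrict[A, K](xs: Iterable[A])(key: A => K): Map[K, Vector[A]] =
    xs.foldLeft(Map.empty[K, Vector[A]]) { (groups, x) =>
      val k = key(x)
      groups.updated(k, groups.getOrElse(k, Vector.empty[A]) :+ x)
    }
}
```

A helper like this trades laziness for predictability, which may be acceptable when the input fits comfortably in memory.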

Slide 39

Slide 39 text

Final Scorecard
Scala as scalable language       A+   (awesome!)
Scala → Java Interoperability    A    (just as advertised)
Java → Scala Interoperability    B-   (unintended consequence)
Binary Compatibility             C    (you will have to deal with it)
IDE Support                      C+   (trending towards B)
Standard Library                 A-   (not 100% bug-free)

Slide 40

Slide 40 text

Questions?
Twitter: @boia01
Email: [email protected] / [email protected]
Pssst! We're hiring!