Scala @ Bizo

Scala @ (scala compose bizo) apply { ftw_! }

Who is this talk for?
• You are considering or in the process of adopting Scala in your organization
• You want to hear about other people's experience adopting Scala
• You want to learn about companies using Scala to build cool software solutions

Who am I?
Alex Boisvert (twitter: @boia01)
Software engineer/architect working at Bizo
Back-end kinda guy, interested in system scalability
• High-volume transaction-processing
• Data retrieval and big data analytics

About Bizo
Online ad targeting & analytics platform
• Reach 80 million business professionals online
• Help better understand composition of web audience
• Site personalization, action tracking, custom audiences, funnel analysis, …
• And more! Offer many services through web APIs.
Classify web visitors into “business demographics”
• 150+ industries (agriculture, construction, health care, government, ...)
• 100+ functional areas (finance, engineering, legal, sales, ...)
• Company size (small, medium, large, F500, …)
• Seniority (non-management, mid-management, executive, …)
• … location, education, gender, and more.

Engineering Team
Team of 8 “dev-ops” engineers
Use mix of Scala, Ruby, Java, Javascript, …
All infrastructure running on Amazon Web Services such as EC2, Elastic Map-Reduce (Hadoop), etc.
Started using Scala late in 2009.
Scala now used almost everywhere (analytics backend, web APIs, scripts, etc.) except:
• Large legacy Java components (will take some time)
• Web codebase using Google Web Toolkit (GWT)
• Smallish / prototyping-size web apps (Ruby + Rails)

TL;DR
Scala as scalable language        A+  (awesome!)
Scala → Java Interoperability     A   (just as advertised)
Java → Scala Interoperability     B-  (unintended consequence)
Binary Compatibility              C   (you will have to deal with it)
IDE Support                       C+  (trending towards B)
Standard Library                  A-  (not 100% bug-free)

Success Story – Scala + Big Data
Handle between 2-3 billion web requests per month
Data growing at 400% pace year-over-year
Use Hadoop + Hive to aggregate web traffic
Built “Sugarcube” – a NoSQL analytics database (OLAP)
• 100% Scala code + some Java libraries – 20K LoC
• Distributed, cloud-friendly (AWS), scale-out architecture
• Multi-dimensional indexing of billions of rows with 12+ dimensions
• Response time: < 100 ms (for typical queries)
• 6 man-months from paper to production (2 prototype iterations)
• Server cluster: 4 x m1.large EC2 instances
• Indexing cluster: 2 x m2.2xlarge EC2 instances (2-3 hours/day)

Why Scala?
Productivity
• Succinct code clearly expresses algorithms
• “Systems Programming” without the hassle
Concurrency
• Excellent JVM primitives
• Easy to build best-fitting abstractions
• Immutable data structures
Performance
• 15X faster than equivalent Ruby prototype

Scala → Java Interoperability “It just works”

Scala → Java Interoperability
Deploy webapps on Tomcat/Jetty
• Scala is “just another jar”
• No special handling compared to 100% Java apps
Mix in many Java libraries
• Jersey (RESTful web services / JAX-RS)
• Apache Common-*, HTTPClient, Log4J/SLF4J, etc.
• Thrift, Spring, … and lots more.
Still using Ant + Ivy to build most projects!

Ivy dependencies
    <ivy-module version="1.1">
      <info organisation="bizo.com" module="api-web" />
      <dependencies defaultconf="default->default;sources->sources()">
        <dependency org="bizo.com" name="account-management-remote-api" rev="2-20110506191607" />
        <dependency org="bizo.com" name="api-account-management" rev="1.2-dev-20110518205306" />
        <dependency org="simplistic" name="simplistic" rev="1.0.10b" />
        <dependency org="bizo.com" name="api-web-usage-sdb" rev="1.1-dev-20110518205924" />
        <dependency org="bizo.com" name="asperatus" rev="1.1-dev-20100923180448" />
        <dependency org="bizo.com" name="s3-app-logger" rev="1.1-dev-20110519175618" />
        <dependency org="bizo.com" name="datalog-tools" rev="1.1-dev-20101103222807" />
        <dependency org="bizo.com" name="datalog-manager" rev="3.1-dev-20110512225804" />
        <dependency org="bizo.com" name="bizographer-client" rev="1.1-dev-20110303190156" />
        <dependency org="bizo.com" name="thrift-bizographer" rev="1.1-dev-20110125190603" />
        <dependency org="bizo.com" name="thrift-dimension-model" rev="1.1-dev-20100804051620" />
        <dependency org="apache" name="commons-fileupload" rev="1.2.1" />
        <dependency org="apache" name="commons-io" rev="1.4" />
        <dependency org="twitter.com" name="scala-json_2.8.0" rev="1.1.2" />
        <dependency org="opencsv" name="opencsv" rev="1.8" />
        <dependency org="sun" name="jersey" rev="1.5" />
        <dependency org="sun" name="jersey-contrib" rev="1.5">
          <artifact name="oauth-server" />
          <artifact name="oauth-signature" />
        </dependency>
        <dependency org="scala" name="scala" rev="2.8.1.final" conf="default->default;sources->sources;buildtime->compiler"/>
        <dependency org="ehcache" name="ehcache" rev="1.6.0-beta5" />

        <dependency org="sun" name="servlet-api" rev="2.5" conf="buildtime->default" />
        <dependency org="findbugs" name="findbugs" rev="1.3.9" conf="buildtime->default" />
        <dependency org="cobertura" name="cobertura" rev="1.9.3" conf="cobertura->default" />
        <dependency org="svntask" name="svntask" rev="1.0.7" conf="buildtime->default" />

        <dependency org="bizo.com" name="jtty" rev="1.1" conf="test->default" />
        <dependency org="junit" name="junit" rev="4.7" conf="test->default;sources->sources" />
        <dependency org="scalatest" name="scalatest" rev="1.0" conf="test->default" />
        <dependency org="mockito" name="mockito" rev="1.8.2" conf="test->default,sources"/>
        <dependency org="bizo.com" name="fakesdb" rev="2.2" conf="test->servlet" />
        <dependency org="oauth" name="oauth-signpost" rev="1.2" conf="test->default">
          <artifact name="signpost-core" type="jar" />
        </dependency>
      </dependencies>
    </ivy-module>

Example – Jersey Annotations
    package com.bizo.api.web.controllers

    import javax.ws.rs.{GET, Path, PathParam, QueryParam, Produces}
    import javax.ws.rs.core.Response
    import com.bizo.api.web.model.{Result, TaxonomyResult}
    import com.bizo.util.BizographicNamingFactory.allByDimension

    @Path("/v1/taxonomy.{format}")
    class Taxonomy {
      @GET
      @Produces(Array("application/json", "application/xml", "text/csv"))
      def doGet(@QueryParam("callback") callback: String) = {
        new Result(Taxonomy.current, callback)
      }
    }

    object Taxonomy {
      val current = new TaxonomyResult("20100809", allByDimension)
    }

Java → Scala Interoperability
[ ... not a design goal ]

Java 1nt@r0p: The Ugly
    scala.Option<String> x = scala.None$.MODULE$;
    Map<String, String> m = new HashMap();
    m.$plus(new Tuple2<String, String>("foo", "bar"));
    // don't try this at home, kids
    m.map(new Function1<Tuple2<String, String>, Tuple2<String, String>>() {
        // ...
    }, HashMap$.MODULE$.<Tuple2<String, String>, Tuple2<String, String>>canBuildFrom());
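For contrast, here is a minimal sketch of roughly the same operations written in Scala itself. In the 2.8/2.9 collections the CanBuildFrom argument that Java has to pass by hand is an implicit parameter, so it never appears in Scala source. The key/value swap inside `map` is a hypothetical stand-in for the body elided on the slide, and `InteropContrast` is only an illustrative wrapper object.

```scala
import scala.collection.immutable.HashMap

object InteropContrast {
  // An empty Option, with no MODULE$ in sight
  val x: Option[String] = None

  // Adding a pair; the arrow syntax builds the Tuple2
  val m: Map[String, String] = HashMap.empty[String, String] + ("foo" -> "bar")

  // Hypothetical transformation (the slide elides the real one): swap keys and values
  val swapped: Map[String, String] = m.map { case (k, v) => (v, k) }
}
```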

Grossly Simplified Architecture
[Diagram: Web, Data-Access Layer, OLAP database; language labels: Java + GWT, Java, Scala; Thrift]

Oh Noes! Testing in-VM!!
[Diagram: Web, Data-Access Layer, OLAP Database; language labels: Java + GWT, Java, Scala]
@#$@!% a.k.a. “leaky abstraction”

    trait IndexingContext {
      val database: String
      val cube: String
      val dimensions: IndexedSeq[(String, String)]
      val measures: IndexedSeq[(String, String)]
      val aggregates: IndexedSeq[String]
      val hierarchicalLevels: IndexedSeq[Level]
      …
    }

    import scala.collection.mutable.ArrayBuffer

    /** Java-friendly builder */
    class IndexingContextBuilder {
      // underscore-prefixed so they don't clash with the IndexingContext members below
      private val _dimensions = new ArrayBuffer[(String, String)]
      private val _measures = new ArrayBuffer[(String, String)]

      def addDimension(name: String, dataType: String) {
        _dimensions += ((name, dataType))
      }
      def addMeasure(name: String, dataType: String) {
        _measures += ((name, dataType))
      }
      ...
      def toContext(): IndexingContext = new IndexingContext {
        override val dimensions = _dimensions
        override val measures = _measures
        ...
      }
    }

    /**
     * A simpler Sugarcube class.
     *
     * Exists for the sole purpose of making Java testing easier
     * since dealing with abstract Scala classes with traits
     * in Java is hell.
     */
    class SimpleSugarcube(val cubes: Map[String, PartitionedCube]) extends Sugarcube {
      import scala.collection.JavaConversions._

      def this(cubes: java.util.Map[String, PartitionedCube]) = this(cubes.toMap)

      protected def cube(database: String, name: String) = cubes(name)
    }

Scalable Language Example #1
Poor Man's Parallel Collections

    trait MapReduce[Input, Output] {
      val executor: ExecutorService = { /* default (n processors + 1) thread pool */ }

      def map(input: Input): Output
      def reduce(o1: Output, o2: Output): Output

      final def submit(inputs: Traversable[Input]): Future[Output] = { /* profit !!! */ }
    }

Full code @ https://github.com/aboisvert/scala-samples/tree/master/mapreduce
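The slide elides both method bodies, so what follows is only a sketch of how such a trait could be filled in, not the code from the linked repository. It assumes `Future` is `java.util.concurrent.Future`, takes an `Iterable` rather than a `Traversable` so that it also compiles on newer Scala versions, and requires at least one input. `WordLengths` is a hypothetical usage example.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, Future}

trait MapReduce[Input, Output] {
  // Assumed default: a fixed pool of (available processors + 1) threads
  val executor: ExecutorService =
    Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors + 1)

  def map(input: Input): Output
  def reduce(o1: Output, o2: Output): Output

  // Assumption: java.util.concurrent.Future; inputs must be non-empty
  final def submit(inputs: Iterable[Input]): Future[Output] = {
    // Fan out: one map task per input
    val mapped: List[Future[Output]] = inputs.toList.map { in =>
      executor.submit(new Callable[Output] { def call(): Output = map(in) })
    }
    // Fan in: a final task waits for the map results and combines them pairwise
    executor.submit(new Callable[Output] {
      def call(): Output =
        mapped.map(_.get()).reduceLeft((a, b) => reduce(a, b))
    })
  }
}

// Hypothetical usage: total length of a few words
object WordLengths extends MapReduce[String, Int] {
  def map(word: String): Int = word.length
  def reduce(a: Int, b: Int): Int = a + b
}
```

Calling `WordLengths.submit(List("scala", "at", "bizo")).get()` blocks until the combined result is ready. The final task holds a pool thread while it waits, which is fine for a sketch but worth revisiting if `map` itself submits nested work to the same executor.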

    def expandCuboids() {
      var level = 1
      var expansion = true
      // stop when no cuboid expanded within a layer or when
      // we've expanded everything (worst case)
      while (expansion && level <= dimensions.size) {
        val cuboids = for {
          set <- combinations(dimensions)
          dims <- combinations(set, level)
        } yield dims
        // update the outer flag instead of shadowing it with a new val;
        // assumes a java.util.concurrent.Future, so get() blocks for the reduced result
        expansion = (new MapReduce[Dimensions, Boolean] {
          override val executor = indexingExecutor
          def map(cuboid: Dimensions) = expandCuboid(cuboid)
          def reduce(expanded1: Boolean, expanded2: Boolean) = expanded1 || expanded2
        } submit cuboids).get()
        level += 1
      }
    }

    trait PartitionedCube {
      …
      def query(
        aggregates: Set[Aggregate],
        conditions: Map[Dimension, Set[Value]],
        groupBy: Set[Dimension]
      ): QueryResult = {
        new MapReduce[Partition, QueryResult] {
          override val executor = PartitionedCube.this.executor
          def map(p: Partition) = p.query(aggregates, conditions) groupBy (groupBy)
          def reduce(r1: QueryResult, r2: QueryResult) = (r1 merge r2)
        } submit (partitions filter conditions)
      }
    }

    trait ParallelForeach {
      val executor: ExecutorService

      def foreach[T](xs: Traversable[T])(f: T => Unit) = {
        new MapReduce[T, Unit] {
          override val executor = ParallelForeach.this.executor
          override def map(t: T) = f(t)
          override def reduce(u1: Unit, u2: Unit) = ()
        } submit xs
      }
    }

    val parallel = new ParallelForeach {
      val executor = …
    }

    parallel.foreach(files) { f =>
      // download, unzip, etc ...
    }

Scalable Language Example #2
Simplistic – Idiomatic SimpleDB

    import simplistic._

    val account = new SimpleDBAccount(key, secret)

    // list all domains
    account.domains.toList

    // list all items in mydomain
    account.domain("mydomain").items.toList

    // create item with single attribute "bar" and value "baz"
    account.domain("mydomain") item ("foo") += ("bar" -> "baz")

    // query and print results
    account.select("select * from mydomain") foreach { e => println(e.name) }

Full code @ https://github.com/aboisvert/simplistic

    // type-safe attributes
    import simplistic.Attributes._
    import simplistic.Conversions._

    object User {
      val user = attribute("user") // default to String type
      val startDate = optionalAttribute("startDate", ISO8601Date)
      val visits = attribute("visits", PositiveInt)
      val tags = multiValued("tags")
    }

    import User._

    users.unique += (user("jack"), startDate(d1), visits(100))
    users.unique += (user("jon"), startDate(d2), visits(20))
    users.unique += (user("alice"), startDate(d3), visits(15))

    users.find(user("jack")) += tags("male", "frequent", "premium")

    // type-safe queries
    import simplistic.Query._

    val visitors = for (
      item <- users (visits > 1 and visits < 50 sort visits desc)
    ) yield user(item)

    // without for-comprehension
    val visitors = {
      users select (visits > 1 and visits < 50 sort visits desc)
    }

    visitors foreach println

    // SimpleDB “conditional put”
    import simplistic.Query._

    class Task {
      …
      def updateIfUnassigned(): Boolean = try {
        setIf(assigned doesNotExist)(updateAttributes)
        true
      } catch {
        case ex: ConditionalCheckFailed => false
      }
    }

Scalable Language Example #3
Revolute – Hadoop Query Language (layered on Cascading)

    // Table definitions
    //
    // e.g. for data files stored on Hadoop File System (HDFS)
    object Persons extends Table[(String, Int)]("persons") {
      def name = column[String]("name")
      def age = column[Int]("age")
      def * = name ~ age
    }

    object Follows extends Table[(String, String)]("follows") {
      def follower = column[String]("follower")
      def followee = column[String]("followee")
      def * = follower ~ followee
    }

    /* Predator detection */
    for {
      p1 <- Persons
      p2 <- Persons where { p2 => p2.age < 16 && p1.age - p2.age > 10 }
      _ <- Follows where { f => (f.follower is p1.name) && (f.followee is p2.name) }
      _ <- Query.orderBy { (p1.age - p2.age) desc }
    } yield p1.name ~ p2.name

Full code @ https://github.com/aboisvert/revolute

Standard Library Dealing with Wrinkles

Standard Library
Odds are you're going to run into some bugs :-(
• 2.8.1: NoSuchElementException in HashSet (issue with hash-code collisions)
• 2.9.0: View.groupBy() broken (StackOverflowError)
Scala releases are few and far between …
How will you deal with the situation?
• Avoid the feature
• Build your own
• Fix it yourself
• Buy support from Typesafe
We decided to maintain a patched version of the standard library.
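As one illustration of the "avoid the feature" option, the sketch below sidesteps calling groupBy on a view by forcing the view into a strict collection first. The data and names are hypothetical, and this is an assumption about what such a workaround could look like rather than what Bizo actually did.

```scala
object AvoidViewGroupBy {
  // Hypothetical sample data
  val words = List("scala", "java", "sbt", "ant", "ivy")

  // Keep the lazy view for the intermediate step ...
  val lengths = words.view.map(w => (w, w.length))

  // ... but materialize it before grouping, so groupBy runs on a plain List
  val byLength: Map[Int, List[String]] =
    lengths.toList.groupBy(_._2).map { case (len, pairs) => (len, pairs.map(_._1)) }
}
```

The trade-off is an extra intermediate list, which gives up some of the laziness the view was buying in the first place.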

Final Scorecard
Scala as scalable language        A+  (awesome!)
Scala → Java Interoperability     A   (just as advertised)
Java → Scala Interoperability     B-  (unintended consequence)
Binary Compatibility              C   (you will have to deal with it)
IDE Support                       C+  (trending towards B)
Standard Library                  A-  (not 100% bug-free)

Questions?
Twitter: @boia01
Email: [email protected] / [email protected]
Pssst! We're hiring!
