Slide 1

IMPROVING SPARK SQL PERFORMANCE
Keiji Yoshida - LINE Corporation

Slide 2

OASIS

Slide 3

Agenda
1. Issue
2. Gathering Statistics
3. Injecting Custom Optimization Rules
4. Cost-based Optimizer

Slide 4

Agenda
1. Issue
2. Gathering Statistics
3. Injecting Custom Optimization Rules
4. Cost-based Optimizer

Slide 5

USAGE
• Users / Day: 200+
• Spark Apps / Day: 1,300+
• Spark Jobs / Day: 7,000+

Slide 6

SPARK VERSION AND CONFIGURATION
• Spark 2.4.0
• spark.sql.cbo.enabled = false

Slide 7

SLOW QUERIES

Rank  Query    Count  Average Time (min)  Total Time (min)
1     Query A  335    20                  6,842
2     Query B  28     85                  2,380
3     Query C  109    20                  2,182
4     Query D  4,028  0                   1,543
5     Query E  14     105                 1,480

Slide 8

IMPACTS
• Speed of data analysis and decision making
• Usage efficiency of computing resources

Slide 9

CHALLENGES FOR END USERS
• Acquiring data engineering knowledge and skills
• Taking care of query performance
• Changing queries and tables in the production environment

Slide 10

SOLUTION
• Improving query performance on the platform side

Slide 11

Agenda
1. Issue
2. Gathering Statistics
3. Injecting Custom Optimization Rules
4. Cost-based Optimizer

Slide 12

GATHERING STATISTICS
• 1,500 sec → 60 sec

Slide 13

QUERY

select t.dt, m.name, count(1)
from (
  select dt, key
  from db.transaction
  where dt between '20190211' and '20190217'
) t
inner join db.master m on m.key = t.key
group by t.dt, m.name

Slide 14

QUERY

select t.dt, m.name, count(1)
from (
  select dt, key
  from db.transaction
  where dt between '20190211' and '20190217'
) t
inner join db.master m on m.key = t.key
group by t.dt, m.name

db.transaction: 1,800,000,000 records
db.master: 200 records

Slide 15

PHYSICAL PLAN

*(6) HashAggregate(keys=[dt#958, name#970], functions=[count(1)])
+- Exchange hashpartitioning(dt#958, name#970, 200)
   +- *(5) HashAggregate(keys=[dt#958, name#970], functions=[partial_count(1)])
      +- *(5) Project [dt#958, name#970]
         +- *(5) SortMergeJoin [key#937], [key#960], Inner
            :- *(2) Sort [key#937 ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(key#937, 200)
            :     +- *(1) Project [dt#958, key#937]
            :        +- *(1) Filter isnotnull(key#937)
            :           +- *(1) FileScan orc transaction[key#937,dt#958] Batched: true, Format: ORC, Location:…
            +- *(4) Sort [key#960 ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(key#960, 200)
                  +- *(3) Project [key#960, name#970]
                     +- *(3) Filter isnotnull(key#960)
                        +- *(3) FileScan orc master[key#960,name#970] Batched: true, Format: ORC, Location:…

Slide 16

SORT MERGE JOIN

Slide 17

BROADCAST HASH JOIN

Slide 18

AUTO BROADCAST JOIN THRESHOLD
• spark.sql.autoBroadcastJoinThreshold = 10 MB
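Roughly, the planner picks a broadcast hash join only when its size estimate for one join side is known and at or below this threshold. A minimal sketch of that size check (a hypothetical helper for illustration, not Spark's internal API):

```scala
// Hypothetical sketch of the broadcast decision; not Spark's actual internals.
// 10 MB mirrors the default of spark.sql.autoBroadcastJoinThreshold.
val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

def canBroadcast(sizeInBytes: BigInt): Boolean =
  sizeInBytes >= 0 && sizeInBytes <= BigInt(autoBroadcastJoinThreshold)
```

For example, `canBroadcast(BigInt(8239))` is true, while a table whose size estimate is unknown (and therefore huge) is never broadcast.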

Slide 19

QUERY

select t.dt, m.name, count(1)
from (
  select dt, key
  from db.transaction
  where dt between '20190211' and '20190217'
) t
inner join db.master m on m.key = t.key
group by t.dt, m.name

db.transaction: 1,800,000,000 records
db.master: 200 records

Slide 20

PHYSICAL PLAN

*(6) HashAggregate(keys=[dt#958, name#970], functions=[count(1)])
+- Exchange hashpartitioning(dt#958, name#970, 200)
   +- *(5) HashAggregate(keys=[dt#958, name#970], functions=[partial_count(1)])
      +- *(5) Project [dt#958, name#970]
         +- *(5) SortMergeJoin [key#937], [key#960], Inner
            :- *(2) Sort [key#937 ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(key#937, 200)
            :     +- *(1) Project [dt#958, key#937]
            :        +- *(1) Filter isnotnull(key#937)
            :           +- *(1) FileScan orc db.transaction[key#937,dt#958] Batched: true, Format: ORC
            +- *(4) Sort [key#960 ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(key#960, 200)
                  +- *(3) Project [key#960, name#970]
                     +- *(3) Filter isnotnull(key#960)
                        +- *(3) FileScan orc db.master[key#960,name#970] Batched: true, Format: ORC

Slide 21

STATS

val q = "select * from db.master"
spark.sql(q).queryExecution.optimizedPlan.stats.sizeInBytes
→ 9223372036854775807
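That 19-digit value is not a real size: it is Long.MaxValue, the fallback Spark uses when it has no statistics for a table, which is why even the 200-record master table looks too big to broadcast. A quick check:

```scala
// The sizeInBytes above is Spark's "size unknown" fallback: Long.MaxValue.
val reported = BigInt("9223372036854775807")
assert(reported == BigInt(Long.MaxValue))
```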

Slide 22

OASIS.INSERTOVERWRITE

spark.sql(q).repartition(filesNum).write.mode("overwrite").insertInto("db.master")

Slide 23

OASIS.INSERTOVERWRITE

spark.sql(q).repartition(filesNum).write.mode("overwrite").insertInto("db.master")
spark.sql("analyze table db.master compute statistics noscan")
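`noscan` records only the table size, which is cheap and already enough for the broadcast-join size check above. If column-level statistics are wanted as well (the cost-based optimizer in the last section uses them), Spark SQL also offers a scan-based variant; a sketch, assuming a live SparkSession and that `key` and `name` are the columns of interest:

```scala
// Table size only; fast because no data is scanned.
spark.sql("analyze table db.master compute statistics noscan")
// Row count plus per-column statistics; this one scans the data.
spark.sql("analyze table db.master compute statistics for columns key, name")
```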

Slide 24

STATS

Slide 25

STATS

val q = "select * from db.master"
spark.sql(q).queryExecution.optimizedPlan.stats.sizeInBytes
→ 8239

Slide 26

PHYSICAL PLAN

*(3) HashAggregate(keys=[dt#149, name#161], functions=[count(1)])
+- Exchange hashpartitioning(dt#149, name#161, 200)
   +- *(2) HashAggregate(keys=[dt#149, name#161], functions=[partial_count(1)])
      +- *(2) Project [dt#149, name#161]
         +- *(2) BroadcastHashJoin [key#128], [key#151], Inner, BuildRight
            :- *(2) Project [dt#149, key#128]
            :  +- *(2) Filter isnotnull(key#128)
            :     +- *(2) FileScan orc db.transaction[key#128,dt#149] Batched: true, Format:…
            +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
               +- *(1) Project [key#151, name#161]
                  +- *(1) Filter isnotnull(key#151)
                     +- *(1) FileScan orc db.master[key#151,name#161] Batched: true, Format:…

Slide 27

RESULT
• 1,500 sec → 60 sec

Slide 28

Agenda
1. Issue
2. Gathering Statistics
3. Injecting Custom Optimization Rules
4. Cost-based Optimizer

Slide 29

INJECTING CUSTOM OPTIMIZATION RULES
• 6,000 sec → 200 sec

Slide 30

QUERY

select t.dt, a.area_name, count(distinct uid)
from (
  select dt, area_code, uid
  from db.transaction
  where dt between '20180818' and '20190217'
) t
inner join (select * from db.area where dt = '20190217') a
  on a.area_code = t.area_code
group by t.dt, a.area_name

Slide 31

QUERY

select t.dt, a.area_name, count(distinct uid)
from (
  select dt, area_code, uid
  from db.transaction
  where dt between '20180818' and '20190217'
) t
inner join (select * from db.area where dt = '20190217') a
  on a.area_code = t.area_code
group by t.dt, a.area_name

db.transaction: 4,500,000,000 records
db.area: 90 records

Slide 32

PHYSICAL PLAN

+- *(5) HashAggregate(...)
   +- *(5) HashAggregate(...)
      +- *(5) HashAggregate(...)
         +- *(5) Project [...]
            +- *(5) SortMergeJoin [...], Inner
               :- *(2) Sort [...], false, 0
               :  +- Exchange hashpartitioning(...)
               :     +- *(1) Project [...]
               :        +- *(1) Filter (...)
               :           +- *(1) FileScan orc db.transaction[...] Batched: true, Format: ORC
               +- *(4) Sort [...], false, 0
                  +- Exchange hashpartitioning(...)
                     +- *(3) Filter (...)
                        +- Scan hive db.area [...], HiveTableRelation

Slide 33

HIVE.STATS.AUTOGATHER

Slide 34

BROADCAST HINT

select /*+ broadcast(r) */ *
from records r
join src s on r.key = s.key
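The same hint can be attached programmatically through the DataFrame API; a sketch, assuming `records` and `src` are existing DataFrames sharing a `key` column:

```scala
import org.apache.spark.sql.functions.broadcast

// Equivalent of /*+ broadcast(r) */: mark the records side for broadcasting.
val joined = broadcast(records).join(src, Seq("key"))
```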

Slide 35

CUSTOMIZE OPTIMIZATION

spark.experimental.extraOptimizations = MyOptimization :: Nil

Slide 36

CUSTOMIZE OPTIMIZATION

object InsertBroadcastHints extends Rule[LogicalPlan] {
  val targets = Map(("db", "area") -> "dt")

  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case join: Join if isSelectFromTarget(join.left) =>
      join.copy(left = ResolvedHint(join.left, HintInfo(true)))
    case join: Join if isSelectFromTarget(join.right) =>
      join.copy(right = ResolvedHint(join.right, HintInfo(true)))
  }

  private def isSelectFromTarget(plan: LogicalPlan): Boolean = plan match {
    case p: Project => p.child match {
      case f: Filter => f.child match {
        case rel: HiveTableRelation =>
          targets.get((rel.tableMeta.database, rel.tableMeta.identifier.table)) match {
            case Some(pt) => isFilteredByLiteral(f.condition, pt)
            case None => false
          }
        …
  }

  private def isFilteredByLiteral(expr: Expression, pt: String): Boolean = ...
}

spark.experimental.extraOptimizations = InsertBroadcastHints :: Nil

Slide 37

OPTIMIZED LOGICAL PLAN

Aggregate [...]
+- Project [...]
   +- Join Inner, (...)
      :- Project [...]
      :  +- Filter (...)
      :     +- Relation[...] orc
      +- ResolvedHint (broadcast)
         +- Project [...]
            +- Filter (...)
               +- HiveTableRelation `db`.`area`, ...

Slide 38

PHYSICAL PLAN

*(4) HashAggregate(...)
+- Exchange hashpartitioning(...)
   +- *(3) HashAggregate(...)
      +- *(3) HashAggregate(...)
         +- Exchange hashpartitioning(...)
            +- *(2) HashAggregate(...)
               +- *(2) Project [...]
                  +- *(2) BroadcastHashJoin [...], [...], Inner, BuildRight
                     :- *(2) Project [...]
                     :  +- *(2) Filter (...)
                     :     +- *(2) FileScan orc db.transaction[...] Batched: true, Format: ORC
                     +- BroadcastExchange HashedRelationBroadcastMode(...)
                        +- *(1) Filter (...)
                           +- Scan hive db.area [...], HiveTableRelation

Slide 39

RESULT
• 6,000 sec → 200 sec

Slide 40

Agenda
1. Issue
2. Gathering Statistics
3. Injecting Custom Optimization Rules
4. Cost-based Optimizer

Slide 41

COST-BASED OPTIMIZATION
• Apache Spark 2.2.0 and later
• Statistics, Cost
• Advanced Optimizations
  1. Broadcast Hash Join Based on Column Statistics
  2. Join Reordering

Slide 42

COST-BASED OPTIMIZATION
• Apache Spark 2.2.0 and later
• Statistics, Cost
• Advanced Optimizations
  1. Broadcast Hash Join Based on Column Statistics
  2. Join Reordering

Slide 43

QUERY

select a.*
from db.a
inner join db.b on a.id = b.id
inner join db.c on b.key = c.key

Slide 44

QUERY

select a.*
from db.a
inner join db.b on a.id = b.id
inner join db.c on b.key = c.key

db.a: 1,500,000,000 records
db.b: 400,000,000 records
db.c: 1 record

Slide 45

JOIN ORDER
• db.a (1,500,000,000 records)
• db.b (400,000,000 records)
• db.c (1 record)

Slide 46

[Join tree as written: (db.a ⋈ db.b) ⋈ db.c]

Slide 47

CBO
• set spark.sql.cbo.enabled = true
• set spark.sql.cbo.joinReorder.enabled = true
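These flags alone are not enough: the cost-based optimizer needs statistics to estimate cardinalities, so the tables should be analyzed first. A sketch, assuming a live SparkSession and that `id` and `key` are the join columns to analyze:

```scala
// Enable CBO and join reordering (both default to false).
spark.sql("set spark.sql.cbo.enabled = true")
spark.sql("set spark.sql.cbo.joinReorder.enabled = true")

// CBO's estimates rely on table and column statistics gathered beforehand.
spark.sql("analyze table db.a compute statistics for columns id")
spark.sql("analyze table db.b compute statistics for columns id, key")
spark.sql("analyze table db.c compute statistics for columns key")
```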

Slide 48

[Join tree after CBO reordering: (db.b ⋈ db.c) ⋈ db.a]

Slide 49

RESULT
• 1,000 sec → 100 sec

Slide 50

COST
• cost = weight * numOfRows + (1.0 - weight) * dataSize
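A small worked example of this cost function, assuming the default weight of 0.7 (`spark.sql.cbo.joinReorder.card.weight`): the plan whose first join produces fewer intermediate rows and bytes comes out cheaper, which is why joining the 1-record db.c early wins. The row and byte figures below are illustrative only.

```scala
// Sketch of the cost function from the slide; not Spark's internal API.
// weight mirrors the default of spark.sql.cbo.joinReorder.card.weight = 0.7.
def cost(numOfRows: Double, dataSize: Double, weight: Double = 0.7): Double =
  weight * numOfRows + (1.0 - weight) * dataSize

// Illustrative intermediate-result estimates for the two candidate orders:
val bJoinCFirst = cost(numOfRows = 1, dataSize = 100)     // (db.b ⋈ db.c) first
val aJoinBFirst = cost(numOfRows = 4e8, dataSize = 4e9)   // (db.a ⋈ db.b) first
assert(bJoinCFirst < aJoinBFirst) // the reordered plan is cheaper
```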

Slide 51

THANK YOU