Stateful PBT, with a game logic case study

Stateful Property-Based Testing
With a Game Logic Case Study
LOU Xun / CCP Games

Topics
• Testing: example-based, TDD
• Property-Based Testing
• Stateful PBT, fixing a concurrency bug
• Elixir
• PropCheck (PropEr)

About Me
• LOU Xun (楼洵)
• Erlang since univ. |> Elixir ~4 years
• Software Engineer @ CCP Games
  • ESI (player-facing APIs for game data)
  • Internal Tools and Pipelines
  • Chat System (ejabberd in Elixir!)

EVE Online
• Sci-fi (spaceship!) MMO, sandbox by players
• fleet fights!
  • large scale: 6000+ players on the same battlefield
  • consequential: B-R cost $300,000+ (2014)
• single Python process, Time Dilation (TiDi)
• Elixir?!

Core Rules
• Location, Item
• Attribute
• Relationship
• ==> LIAR (for source of truth)
• logical foundation of everything in space

LIAR Goals
• Prototype to replace current impl.
• Each Location in an Erlang Process (Actor)
• Multicore parallelism (multi-node?)
  • faster cores ($$$), more cores ($)
• Message passing, “eventual consistency”
• DSL, give more power to Game Design

Defined APIs
• Relationship
  • add/remove modifiers (source -> target)
  • propagate updates (A -> B -> C)
  • DAG
• Item, Attribute: new, get/set…
• Location: start/stop (Actor)

TDD
• Defined APIs make it easy to adopt
• Incremental, iterative development
• Focus on single feature
  • local, then remote
• Example-based
  • Most tests we write are example-based

    test "add item modifier should modify the target attribute value" do
      Liar.start_location(1)
      i2 = simple_item(2, %{1 => 10})
      i3 = simple_item(3, %{2 => 20})
      assert :ok = Liar.load_item(1, i2)
      assert :ok = Liar.load_item(1, i3)
      assert :ok = Liar.add_item_modifier(:add, {2, 1}, {3, 2})
      assert 30 == Liar.get_value({3, 2})
    end
• Modifier carries source value
• add source to target (both {item, attribute})
• in this case, add 10 to 20 => 30

Flaws (of the example test above)
• Heavy and duplicated setup
  • 5 lines out of 7, for the first test case…
• Simple and static input
  • 10 + 20 = 30
• Needs a human to think of edge cases
  • 0? -1? inf? NaN??

Test Examples

Test Properties!

    property "new attribute have correct data" do
      forall {id, value} <- {integer(), float()} do
        attr = Attribute.new(id, value)
        assert Attribute.get_value(attr) == Attribute.get_base_value(attr)
      end
    end
• Generators instead of static input
  • Defines input boundary
• Randomize input from large search spaces
• Find (minimal) counter examples for you
• How to define useful properties?

Finding Properties
• Modeling: simpler, inefficient impl.
  • quicksort == bubble sort
• Partial invariant
  • list size/elements don’t change
• Symmetric properties
  • encoder/decoder pair
credit: Fred Hebert, propertesting.com
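
The three kinds of property above can be sketched in plain Elixir without any PBT library. This is only an illustration: `PropertySketch`, `random_list/0` and `check/2` are hypothetical stand-ins for what PropCheck's generators and `forall` would provide, and `Enum.sort/1` plays the role of the optimized implementation being tested.

```elixir
defmodule PropertySketch do
  # Deliberately simple (and slow) reference implementation: the "model".
  def bubble_sort(list) do
    pass = bubble_pass(list)
    if pass == list, do: list, else: bubble_sort(pass)
  end

  defp bubble_pass([a, b | rest]) when a > b, do: [b | bubble_pass([a | rest])]
  defp bubble_pass([a | rest]), do: [a | bubble_pass(rest)]
  defp bubble_pass([]), do: []

  # Modeling: the real implementation must agree with the simple one.
  def matches_model?(list), do: Enum.sort(list) == bubble_sort(list)

  # Partial invariant: sorting neither adds nor drops elements.
  def keeps_elements?(list) do
    sorted = Enum.sort(list)
    length(sorted) == length(list) and (sorted -- list) == []
  end

  # Symmetric property: decoding an encoded term gives the term back.
  def round_trips?(list) do
    :erlang.binary_to_term(:erlang.term_to_binary(list)) == list
  end

  # Hand-rolled stand-in for a generator: 0 to 19 integers in -100..100.
  def random_list do
    len = :rand.uniform(20) - 1
    for _ <- 1..len//1, do: :rand.uniform(201) - 101
  end

  # Runs `property` against `n` random lists; true only if every run passes.
  def check(property, n \\ 100) do
    Enum.all?(1..n, fn _ -> property.(random_list()) end)
  end
end

PropertySketch.check(&PropertySketch.matches_model?/1)
PropertySketch.check(&PropertySketch.keeps_elements?/1)
PropertySketch.check(&PropertySketch.round_trips?/1)
```

Unlike a real PBT library, this sketch does not shrink a failing input to a minimal counter example.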

One More Thi… Flaw
• TDD: rarely cross-feature test cases!
  • load_item… unload and load again, does it work?
  • (hint: it doesn’t)
  • (hint2: no one would ever think of this)
• Most other forms of testing as well
• How is the system used in the real world?
• Generator, but for user behaviours?

Stateful PBT
• Simulate real-world usage of a system
• Model the system with an abstract state machine (“statem”)
• Generate a sequence of commands
• Execute all the commands
• Check result / invariants
  • or, even just running all commands can fail

Almost Stateless…
    property "Liar top level APIs" do
      forall cmds <- commands(__MODULE__) do
        ...setup ...
        {history, state, result} = run_commands(__MODULE__, cmds)
        ...tear down ...
        result == :ok
        ...custom output ...
      end
    end
• commands and run_commands
  • use defined callbacks
  • represent 2 steps in stateful PBT

Five Callbacks
(diagram: Command Generation, then Actual Testing)
credit: Fred Hebert, propertesting.com

Library Example
• init: {[], []} (library, user)
• command: new_book, borrow, return
• precondition: true, library/user have the book
• next_state: {[A], []} -> {[], [A]}
• postcondition: only one A exists! (invariant)
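
The model side of this library example can be written as a handful of pure functions. The sketch below is an assumption-laden illustration: `LibraryModel` and `run/1` are hypothetical names, command generation is left out, and the invariant is checked on the model only, whereas a real stateful property would compare it against the system under test.

```elixir
defmodule LibraryModel do
  # Model state: {books in the library, books held by the user}.
  def initial_state, do: {[], []}

  # new_book is always allowed (assumes the generator hands out fresh names);
  # borrow/return need the giving side to actually hold the book.
  def precondition(_state, {:new_book, _book}), do: true
  def precondition({library, _user}, {:borrow, book}), do: book in library
  def precondition({_library, user}, {:return, book}), do: book in user

  # Abstract state transitions, e.g. {[A], []} -> {[], [A]} for a borrow.
  def next_state({library, user}, {:new_book, book}), do: {[book | library], user}

  def next_state({library, user}, {:borrow, book}),
    do: {List.delete(library, book), [book | user]}

  def next_state({library, user}, {:return, book}),
    do: {[book | library], List.delete(user, book)}

  # Invariant: every book exists exactly once across library and user.
  def postcondition({library, user}) do
    all = library ++ user
    all == Enum.uniq(all)
  end

  # Walks a command list, stopping at the first invalid command or broken invariant.
  def run(commands) do
    Enum.reduce_while(commands, {:ok, initial_state()}, fn cmd, {:ok, state} ->
      if precondition(state, cmd) do
        new_state = next_state(state, cmd)

        if postcondition(new_state),
          do: {:cont, {:ok, new_state}},
          else: {:halt, {:error, {:invariant_broken, cmd}}}
      else
        {:halt, {:error, {:invalid_command, cmd}}}
      end
    end)
  end
end

LibraryModel.run([{:new_book, :a}, {:borrow, :a}])
#=> {:ok, {[], [:a]}}
```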

Case Study: Concurrency Bug
• Not a live demo… (just for the look)
• Read and use test output
• Effectiveness vs. example-based tests
• Tips on writing a stateful PBT
• Inspiration for finding system properties

Shrinking
• As important as generating
  • removes inconsequential commands (noise)
  • focus on real problems
• Tries to minimize the counter example
  • originally 27 commands…
  • shrank to 9 (1/3)
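
The idea of removing noise can be illustrated with a naive shrinker that drops one command at a time for as long as the failure still reproduces. This is not PropEr's actual shrinking algorithm, which is more sophisticated; `ShrinkSketch` and the `fails?` callback are hypothetical names used only for this sketch.

```elixir
defmodule ShrinkSketch do
  # `fails?` returns true when the given command list still reproduces the failure.
  # Tries every list with exactly one command removed and recurses on the first
  # one that still fails; stops when no single removal keeps the failure.
  def shrink(commands, fails?) do
    candidate =
      0..(length(commands) - 1)//1
      |> Stream.map(&List.delete_at(commands, &1))
      |> Enum.find(fails?)

    case candidate do
      nil -> commands
      smaller -> shrink(smaller, fails?)
    end
  end
end

# Toy failure condition: an :unload that is followed, at some point, by a :load.
fails? = fn cmds ->
  case Enum.find_index(cmds, &(&1 == :unload)) do
    nil -> false
    i -> :load in Enum.drop(cmds, i + 1)
  end
end

ShrinkSketch.shrink([:start, :load, :noise, :unload, :noise, :load, :noise], fails?)
#=> [:unload, :load]
```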

Symbolic Calls
    Commands: [
      {:set, {:var, 1}, {:call, Liar, :start_location, [9]}},
      {:set, {:var, 2}, {:call, Liar, :load_item, [9, Liar.Item<id: 92>]}},
      {:set, {:var, 7}, {:call, Liar, :start_location, [7]}},
      {:set, {:var, 14}, {:call, Liar, :load_item, [7, Liar.Item<id: 88>]}},
      {:set, {:var, 23}, {:call, Liar, :add_item_modifier, [:dr_add, {92, 1}, {88, 46}]}},
      {:set, {:var, 24}, {:call, Liar, :unload_item, [92]}},
      {:set, {:var, 25}, {:call, Liar, :unload_item, [88]}},
      {:set, {:var, 26}, {:call, Liar, :load_item, [9, Liar.Item<id: 77>]}},
      {:set, {:var, 27}, {:call, Liar, :add_item_modifier, [:dr_add, {77, 11}, {77, 9}]}}
    ]

Actual Calls
    Liar.start_location(9)
    Liar.load_item(9, Liar.Item<id: 92>)
    Liar.start_location(7)
    Liar.load_item(7, Liar.Item<id: 88>)
    Liar.add_item_modifier(:dr_add, {92, 1}, {88, 46})
    Liar.unload_item(92)
    Liar.unload_item(88)
    Liar.load_item(9, Liar.Item<id: 77>)
    Liar.add_item_modifier(:dr_add, {77, 11}, {77, 9})
• Looks sane…
• Auto-gen’ed later!
    === Debug Commands ===
    # item generation
    item2_814 = X.simple_item(814, ...)
    # repro steps
    Liar.start_location(44)
    Liar.load_item(44, item2_814)
    Liar.unload_item(814)
    Liar.load_item(44, item2_814)
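
How a symbolic command list turns into actual calls can be sketched with `apply/3`. This is a simplified stand-in for what `run_commands` does, under the assumption that symbolic variables appear only as top-level arguments; `SymbolicSketch` is a hypothetical name, and `String` functions replace the `Liar` API so the example runs on its own.

```elixir
defmodule SymbolicSketch do
  # Evaluates `{:set, {:var, n}, {:call, mod, fun, args}}` entries in order,
  # binding each result to its variable so later calls can refer to it.
  def run(commands) do
    Enum.reduce(commands, %{}, fn {:set, {:var, n}, {:call, mod, fun, args}}, env ->
      Map.put(env, n, apply(mod, fun, resolve(args, env)))
    end)
  end

  # Replaces `{:var, n}` placeholders with the values produced by earlier calls.
  defp resolve(args, env) do
    Enum.map(args, fn
      {:var, n} -> Map.fetch!(env, n)
      other -> other
    end)
  end
end

commands = [
  {:set, {:var, 1}, {:call, String, :upcase, ["liar"]}},
  {:set, {:var, 2}, {:call, String, :length, [{:var, 1}]}}
]

SymbolicSketch.run(commands)
#=> %{1 => "LIAR", 2 => 4}
```

Keeping calls symbolic until execution is what lets the library generate, print and shrink a whole sequence before anything touches the real system.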

Captured Logs
    [error] GenServer {Liar.Runtime.LocationRegistry, 7} terminating
    ** (FunctionClauseError) ...
    (liar) lib/liar/item.ex:52: Liar.Item.get_attribute(nil, 46)
    ...
    Last message: {:"$gen_cast", {:rim_target, {92, 1}, {88, 46}}}
• Direct cause: trying to get an attribute from a nil item
• First line: which actor crashed (“Location 7”)
• Last line: which message it was handling when it crashed
  • “remove item modifier at target location”
  • no “remove modifier” commands…
  • must have happened during item unload!

Setup
    Liar.start_location(9)
    Liar.load_item(9, Liar.Item<id: 92>)
    Liar.start_location(7)
    Liar.load_item(7, Liar.Item<id: 88>)
    Liar.add_item_modifier(:dr_add, {92, 1}, {88, 46})

Crash
    Liar.unload_item(92)
    Liar.unload_item(88)
(Caller = Actor running test)

Observe
    Liar.load_item(9, Liar.Item<id: 77>)
    Liar.add_item_modifier(:dr_add, {77, 11}, {77, 9})
• Erlang (thus Elixir) provides strong isolation
  • one crashed Actor doesn’t damage any other
  • nor the VM
• PropEr shows us all and only the necessary steps
  • to produce and observe the failure
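
The isolation described here, and the reason a separate observe step is needed, can be reproduced with two unlinked GenServers. The sketch is not LIAR code: `IsolationSketch` and its `:boom` and `:ping` messages are made up for illustration.

```elixir
defmodule IsolationSketch do
  use GenServer

  # Unlinked start, so a crash here cannot take the caller down with it.
  def start(id), do: GenServer.start(__MODULE__, id)

  @impl true
  def init(id), do: {:ok, id}

  @impl true
  def handle_cast(:boom, _id), do: raise("simulated crash")

  @impl true
  def handle_call(:ping, _from, id), do: {:reply, {:pong, id}, id}

  def demo do
    {:ok, loc7} = start(7)
    {:ok, loc9} = start(9)
    ref = Process.monitor(loc7)

    # The cast returns :ok right away; the caller never sees the crash itself.
    :ok = GenServer.cast(loc7, :boom)

    receive do
      {:DOWN, ^ref, :process, ^loc7, _reason} -> :ok
    after
      1_000 -> raise "location 7 did not crash"
    end

    # Location 9 still answers; only a later look at location 7 reveals the failure.
    {Process.alive?(loc7), GenServer.call(loc9, :ping)}
  end
end

IsolationSketch.demo()
#=> {false, {:pong, 9}}
```

Running it also prints a `[error] GenServer ... terminating` log entry, similar in shape to the captured log in the case study.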

Validating the Fix
    ◊ mix test test/liar_pbt_test.exs
    Excluding tags: [skip: true]
    OK: The input passed the test.
    .
    Finished in 0.1 seconds
    1 property, 0 failures
    Randomized with seed 667925

Other Bugs Revealed
• Item still registered after unload
• Leftover outgoing modifiers after unload
• Wrong return format
• Bug in dependent package
• …

Lines of Code
                  Blank  Comment  Code
    code            238      185   927
    TDD              80        1   351
    stateful PBT     53        2   177
(old data, PBT not complete)
FUN!

                  Blank  Comment  Code
    example test     86        1   370
    stateful PBT     99       10   386
• more commands, even Process.exit!
• refactor for readability
• ~100 lines for debug output!

How to write one?

propertesting.com
• by Fred Hebert
• this talk is highly inspired by him
• Free for online reading
• Learn You Some Erlang
• Erlang in Anger
• The Zen of Erlang
• and more…

LIAR Stateful PBT lessons learnt

Five Callbacks
• init
• command: control command generation
• precondition: validate generated command
• next_state
• postcondition

command “filtering”
• No locations: only generate start_location
• No items: only start_location or load_item
• Has items: most functions are valid
• Has modifiers: can remove modifiers
    def command(%__MODULE__{items: items} = state) when map_size(items) == 0 do
      frequency([
        {1, {:call, Liar, :start_location, [gen_new_lid(state)]}},
        {50, {:call, Liar, :load_item, [gen_loaded_lid(state), gen_new_item(state)]}}
      ])
    end
• Forces you to think how flexible the system is
• NOT used during shrinking!
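
What state-dependent, weighted command selection amounts to can be shown without PropCheck. In the sketch below, `CommandSketch` is a hypothetical module, `pick/1` is a hand-rolled stand-in for the `frequency` generator, and only the 1 and 50 weights come from the slide; the remaining weights are invented.

```elixir
defmodule CommandSketch do
  # Model state is a plain map: %{locations: [lid], items: %{item_id => lid}}.

  # Weighted candidates that make sense in the given model state.
  def candidates(%{locations: []}), do: [{1, :start_location}]

  def candidates(%{items: items}) when map_size(items) == 0,
    do: [{1, :start_location}, {50, :load_item}]

  # Weights in this clause are made up for the sketch.
  def candidates(_state),
    do: [{1, :start_location}, {10, :load_item}, {10, :unload_item}, {20, :add_item_modifier}]

  # Picks one candidate with probability proportional to its weight.
  def pick(state) do
    weighted = candidates(state)
    total = weighted |> Enum.map(&elem(&1, 0)) |> Enum.sum()
    pick_weighted(weighted, :rand.uniform(total))
  end

  defp pick_weighted([{weight, cmd} | _rest], roll) when roll <= weight, do: cmd
  defp pick_weighted([{weight, _cmd} | rest], roll), do: pick_weighted(rest, roll - weight)
end

CommandSketch.pick(%{locations: [], items: %{}})
#=> :start_location
```

With weights 1 and 50, a fresh state with one location produces `:load_item` about 50 times out of 51, which steers generated sequences toward states where items and modifiers exist.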

precondition
• Validate arguments (exist in StateM…)
• Correct shrinking relies on this
  • WTH no locations!
  • Shrink: remove several commands, then use precondition to validate the remaining sequence
    Commands: [
      {:set, {:var, 2}, {:call, Liar, :load_item, [10, Liar.Item<id: 21>]}}
    ]
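
Validating a shrunk sequence means replaying it against the model and checking each precondition in turn. The sketch below assumes a model that tracks only started location ids; `SequenceSketch` and its tuple-shaped commands are hypothetical simplifications of the symbolic calls shown above.

```elixir
defmodule SequenceSketch do
  # Model state: list of started location ids.
  def initial_state, do: []

  def precondition(_locations, {:start_location, _lid}), do: true
  def precondition(locations, {:load_item, lid, _item_id}), do: lid in locations

  def next_state(locations, {:start_location, lid}), do: [lid | locations]
  def next_state(locations, {:load_item, _lid, _item_id}), do: locations

  # A candidate sequence is valid only if every command's precondition holds
  # in the model state produced by the commands before it.
  def valid?(commands) do
    result =
      Enum.reduce_while(commands, initial_state(), fn cmd, state ->
        if precondition(state, cmd),
          do: {:cont, next_state(state, cmd)},
          else: {:halt, :invalid}
      end)

    result != :invalid
  end
end

SequenceSketch.valid?([{:start_location, 10}, {:load_item, 10, 21}])
#=> true
SequenceSketch.valid?([{:load_item, 10, 21}])
#=> false
```

The second call is the situation in the slide: once shrinking removes `start_location`, the lone `load_item` has no location to load into, so the precondition has to reject that candidate.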

DRY
• Functions to list valid arguments
• Wrap generators using ^
    def command(%__MODULE__{locations: []} = state) do
      {:call, Liar, :start_location, [gen_new_lid(state)]}
    end

    def precondition(state, {:call, Liar, :load_item, [lid, item]}),
      do: Enum.member?(loaded_lids(state), lid) && Enum.member?(new_item_ids(state), item.id)

    defp loaded_lids(state), do: state.locations
    defp gen_loaded_lid(state), do: loaded_lids(state) |> elements()

Five Callbacks
• init
• command
• precondition
• next_state: abstract model transition
• postcondition: check result / invariant

Stateful Test
• Don’t repeat your logic!
• Use simple state
• Use inefficient algorithm
• Check (partial) invariants
  • “only 1 book exists in library + user”

General Notes
• “Fixing” the model (test) is normal
  • mix propcheck.clean
• Adjust frequency to expose different bugs
• And number / size of tests
  • especially helpful if setup is heavy
• Nevertheless, great tools help (mix test)

LIAR Specific
• Testing all Locations (multiple actors)
• Requires synchronization
• LIAR’s “consistency guarantee”
  • sync “call” after certain commands
  • reason for the observe step in case study
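
Why a synchronous call works as a synchronization point can be shown with a single GenServer: messages from one process to another arrive in sending order, so a `call` is only answered after the casts sent before it have been handled. `SyncSketch` is a hypothetical counter, not LIAR code, and it covers only the single-actor case; updates that hop between Locations need more than one such call.

```elixir
defmodule SyncSketch do
  use GenServer

  def start, do: GenServer.start(__MODULE__, 0)

  @impl true
  def init(count), do: {:ok, count}

  # Asynchronous update: the sender does not wait for it to be applied.
  @impl true
  def handle_cast(:inc, count), do: {:noreply, count + 1}

  # Synchronous read: answered only after every earlier message from the same sender.
  @impl true
  def handle_call(:sync, _from, count), do: {:reply, count, count}

  def demo do
    {:ok, pid} = start()
    for _ <- 1..3, do: GenServer.cast(pid, :inc)
    GenServer.call(pid, :sync)
  end
end

SyncSketch.demo()
#=> 3
```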

P in Stateful PBT??

Stateful property tests are particularly useful when “what the code should do”—what the user perceives—is simple, but “how the code does it”—how it is implemented—is complex.
–Fred, propertesting.com

EVE Rules
• What is the user’s perspective on attributes?
• Actually quite simple:
  • base_value + all modifiers -> real value
  • modifiers carry source values
  • recursively apply the same simple rule!

LIAR “Property”
    defp calculate_value(state, item_attr) do
      base_value = ( ... get base value)

      state
      |> resolve_graph()
      |> Graph.in_edges(item_attr)
      |> Enum.map(fn e ->
        {mod, _} = e.label
        {mod, calculate_value(state, e.v1)}
      end)
      |> TestModifiers.evaluate(base_value)
    end
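
A self-contained version of the same recursive rule, for readers without the LIAR code at hand. Everything here is an assumption made for illustration: `ValueModel` is a hypothetical module, modifiers are plain `{op, source, target}` tuples instead of graph edges, only `:add` is supported, and the modifier graph is assumed to be acyclic.

```elixir
defmodule ValueModel do
  # base:      %{{item, attribute} => base value}
  # modifiers: [{:add, source, target}] where source and target are {item, attribute}
  #
  # Real value = base value + the real value of every source that modifies the target.
  # Nothing is cached: each source value is recomputed recursively on every call.
  def calculate_value(base, modifiers, target) do
    modifiers
    |> Enum.filter(fn {_op, _source, t} -> t == target end)
    |> Enum.reduce(Map.fetch!(base, target), fn {:add, source, _target}, acc ->
      acc + calculate_value(base, modifiers, source)
    end)
  end
end

# Same numbers as the first example-based test: add 10 to 20.
base = %{{2, 1} => 10, {3, 2} => 20}
ValueModel.calculate_value(base, [{:add, {2, 1}, {3, 2}}], {3, 2})
#=> 30

# Propagation A -> B -> C: C sees B's modified value, not B's base value.
chain = %{a: 1, b: 2, c: 3}
ValueModel.calculate_value(chain, [{:add, :a, :b}, {:add, :b, :c}], :c)
#=> 6
```

The model stays short because it can afford to be slow, while the real implementation has to cache results and propagate updates.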

Inspired?
• Think properties from a user’s perspective
  • does it have a “simple” mental model?
• Real impl. can’t afford the simple model
  • calculate once, store the result, propagate
• Caching vs. recursive calculation

PBT and co.
• Complement, not replacement
  • (example-based) TDD for dev., PBT for verification
• Better understanding of your system & domain
• Does require some effort to get comfortable with
  • not suitable for all problems
  • but gives a lot of satisfaction when useful :)

Fin.
http://aqd.is/
