
Using F# in Production at ClearTax

Ankit Solanki
October 15, 2016

Transcript

  1. Outline
    • History: the product and why we picked F#
    • Wins & Losses
    • Scaling-up with the language
    • Learnings
  2. The Product
    • cleartds.com
    • SAAS product for preparing TDS Returns
    • Quarterly deadlines for submission of TDS Returns
    • Launched in early 2014 (Pre-YC)
  3. Product Requirements
    • Accuracy
      • Cannot afford to make any mistakes
      • 1,000s of people in every single TDS return
    • Constraints:
      • Arcane file format used by the department
      • Lack of documentation
  4. Product Requirements
    • Flexibility
      • Rules change every quarter
      • File format changes every quarter
    • Constraints:
      • Changes announced with no warning
      • Updated formats applicable from the day of release
  5. Product Requirements
    • Simplicity / Expressiveness / Speed
      • Need ability to launch fast
      • Needed to be able to run the product on auto-pilot for large chunks of time (after launch)
      • Ability to quickly dip-in every quarter and make changes
    • Constraints:
      • Lack of engineering resources
      • Lack of time
  6. File Format
    • Flat file format, like CSV files
    • Different types of lines
    • Mode-based, hierarchical format
    • Large number of fields and line types
    • Definition changes over time
      • Field means X in 2015 but Y in 2016
    • Different variants based on use case (submission from department, download from department) and time (quarter, year, form type)
    • Delta-submissions in case of filing a revised return. Multiple modes of correction
  7. • File format main blocker
    • Experimented with a few approaches
      • Procedural code, DSLs, Mapping tools
    • Found F# Type Providers
  8. Type Providers
    • Compile-time ability to deal with structured data formats
    • Generate types & metadata based on sample data, at compile time
    • Parser to read input data of that type
  9. Using type providers drastically simplified our actual parsing, and made our code super-readable

    deduction.Tds <- record.``TDS / TCS -Income Tax for the period``
    deduction.Surcharge <- record.``TDS / TCS -Surcharge for the period``
    deduction.EducationCess <- record.``TDS / TCS -Cess``
  10. • We tested the file parser / generator
    • Simple strategy:
      • Parse sample file into your data model
      • Generate it
      • Do line-by-line, field-by-field comparison
    • Automated a large part of this with F#
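The round-trip strategy above can be sketched roughly as follows. This is a minimal illustration, not the ClearTax test harness: the `DeductionLine` record, the `^` delimiter, and the two-field layout are all assumptions made for the example.

```fsharp
open System.Globalization

// Hypothetical, heavily simplified line type. The real format has many line types and fields.
type DeductionLine = { Pan : string; Amount : decimal }

// Assumed delimiter, for this sketch only.
let delimiter = '^'

// Parse one line of the flat file into the data model.
let parseLine (line : string) : DeductionLine =
    match line.Split(delimiter) with
    | [| pan; amount |] ->
        { Pan = pan; Amount = System.Decimal.Parse(amount, CultureInfo.InvariantCulture) }
    | fields -> failwithf "Expected 2 fields, got %d" fields.Length

// Generate the line back from the data model.
let generateLine (d : DeductionLine) : string =
    sprintf "%s%c%s" d.Pan delimiter (d.Amount.ToString("0.00", CultureInfo.InvariantCulture))

// Line-by-line, field-by-field comparison.
// Returns (line number, field number, expected, actual) for every mismatch.
// Note: List.zip throws if the two files have a different number of lines.
let diffFiles (expected : string list) (actual : string list) =
    [ for (lineNo, (e, a)) in List.indexed (List.zip expected actual) do
        let ef = e.Split(delimiter)
        let af = a.Split(delimiter)
        for fieldNo in 0 .. (max ef.Length af.Length) - 1 do
            let get (arr : string[]) =
                if fieldNo < arr.Length then arr.[fieldNo] else "<missing>"
            if get ef <> get af then
                yield (lineNo + 1, fieldNo + 1, get ef, get af) ]

// Parse the sample, generate it again, and report any differences.
let roundTrip (sample : string list) =
    sample |> List.map (parseLine >> generateLine) |> diffFiles sample
```

An empty result from `roundTrip` means the parser and generator agree on the sample; any mismatch points at a specific line and field.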
  11. • We ended up writing the whole product in F#
    • But we did not focus on making it ‘functional’
    • Used F# as a better C# or Java (at first)
    • Core components (business logic) written in mostly functional style
    • Glue (controllers) written more in an imperative style
  12. Our philosophy with FP
    • Be pragmatic
    • Grow with the language
    • Experiment, bring in parts that you feel are valuable
  13. Product launch
    • We wanted to launch in 6-8 weeks
    • Very focused execution
    • We learned parts of the language and limited ourselves to only those parts
    • Started out with mostly the basics: pattern matching, currying, pipelines
  14. Pipelines

    // Pipeline operator itself is very simple
    let (|>) x fn = fn x

    // Usage is natural
    let square x = x * x
    let double x = 2 * x

    [1; 2; 3] |> List.map square |> List.map double
    // [2; 8; 18]

    // A little more natural than composition, at least for beginners
    // Can be arbitrarily long
    users
    |> List.map validateUsers
    |> List.filter isValid
    |> List.map getUserId
  15. Pattern Matching
    • Since you can pattern match on a tuple with n elements, this results in the ability to flatten nested logic into a simple decision table
    • Huge win for readability
    • We might have gone over-board with this
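As an illustration of the decision-table idea, here is a small sketch. The types and the rules themselves are invented for the example and do not reflect actual TDS rules.

```fsharp
// Hypothetical domain types, for illustration only.
type ReturnType = Original | Revised
type Filer = Government | NonGovernment
type Action = Accept | AcceptWithWarning | Reject

// Matching on a tuple lets each case read like one row of a decision table,
// instead of three levels of nested if/else.
let decide (returnType : ReturnType) (filer : Filer) (hasPan : bool) : Action =
    match returnType, filer, hasPan with
    | Original, _,             true  -> Accept
    | Original, Government,    false -> AcceptWithWarning
    | Original, NonGovernment, false -> Reject
    | Revised,  _,             true  -> AcceptWithWarning
    | Revised,  _,             false -> Reject
```

The compiler checks the match for exhaustiveness, so a missing row in the table shows up as a warning rather than as a runtime surprise.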
  16. Partial Application
    • Ability to encapsulate in a functional manner
    • Big aha moment when I finally understood this

    // Validations would be context dependent
    let validateDeduction year quarter returnType deduction = …

    // But at a higher level, you want a simpler signature
    type ValidateFn<'A> = 'A -> ValidationResult

    // So you can just freeze the context sensitive parameters
    // to get the specific validator you want
    let currentValidator = validateDeduction 2016 Q4 Original
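A self-contained version of the same idea might look like the sketch below. The type definitions and the validation rule are assumptions made so the example runs; the slide elides the real body of `validateDeduction`.

```fsharp
// Hypothetical supporting types, not taken from the product.
type Quarter = Q1 | Q2 | Q3 | Q4
type ReturnType = Original | Revised
type Deduction = { Amount : decimal }
type ValidationResult = Valid | Invalid of string

type ValidateFn<'A> = 'A -> ValidationResult

// Context-dependent validation. The rule below is made up for illustration.
let validateDeduction (year : int) (quarter : Quarter) (returnType : ReturnType) (deduction : Deduction) : ValidationResult =
    let strict = (year >= 2016 && quarter = Q4 && returnType = Original)
    if deduction.Amount < 0m then Invalid "Amount cannot be negative"
    elif strict && deduction.Amount = 0m then Invalid "Amount cannot be zero"
    else Valid

// Freezing the first three arguments yields a function of type ValidateFn<Deduction>.
let currentValidator : ValidateFn<Deduction> = validateDeduction 2016 Q4 Original

let results = [ { Amount = 100m }; { Amount = -5m } ] |> List.map currentValidator
```

Callers that only see `currentValidator` need no knowledge of the year, quarter, or return type that were baked in.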
  17. Code as Data

    // Type definitions
    type IsColumnVisible = (TdsReturn -> bool) list
    type IsColumnEditable<'T> = ('T -> bool) list
    type ColumnSpecification<'T> =
        (Quotations.Expr * IsColumnVisible * IsColumnEditable<'T>) list

    // Column visibility specifiers
    let showAlways = ...
    let showForRevised = ...
    let showForGovernment = ...

    // Editing specifiers
    let editAlways = fun (d : Deduction) -> true
    let editWhenNil = fun (d : Deduction) -> d.Amount = 0
    let editWhenDateIsPresent = fun (d : Deduction) -> d.Date |> Option.isSome

    // UI Specification
    let (deductionColumns : ColumnSpecification<Deduction>) = [
        <@ fun (c : Deduction) -> c.Date @> , [ showAlways ] , [ editAlways ]
        <@ fun (c : Deduction) -> c.SectionCode @> , [ showForRevised ] , [ editWhenNil ]
        <@ fun (c : Deduction) -> c.Amount @> , [ showForRevised ; showForGovernment ] , [ editWhenNil ; editWhenDateIsPresent ]
    ]
  18. ORM
    • ORM – we picked a C#-specific ORM [ServiceStack.OrmLite] early on
    • ORM was ideal for the product use case (bulk inserts, updates, simple conceptual model)
    • It actually worked pretty well with F# (with a minimal wrapper)
    • Had some issues, will go into detail later

    let loadDeductionsByName name =
        // Where clause
        let condition = <@ fun (d : Deduction) -> d.Name = name @>
        // Order-by clause
        let ordering = <@ fun (d : Deduction) -> d.CreateTimestamp @>
        // Pagination
        let currentPage = { page = 1 ; pageSize = 10 }
        // This executes:
        // select * from deduction where name = ? order by create_timestamp limit 10
        DbHelper.LoadPageWhen currentPage condition ordering
  19. Design Issue: Not leveraging types
    • One of the main mistakes we made
    • ORM layer was unable to deal with F# specific types (records, tuples, discriminated unions)
    • This resulted in nullable values introduced in the data model
    • Polluted the whole codebase
    • Right solution would have been to isolate this in a data layer
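One way to read "isolate this in a data layer" is to keep a nullable, ORM-friendly row type at the boundary and convert it to an option-based domain record as soon as data is loaded. The sketch below assumes that reading; `DeductionRow`, its fields, and the mapping functions are hypothetical and not from the product.

```fsharp
open System

// Hypothetical ORM-facing row: a plain class with nullable members,
// the shape a C#-oriented ORM is comfortable with.
[<AllowNullLiteral>]
type DeductionRow() =
    member val Name : string = null with get, set
    member val CreditDate : Nullable<DateTime> = Nullable() with get, set

// Domain record used by the rest of the code: no nulls, only options.
type Deduction = { Name : string option; CreditDate : DateTime option }

// The conversions live in the data layer, so nulls never leak past it.
let toDomain (row : DeductionRow) : Deduction =
    { Name = Option.ofObj row.Name
      CreditDate = Option.ofNullable row.CreditDate }

let toRow (d : Deduction) : DeductionRow =
    DeductionRow(Name = Option.toObj d.Name, CreditDate = Option.toNullable d.CreditDate)
```

With this split, business logic pattern matches on `option` values and only the two mapping functions ever touch `null`.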
  20. More type problems
    • Same base types (example: Tax Deduction) used throughout the product
    • In some ways, made things simpler
    • But also led to unnecessary complexity
    • UI may not need to know about some fields, but it still gets them
    • Too much capability stuffed into a single entity
    • We should have defined more granular types
    • If I started over, would have spent more time getting the types right, building a layered architecture
  21. Performance: Lazy Evaluation

    F# Sequences and their transformations are lazy

    let squares =
        [1; 2]
        |> Seq.map (fun i ->
            printfn "%d" i
            i * i)

    printfn "%A" squares
    // 1
    // 2
    // seq [1; 4]

    printfn "%A" squares // prints again
    // 1
    // 2
    // seq [1; 4]

    Usually, laziness is what we want. But responsibility lies with caller. Sometimes, this can lead to very expensive operations being repeated.
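One common way for the caller to avoid repeated evaluation is to cache or materialize the sequence. The sketch below uses `Seq.cache` from FSharp.Core; the counter and `expensiveSquare` are stand-ins for an expensive operation, and the slides do not say which approach the team chose.

```fsharp
// Counts how many times the "expensive" function actually runs.
let evaluations = ref 0

let expensiveSquare (i : int) : int =
    evaluations.Value <- evaluations.Value + 1
    i * i

// Without caching, every enumeration re-runs the mapping function.
let lazySquares = [1; 2] |> Seq.map expensiveSquare
let firstPass = Seq.toList lazySquares
let secondPass = Seq.toList lazySquares
let evaluationsWithoutCache = evaluations.Value

// With Seq.cache, results are remembered after the first enumeration.
evaluations.Value <- 0
let cachedSquares = [1; 2] |> Seq.map expensiveSquare |> Seq.cache
let cachedFirstPass = Seq.toList cachedSquares
let cachedSecondPass = Seq.toList cachedSquares
let evaluationsWithCache = evaluations.Value
```

Converting to a list or array with `Seq.toList` or `Seq.toArray` at the point of creation has a similar effect when the whole result is needed anyway.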
  22. Performance: Expression Trees
    • Or “Code Quotations” – language level feature of F#
    • Expression trees that you can work with programmatically
    • Example:

    let ordering = <@ fun (d : Deduction) -> d.CreateTimestamp @>

    • Possible to inspect this expression and do code generation, evaluate it, get the property name it refers to, etc
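To make "get the property name it refers to" concrete, here is a small sketch using the active patterns in `Microsoft.FSharp.Quotations.Patterns`. The `Deduction` record and the `propertyName` helper are assumptions for the example; it uses an untyped quotation (`<@@ ... @@>`) so the helper can take a plain `Expr`.

```fsharp
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns

// Hypothetical record, standing in for the product's entity.
type Deduction = { Name : string; CreateTimestamp : System.DateTime }

// Extract the property name from a quotation of the form: fun x -> x.Property
let propertyName (expr : Expr) : string =
    match expr with
    | Lambda (_, PropertyGet (_, property, _)) -> property.Name
    | _ -> failwith "Expected a quotation of the form: fun x -> x.Property"

let ordering = <@@ fun (d : Deduction) -> d.CreateTimestamp @@>

// Computed once here; building and walking quotations repeatedly is what gets expensive.
let orderingColumn = propertyName ordering
```

A wrapper around an ORM could translate such a name into a column name for the order-by clause.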
  23. Performance: Expression Trees (continued)
    • These are actually fairly expensive to build
    • Relatively slow, even when compared to operations like creating a new function
    • First version of our application used quotations while generating the final TDS return
    • One quotation per field, per line
    • Profiler said that 99.99% of the time was spent building or traversing these trees
    • Refactored the code to get a 100x speed improvement
  24. Data Structure Selection
    • This was a relatively minor issue, but still painful
    • F# has its own parallel data structures (List, Map, etc)
    • Different from standard C# structures
    • Immutable in design
    • Picking the right data structure was tricky
    • Libraries would work with C# lists, not F# lists, so we had to convert
    • Also, default list in F# is a linked list, not an array list
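The conversions mentioned above are short but easy to forget. A minimal sketch, with a hand-built `ResizeArray` standing in for whatever a C#-oriented library returns:

```fsharp
// ResizeArray<'T> is F#'s alias for System.Collections.Generic.List<'T>,
// the mutable, array-backed list most C# libraries use.
let fromLibrary : ResizeArray<int> = ResizeArray<int>([1; 2; 3])

// Convert to an immutable F# list (a singly linked list) to use the List module.
let asFSharpList : int list = List.ofSeq fromLibrary
let doubled = asFSharpList |> List.map (fun x -> x * 2)

// Convert back when a library expects the C# collection.
let backToLibrary : ResizeArray<int> = ResizeArray<int>(doubled)

// For frequent indexed access, an array avoids walking the linked list.
let asArray : int[] = Array.ofList doubled
let third = asArray.[2]
```

Each conversion copies the elements, so on hot paths it is worth settling on one representation instead of converting back and forth.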
  25. Tooling Issues: IDE
    • Visual Studio tooling for F# was not great in the beginning
    • We had crashes, slowdowns, etc
    • Example:
      • F# does not allow cyclic dependencies
      • Order of files in the project matters
      • Visual Studio (2013) actually did not have options to insert a file at a particular location or re-order files
      • We hand-edited project files for a long time
  26. IDE (continued)
    • F# support in Visual Studio is better now
    • Still not ideal, not on par with C#
    • F# compiler much slower than C# compiler, for example
    • Will take some time to catch up
    • F# story outside Windows is also good now
  27. Tooling Issues: Language / Compiler Versions
    • Hit by this several times
    • Subtle differences in compiler versions or language level support caused compilation to fail during deployment
    • "Works on my machine", though
    • Resolving this was very painful
  28. Computation Expressions
    • Syntactic sugar for monads
    • (Let's not talk about monads)
    • Will let you design 'workflows', flatten your logic even further
    • Simplified our business logic
  29. // Sugared syntax using the maybe computation expression
    // 'maybe' is not a built-in
    let lateFine = maybe {
        // CreditDate, TdsDate are option types, can be None if not entered
        let! creditDate = deduction.CreditDate
        let! deductionDate = deduction.TdsDate
        let diffInMonths = getDifferenceInMonth creditDate deductionDate
        return (calculateFine deduction.taxDeducted diffInMonths)
    }
  30. // De-sugared
    let lateFine =
        match deduction.CreditDate with
        | None -> None
        | Some c ->
            match deduction.TdsDate with
            | None -> None
            | Some d ->
                let diffInMonths = getDifferenceInMonth c d
                Some (calculateFine deduction.taxDeducted diffInMonths)
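Since `maybe` is not built in, it has to be defined somewhere. A minimal sketch of such a builder is shown below, together with a runnable version of the late-fine example. The `Deduction` record, the month arithmetic, and the 1% per month rate are all made up for illustration and are not the product's actual logic.

```fsharp
open System

// A minimal 'maybe' builder: Bind short-circuits on None, Return wraps in Some.
type MaybeBuilder() =
    member this.Bind (value : 'a option, next : 'a -> 'b option) : 'b option =
        Option.bind next value
    member this.Return (value : 'a) : 'a option = Some value
    member this.ReturnFrom (value : 'a option) : 'a option = value

let maybe = MaybeBuilder()

// Hypothetical record and helpers so the example runs.
type Deduction =
    { CreditDate : DateTime option
      TdsDate : DateTime option
      TaxDeducted : decimal }

let getDifferenceInMonth (credit : DateTime) (deducted : DateTime) : int =
    (deducted.Year - credit.Year) * 12 + (deducted.Month - credit.Month)

// Made-up rate of 1% per month, for illustration only.
let calculateFine (taxDeducted : decimal) (months : int) : decimal =
    taxDeducted * 0.01m * decimal months

let lateFine (deduction : Deduction) : decimal option =
    maybe {
        let! creditDate = deduction.CreditDate
        let! deductionDate = deduction.TdsDate
        let diffInMonths = getDifferenceInMonth creditDate deductionDate
        return calculateFine deduction.TaxDeducted diffInMonths
    }
```

If either date is `None`, the whole expression evaluates to `None` without any explicit matching in the business logic.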
  31. Big win with 'glue' code
    • The ‘glue’ logic in controllers (written in a mostly imperative style) ended up being more and more complex
    • Needed to handle different use cases, UI states
    • Things became difficult to reason about
    • We started to use computation expressions to simplify it (where suitable)
  32. Type annotations are a win
    • Initially, we omitted type annotations in most places
    • “Compiler is smart enough to infer types, why do I need to mention them?”
    • Code was difficult to read, and error messages were a little cryptic at times
    • Type inference would pick up the wrong type for the function if you made a mistake
    • We started adding annotations to functions
    • Also started defining type aliases for common combinations

    let d : ColumnSpecification<Deduction> = …

    is more readable than

    let d : ((Quotations.Expr * IsColumnVisible * IsColumnEditable<'T>) list)
  33. Union types
    • If doing a do-over, would especially focus on algebraic types / union types
    • Data model would be richer if we had made proper use of them
    • Mostly ended up using union types at leaf level (individual fields), did not use at higher levels (entity level, composition of entities)
  34. On-boarding engineers
    • We thought F# would be hard for people to learn
    • Current team: One person with prior Haskell experience, one JavaScript programmer and one fresher
    • Everyone was able to ramp-up and start using F# very quickly
    • In-depth understanding takes time though
    • Basics are really easy to pick up
  35. Design matters
    • Language will not solve problems for you
    • No substitute for good design
    • The areas where we took shortcuts came back to haunt us
    • Functional languages like F# do help, if you are willing to listen
    • Not a silver bullet
  36. F# on the CLR
    • When we started, there were fewer F# specific libraries
    • Ability to use any C# library, but they often didn't feel natural
    • We made compromises here. Used some sub-optimal tools
    • Situation is better now. F# ecosystem is vibrant and growing!
  37. F# at ClearTax right now
    • TDS Product fully built & running on F#
    • Some features of ClearTax built in F#
    • Majority of ClearTax is still C#
    • Mostly because of tooling issues: not possible to mix and match languages within a single 'project'
    • Browser based testing – Canopy, an F# DSL over Selenium
  38. Q&A

  39. Resources for F#
    • Online
      • F# For Fun And Profit
      • F# Programming WikiBook
    • Books
      • F# Deep Dives
      • F# Applied