Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Adventures in Data Sitting

Adventures in Data Sitting

This was a talk from WNYRB I gave in 2012. It was basically a few anecdotes about connecting to Legacy Databases from a rails application which happens to be a common theme for me lately.

Bill Chapman

January 10, 2012
Tweet

More Decks by Bill Chapman

Other Decks in Programming

Transcript

  1. Adventures in Data-sitting “ Why do they have a date

    field set to varchar(5000)? ” - dpk Saturday, April 6, 13
  2. In the rails world specifically, for years the answer was

    “Don’t Do It” Saturday, April 6, 13 Connecting to Microsoft SQL and other non open source rdbms. I found this statement to be very jarring until I actually had to connect to something other than a friendly db and schema. This highlights the ruby centric view that seems prevalent in the community. Most database engines can do a lot of cool stuff and way too many developers seem to avoid them in favor of abstracting the direct interaction away. While I do not disagree with this approach I think it tends to skew solutions away from the database itself for the sake of data agnosticity or developer comfort. There are a lot of database features that can make a database more rails friendly and we’ll talk about a few.
  3. leg·a·cy   [leg-uh-see] Show IPA noun, plural -cies. The <insert thing

    here> that existed at least 30 seconds before you decided to start your project Inconsistent Schema Poor naming conventions Non rails friendly decisions Data Source without a friendly ruby wrapper Not Invented Here Invented here before I got here ... Saturday, April 6, 13
  4. Useful Foundations • Active Record • SQL(triggers, views, rules, scripting)

    • Data Patterns (ETL, Service API, SQL Views, Warehouse) • Glue Tools (Jruby, MacRuby, Ruby Extensions...) Saturday, April 6, 13 Consider the scope of your application to be much larger than the stack you’ve chosen around the application itself. You don’t need to use jruby for the application itself to use a java based library to implement your data transfer pattern.
  5. Connectors • Tiny TDS + activerecord-sqlserver-adapter • jruby + Jackcess

    • Jruby + jruby-activerecord-sqlserver- adapter Saturday, April 6, 13
  6. Quick Fixes in the DB • Views • Instead Do

    Rules • Triggers • Materialized Views (if available) Saturday, April 6, 13 The point here is that you aren’t stuck with what you have. You may not be able to change the current schema but can you add to it and augment the schema? Can you get access to a read only mirror? Can you add the tables you need, in a rails friendly way in to the same database with the badly formed data?
  7. Rails Friendly Views Saturday, April 6, 13 You can do

    this in MSSQL and PG. Possibly others. But aren’t views very slow? You can index a view in MS and postgres. You can also use tools like thinking_sphinx on the models if you are using AR.
  8. Case Study: Copy Pasta • Connect to multiple MSSQL databases

    (25+) • Databases are legacy in age and schema • Maintain a significant amount of ‘local’ information Saturday, April 6, 13 25 + Different MSSQL Databases that grew via copy pasta development. Each schema was similar enough to be recognizeable as almost the same app but inconsistent enough so to be impossible to generalize in a single set of models/mappings.
  9. Copy Pasta Gotchas • All db’s had a different schema

    but represent similar data • No control over source data • Each DB has millions of records • All data will be (ET)L’d into the same place • The Legacy Schemas are very poor Saturday, April 6, 13
  10. Copy Pasta Solution • Multiple AR::Base children • Tiny TDS

    for SQL Server Wrapper • ETL Pattern • Data Warehouse (Star Schema) • Local DB for app specific data Saturday, April 6, 13
  11. Case Study: Xerox ++ • Replacing in-use legacy app •

    Must implement all current features before launch • Database is not rails friendly I want the new app to work exactly like the old app but with feature a,b...z we can’t switch over until you are done and the switch better be seamless Saturday, April 6, 13 The first instinct here is to be happy that you get to start from scratch your schema can be clean and free of crud. The reality is that once you finish the application you still need to write a single use ETL to transfer the data from the old database to the new database. The client,boss,end user is going to expect every feature they used to have to be available in the new application and you are looking at a potentially very painful switchover process.
  12. Xerox ++ Solution • Replacing in-use legacy app • Must

    implement all current features before launch • Database is not rails friendly Saturday, April 6, 13
  13. Avoid The “Switchover” • Build a service on top of

    the things we have discussed • Start with features they don’t have to make something useful right away • Try to avoid a forced waterfall approach Saturday, April 6, 13 By avoiding the switchover you can stick with a more iterative development process
  14. Case Study: The Boss’s Nephew • Replacing Original Application written

    in a ‘Toy’ environment (see Access) • No good wrapper exists for the data source (Alpha 5, Access, Goldmine) • Xerox ++ is also in effect Saturday, April 6, 13
  15. Boss’s Nephew Solution • Migrate the data to a more

    friendly platform if possible. • (OR) Find a way to get at the data and use an ETL Pattern Saturday, April 6, 13
  16. Case Study: Log file DB • Aggregate results from multiple

    log files with a set format • Display results as close to real time as possible • Maintain results in AR friendly schema to be consumed by a rails app • Process potentially millions of log lines per Saturday, April 6, 13 In this case we don’t even have a nice data source, but just a bunch of text files in several data centers around the world. Fortunately they were in a pretty static format.
  17. LOG File DB Solution • ETL Pattern Strikes again •

    Light Weight Log reader service for each log • Service to post the logs to • Triggers to split the log data as needed Saturday, April 6, 13 In this case Wayne wrote the log reader services so of course it was written in bash but we can pretend it was a ruby for the sake of this talk.