Adventures in Data Sitting

Adventures in Data-sitting “ Why do they have a date
ﬁeld set to varchar(5000)? ” - dpk Saturday, April 6, 13

In the rails world speciﬁcally, for years the answer was
“Don’t Do It” Saturday, April 6, 13 Connecting to Microsoft SQL and other non open source rdbms. I found this statement to be very jarring until I actually had to connect to something other than a friendly db and schema. This highlights the ruby centric view that seems prevalent in the community. Most database engines can do a lot of cool stuff and way too many developers seem to avoid them in favor of abstracting the direct interaction away. While I do not disagree with this approach I think it tends to skew solutions away from the database itself for the sake of data agnosticity or developer comfort. There are a lot of database features that can make a database more rails friendly and we’ll talk about a few.

leg·a·cy [leg-uh-see] Show IPA noun, plural -cies. The <insert thing
here> that existed at least 30 seconds before you decided to start your project Inconsistent Schema Poor naming conventions Non rails friendly decisions Data Source without a friendly ruby wrapper Not Invented Here Invented here before I got here ... Saturday, April 6, 13

Useful Foundations • Active Record • SQL(triggers, views, rules, scripting)
• Data Patterns (ETL, Service API, SQL Views, Warehouse) • Glue Tools (Jruby, MacRuby, Ruby Extensions...) Saturday, April 6, 13 Consider the scope of your application to be much larger than the stack you’ve chosen around the application itself. You don’t need to use jruby for the application itself to use a java based library to implement your data transfer pattern.

Connectors • Tiny TDS + activerecord-sqlserver-adapter • jruby + Jackcess
• Jruby + jruby-activerecord-sqlserver- adapter Saturday, April 6, 13

Quick Fixes In AR • set_table_name “BadTAble_name” • set_primary_key “TerribleKey_nAme”
• NastyConnector < ActiveRecord::Base Saturday, April 6, 13

Quick Fixes in the DB • Views • Instead Do
Rules • Triggers • Materialized Views (if available) Saturday, April 6, 13 The point here is that you aren’t stuck with what you have. You may not be able to change the current schema but can you add to it and augment the schema? Can you get access to a read only mirror? Can you add the tables you need, in a rails friendly way in to the same database with the badly formed data?

Rails Friendly Views Saturday, April 6, 13 You can do
this in MSSQL and PG. Possibly others. But aren’t views very slow? You can index a view in MS and postgres. You can also use tools like thinking_sphinx on the models if you are using AR.

Case Study: Copy Pasta • Connect to multiple MSSQL databases
(25+) • Databases are legacy in age and schema • Maintain a signiﬁcant amount of ‘local’ information Saturday, April 6, 13 25 + Different MSSQL Databases that grew via copy pasta development. Each schema was similar enough to be recognizeable as almost the same app but inconsistent enough so to be impossible to generalize in a single set of models/mappings.

Copy Pasta Gotchas • All db’s had a different schema
but represent similar data • No control over source data • Each DB has millions of records • All data will be (ET)L’d into the same place • The Legacy Schemas are very poor Saturday, April 6, 13

Copy Pasta Solution • Multiple AR::Base children • Tiny TDS
for SQL Server Wrapper • ETL Pattern • Data Warehouse (Star Schema) • Local DB for app speciﬁc data Saturday, April 6, 13

Case Study: Xerox ++ • Replacing in-use legacy app •
Must implement all current features before launch • Database is not rails friendly I want the new app to work exactly like the old app but with feature a,b...z we can’t switch over until you are done and the switch better be seamless Saturday, April 6, 13 The ﬁrst instinct here is to be happy that you get to start from scratch your schema can be clean and free of crud. The reality is that once you ﬁnish the application you still need to write a single use ETL to transfer the data from the old database to the new database. The client,boss,end user is going to expect every feature they used to have to be available in the new application and you are looking at a potentially very painful switchover process.

Xerox ++ Solution • Replacing in-use legacy app • Must
implement all current features before launch • Database is not rails friendly Saturday, April 6, 13

Avoid The “Switchover” • Build a service on top of
the things we have discussed • Start with features they don’t have to make something useful right away • Try to avoid a forced waterfall approach Saturday, April 6, 13 By avoiding the switchover you can stick with a more iterative development process

Case Study: The Boss’s Nephew • Replacing Original Application written
in a ‘Toy’ environment (see Access) • No good wrapper exists for the data source (Alpha 5, Access, Goldmine) • Xerox ++ is also in effect Saturday, April 6, 13

Boss’s Nephew Solution • Migrate the data to a more
friendly platform if possible. • (OR) Find a way to get at the data and use an ETL Pattern Saturday, April 6, 13

Case Study: Log file DB • Aggregate results from multiple
log files with a set format • Display results as close to real time as possible • Maintain results in AR friendly schema to be consumed by a rails app • Process potentially millions of log lines per Saturday, April 6, 13 In this case we don’t even have a nice data source, but just a bunch of text files in several data centers around the world. Fortunately they were in a pretty static format.

LOG File DB Solution • ETL Pattern Strikes again •
Light Weight Log reader service for each log • Service to post the logs to • Triggers to split the log data as needed Saturday, April 6, 13 In this case Wayne wrote the log reader services so of course it was written in bash but we can pretend it was a ruby for the sake of this talk.

Adventures in Data Sitting

Adventures in Data Sitting

Bill Chapman

More Decks by Bill Chapman

Other Decks in Programming

Featured

Transcript

Adventures in Data-sitting “ Why do they have a date

In the rails world speciﬁcally, for years the answer was

leg·a·cy [leg-uh-see] Show IPA noun, plural -cies. The <insert thing

Useful Foundations • Active Record • SQL(triggers, views, rules, scripting)

Connectors • Tiny TDS + activerecord-sqlserver-adapter • jruby + Jackcess

Quick Fixes In AR • set_table_name “BadTAble_name” • set_primary_key “TerribleKey_nAme”

Quick Fixes in the DB • Views • Instead Do

Rails Friendly Views Saturday, April 6, 13 You can do

Case Study: Copy Pasta • Connect to multiple MSSQL databases

Copy Pasta Gotchas • All db’s had a different schema

Copy Pasta Solution • Multiple AR::Base children • Tiny TDS

Case Study: Xerox ++ • Replacing in-use legacy app •

Xerox ++ Solution • Replacing in-use legacy app • Must

Avoid The “Switchover” • Build a service on top of

Case Study: The Boss’s Nephew • Replacing Original Application written

Boss’s Nephew Solution • Migrate the data to a more

Case Study: Log ﬁle DB • Aggregate results from multiple

LOG File DB Solution • ETL Pattern Strikes again •