
Cachirulo valley Bigtable and friends

Quick overview of NoSQL and some BigTable flavours.


Juan Luis Belmonte

September 13, 2012


Transcript

  1. NoSQL: a DB engine that doesn't stick to the usual RDBMS parameters.

     Main differences with RDBMSs:
     - It might not give ACID guarantees
     - SQL is not available, not fully supported, or not the focus
     - No schema
     - Distributed
     - Designed to scale in some way
     - Designed to suit 21st-century needs

     9/13/12  BigTable & Friends
  2. SQL
     - Human language to query data.
     - Abuse of joins has a critical performance impact.
     - When one server is not enough.
  3. SQL teaches you to think about the queries. Which is good!
  4. RDBMS issues when you reach the limit
     - When they were designed, petabytes were science fiction.
     - Expensive: money, learning curve.
     - Sharding and replication: master-slave, single points of failure, each solution is homebrewed.
     - Some of the RDBMS magic is difficult to achieve when you break the one-server barrier.
  5. RDBMSs are ROW oriented
     Table: collection of rows written in the same file.
     Row: whole chunk of info (key + columns).

     Key   Name          Age
     foo   john doe      32
     bar   Chuck Norris  72
     baz   john doe      28
  6. Bigtable overview
     - Column oriented.
     - Ordered key-value map.
     - Column families: each row has a fixed set of column families but a non-fixed number of columns.
     - Write optimization: sequential writes.
     - Merge reads.
     - Index -> SSTables (Sorted String Tables).
     - Simple client API. As the Bigtable paper says: "Client applications can write or delete values in Bigtable, look up values from individual rows, or iterate over a subset of the data in a table."
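The "sequential writes / merge reads" idea from the slide can be sketched in a few lines: writes always go to the newest in-memory store, flushes freeze that store as an immutable "SSTable", and a read consults the stores from newest to oldest, returning the first hit. This is a toy model, not HBase or Cassandra API; all names are illustrative.

```java
import java.util.*;

// Toy sketch of LSM-style "merge reads": stores.get(0) is the memtable,
// later entries are older flushed "SSTables".
class MergeRead {
    private final List<NavigableMap<String, String>> stores = new ArrayList<>();

    MergeRead() { stores.add(new TreeMap<>()); }

    // Writes only ever touch the newest store (sequential write path).
    void put(String key, String value) { stores.get(0).put(key, value); }

    // Flush: the current memtable becomes the newest immutable SSTable,
    // and a fresh memtable takes its place at index 0.
    void flush() { stores.add(0, new TreeMap<>()); }

    // Merge read: scan stores newest-first; the first value found wins.
    String get(String key) {
        for (NavigableMap<String, String> s : stores) {
            String v = s.get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

A real implementation would also compact old SSTables together and use tombstones for deletes; this sketch only shows why a single read may need to consult several files.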
  7. BigTable schema
     Sorted map indexed by row key, column key and timestamp.
     - Row: arbitrary string that indexes data in tablets.
     - Column family: usually stored in the same file; contains several column keys.
     - Columns: indexed by column name.
     - Timestamps: each column value has a timestamp.

     rowK: foo -> name: Chuck @33, surname: Norris @33
     rowK: bar -> name: Eva @123, surname: foo @123, surname: bar @235, address: ….. @999

     Example: use data as column identifiers, e.g. dates. We can use column names as useful pieces of info instead of just a coordinate.
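The "sorted map indexed by row key, column key and timestamp" model can be sketched with a plain TreeMap whose composite key keeps all cells of a row adjacent and the newest version of each cell first. This is a didactic sketch, not the Bigtable API; the key encoding (string concatenation, unpadded numbers) is only safe for this small example.

```java
import java.util.*;

// Sketch: model a Bigtable-style table as one sorted map keyed by
// (row, column, inverted timestamp) -> value.
class SortedCellMap {
    private final TreeMap<String, String> cells = new TreeMap<>();

    void put(String row, String column, long ts, String value) {
        // Inverting the timestamp makes the newest version sort first
        // within a (row, column) prefix.
        cells.put(row + "/" + column + "/" + (Long.MAX_VALUE - ts), value);
    }

    // Latest value for (row, column): the first entry at or after the prefix.
    String get(String row, String column) {
        String prefix = row + "/" + column + "/";
        Map.Entry<String, String> e = cells.ceilingEntry(prefix);
        if (e == null || !e.getKey().startsWith(prefix)) return null;
        return e.getValue();
    }
}
```

Because the map is sorted, a range scan over one row (or over consecutive row keys) is just an in-order walk over adjacent entries, which is what makes tablet/SSTable layouts efficient for scans.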
  8. Design thinking on queries
     - Are we going to need just a row key?
     - Do we always know the key?
     - Do we want to query by some field value?
     - Do we always know the schema, or could it be variable?
  9. HBase
     - Based on BigTable.
     - Relies on HDFS, so the DB engine doesn't care about replication.
     - Region server farms and fileserver latency.
     - More than one process: Hadoop, HDFS, ZooKeeper, HBase.
  10. Cassandra
     - Based on Bigtable and Amazon's Dynamo.
     - Partitioning by consistent hashing: 1 key maps to a server.
     - Fully tunable replication factor and consistency level.
     - Hand-off servers (hinted handoff).
     - Ring, gossip protocol, server symmetry and p2p.
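The "1 key maps to a server" partitioning can be sketched as a consistent-hashing ring: each node owns a token, and a key belongs to the first node whose token is at or after the key's hash, wrapping around at the end. Illustrative only; real Cassandra uses MD5/Murmur3 tokens and (later) virtual nodes, not `hashCode`.

```java
import java.util.*;

// Toy consistent-hashing ring: node tokens live in a sorted map, and
// ownerOf walks clockwise from the key's hash to the next token.
class Ring {
    private final TreeMap<Integer, String> tokens = new TreeMap<>();

    void addNode(String name) {
        // Mask to keep the token non-negative.
        tokens.put(name.hashCode() & Integer.MAX_VALUE, name);
    }

    String ownerOf(String key) {
        int h = key.hashCode() & Integer.MAX_VALUE;
        Map.Entry<Integer, String> e = tokens.ceilingEntry(h);
        // Past the last token: wrap around to the first node on the ring.
        return (e != null ? e : tokens.firstEntry()).getValue();
    }
}
```

The useful property: adding or removing one node only remaps the keys on that node's arc, instead of reshuffling every key the way `hash(key) % nodeCount` would.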
  11. HBase
     - Column families
     - byte[] for key
     - byte[] for fields and column names
     - No fixed schema

     Cassandra
     - Column families
     - byte[] for key
     - byte[] for fields and column names
     - SuperColumn families: "groups of column families"
     - Can define schema for column families
  12. Put / Delete

     public void put() throws IOException {
         ProfileDenormalizer.denormalizeProfile(solrProfile);
         HTableInterface table = HBaseManager.getManager().getTable(TABLE);
         Put putCommand = new Put(getRowQualifier());
         putCommand.add(COLUMNF, COLUM_QUALIFIER, HBaseSerializers.objectToBytes(solrProfile));
         try {
             table.put(putCommand);
         } catch (IOException e) {
             table.flushCommits();
         } finally {
             table.close();
         }
     }

     public void delete() throws IOException {
         HTableInterface table = HBaseManager.getManager().getTable(TABLE);
         table.delete(new Delete().deleteColumn(COLUMNF, getRowQualifier()));
         if (!table.isAutoFlush())
             table.flushCommits();
         table.close();
     }

     * HBaseSerializers are just utils to serialize an Object to a byte[].
  13. Get / Scan

     public HBaseProfile get(byte[] rowQualifier) throws IOException {
         HTableInterface table = null;
         try {
             table = HBaseManager.getManager().getTable(TABLE);
             Get getCommand = new Get(rowQualifier);
             getCommand.addFamily(COLUMNF);
             Result resultSet = table.get(getCommand);
             if (resultSet.isEmpty())
                 return null;
             List<KeyValue> keyValues = resultSet.getColumn(COLUMNF, COLUM_QUALIFIER);
             this.solrProfile = (SolrProfile) …

     HBaseManager manager = HBaseManager.getManager();
     table = manager.getTable(tableName);
     Scan scanCommand = new Scan();
     scanCommand.setBatch(max);
     scanCommand.setCaching(max);
     scanCommand.setMaxVersions(1);
     table.getScanner(scanCommand);

     ResultScanner implements Iterable<Result>.
     Result encapsulates a collection of KeyValues.
  14. Secondary indexing
     We need to create new indexes (tables) that point to the rows we want to fetch.
     Denormalization.
     The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies.
     You (usually) have to take care of the inconsistencies yourself. The good point is that we hate relations!
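The denormalization pattern above can be sketched with two plain maps: a primary "table" keyed by row key, and a hand-maintained "index table" keyed by a field value, which the application must keep consistent on every write. Table names and fields here are made up for the example; in HBase these would be two real tables with the same update discipline.

```java
import java.util.*;

// Sketch of a manually maintained secondary index: the store keeps
// key -> name, and byName keeps name -> set of keys. Keeping the two
// in sync is the application's job, not the database's.
class ManualIndex {
    private final Map<String, String> users = new HashMap<>();
    private final Map<String, Set<String>> byName = new HashMap<>();

    void put(String key, String name) {
        String old = users.put(key, name);
        // On update, the stale index entry must be removed by hand --
        // forgetting this step is exactly the inconsistency the slide warns about.
        if (old != null) byName.get(old).remove(key);
        byName.computeIfAbsent(name, n -> new TreeSet<>()).add(key);
    }

    Set<String> findByName(String name) {
        return byName.getOrDefault(name, Collections.emptySet());
    }
}
```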
  15. Design thinking on queries
     "I have 700MM rows and 30 TB in my DB and I need to build a new secondary index…"
  16. HBase secondary indexing
     Just using contrib packages.
     After several hours looking for an easy secondary indexing solution… DIY.
  17. Cassandra
     - Column families and super column families.
     - Built-in secondary indexing since 0.7.
     - Non-atomic counters since 0.7.
     - Deletes aren't so obvious, due to replication and compaction.
     - Row resurrection.
     - Eventually persistent (writes go to memory first).
  18. Playing with schemas

     create column family commits
       with comparator = UTF8Type
       and key_validation_class = TimeUUIDType
       and column_metadata = [
         {column_name: forge, validation_class: UTF8Type},
         {column_name: revision, validation_class: UTF8Type, index_type: KEYS},
         {column_name: repoRevision, validation_class: UTF8Type, index_type: KEYS},
         {column_name: message, validation_class: UTF8Type},
         {column_name: addedFiles, validation_class: UTF8Type},
         {column_name: deletedFiles, validation_class: UTF8Type},
         {column_name: modifiedFiles, validation_class: UTF8Type},
         {column_name: replacedFiles, validation_class: UTF8Type},
         {column_name: renamedFiles, validation_class: UTF8Type},
         {column_name: repository, validation_class: LongType, index_type: KEYS},
         {column_name: user, validation_class: LongType},
         {column_name: date, validation_class: LongType}];

     create column family project_contributions_stats_scf with column_type = 'Super';

     create column family randoomStuff;
  19. Mutators

     counter:

     Mutator counter = HFactory.createMutator(keyspace, getStringSerializer());
     counter.insertCounter("counter", cassandra.CassandraConstants.CF_COMMIT_COUNTER,
         HFactory.createCounterColumn("commits", increment));
     counter.execute();

     put:

     Mutator mutator = HFactory.createMutator(keyspace, UUIDSerializer.get());
     mutator.addInsertion(key, CF, HFactory.createStringColumn("foo", foo))
            .addInsertion(key, CF, HFactory.createStringColumn("bar", bar))
            .addInsertion(key, CF, HFactory.createStringColumn("message", message));
     mutator.execute();

     delete:

     Mutator mutator = HFactory.createMutator(keyspace, UUIDSerializer.get());
     mutator.delete(key, CF, null, CassandraManager.getUUIDSerializer());
  20. Query / Get

     final String[] COLUMNS = {"foo", "bar", "baz", "murrico"}; // foo is indexed

     query:

     // We use ByteBuffer because each column is a different type.
     IndexedSlicesQuery<UUID, String, ByteBuffer> query =
         HFactory.createIndexedSlicesQuery(CassandraManager.getKeyspace(),
             CassandraManager.getUUIDSerializer(),
             StringSerializer.get(),
             ByteBufferSerializer.get());
     query.setColumnNames(Constants.COMMIT_COLUMNS);
     query.addEqualsExpression("foo", bs.fromByteBuffer(
         CassandraManager.getLongSerializer().toByteBuffer(id)));
     query.setColumnFamily(CF);

     get:

     get(UUID uuid) {
         SliceQuery<UUID, String, ByteBuffer> result =
             HFactory.createSliceQuery(CassandraManager.getKeyspace(),
                 UUIDSerializer.get(),        // key
                 StringSerializer.get(),      // column qualifier
                 ByteBufferSerializer.get()); // data
         result.setColumnFamily("commits");
         result.setKey(TimeUUIDUtils.toUUID(TimeUUIDUtils.asByteArray(uuid)));
         result.setColumnNames(Constants.COMMIT_COLUMNS);
         QueryResult<ColumnSlice<String, ByteBuffer>> columnSlice = result.execute();
  21. Cassandra
     - Better for write-heavy applications
     - Ease of replication (the user just doesn't touch anything)
     - Easy to set up
     - Hundreds of columns
     - Random partitioner instead of Bigtable's ordered tree
     - More complex schemas thanks to supercolumns
     - You can control everything

     HBase
     - Read-heavy applications
     - Row locking
     - MapReduce: the data doesn't travel
     - Better performance on range scans
     - NameNode is a single point of failure
     - Less verbose client