No SQL

Andy Cheng

August 08, 2013

Transcript

  1. Trend 1: Growth in data volume
     • Twitter at 400M Tweets per day in Jun '12
     • Facebook at 1B users in Oct '12
     • 100 million users in 4.5 years (1,665 days)
  2. Trend 3: Demand for analyzing "semi-structured data"
     • Semi-structured data
       o Data that does not reside in fixed fields but uses tags or other markers to capture elements of the data
     • Examples: logs, GPS coordinates, XML, JSON, web page content (HTTP-tagged text)
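As a minimal sketch (Python standard library only; the log lines and field names are hypothetical), semi-structured records such as JSON logs can be read without a fixed schema: each record carries its own tags, and a field that is missing is simply absent rather than a NULL in a predefined column.

```python
import json

# Hypothetical JSON log lines: each record carries its own set of keys (tags).
LOG_LINES = [
    '{"level": "INFO", "msg": "user login", "user": "u1"}',
    '{"level": "WARN", "msg": "slow query", "ms": 850}',
    '{"level": "INFO", "msg": "gps fix", "lat": 25.03, "lon": 121.56}',
]


def parse_logs(lines):
    """Parse each line into a dict; the set of fields varies per record."""
    return [json.loads(line) for line in lines]


def field_names(records):
    """Collect the union of keys seen across all records (the implied schema)."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


if __name__ == "__main__":
    records = parse_logs(LOG_LINES)
    print(field_names(records))
    # Records without an "ms" field are skipped instead of raising an error.
    print([r for r in records if r.get("ms", 0) > 500])
```

The "schema" here is only discovered after reading the data, which is the property that makes such records awkward to load into fixed relational columns.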
  3. Limitations of SQL
     • Table locks (transactions)
     • Join, join, join (normalization)
     • Schema changes (data type/length)
     • Operational maintenance (backups…)
     • Disaster recovery (cluster/log shipping)
     • Scale-out
  4. No SQL
     • Not a relational database
     • No schema (schema-free)
     • No joins
     • No normalization
     • A different way of storing data
     • Distributed architecture
     • Open source
     • Eventual consistency
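Eventual consistency can be illustrated with a toy model. The sketch below is an illustration under stated assumptions, not how any particular NoSQL product works: two hypothetical replicas accept writes independently, each value carries a timestamp, and a last-write-wins merge (one common convergence rule, not the only one) brings them back into agreement once they exchange state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica that stores key -> (timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Accept the write locally, without coordinating with other replicas.
        # Versions are compared as (timestamp, value) tuples, so a timestamp
        # tie is broken deterministically by the value itself.
        current = self.data.get(key)
        if current is None or (ts, value) > current:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Last-write-wins: adopt any version newer than the local one.
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)


def anti_entropy(a, b):
    """Exchange state in both directions so the two replicas converge."""
    a.merge_from(b)
    b.merge_from(a)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.write("discount", "N", ts=1)
    b.write("discount", "Y", ts=2)
    print(a.read("discount"), b.read("discount"))  # N Y (replicas disagree)
    anti_entropy(a, b)
    print(a.read("discount"), b.read("discount"))  # Y Y (converged)
```

Between the writes and the merge, readers can see different answers depending on which replica they hit; "eventual" only promises agreement once updates stop and state has been exchanged.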
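  5. The CAP theorem
     • Consistency: all nodes see the same data at the same time
     • Availability: every request receives a response, whether it succeeds or fails
     • Partition tolerance: the system keeps operating even if arbitrary messages between nodes are lost
     In theory, a system cannot guarantee all three CAP properties at once. NoSQL databases are usually designed around two of them, typically CP or AP: because network partitions cannot be ruled out in a distributed system, the practical choice during a partition is between consistency and availability.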
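The CP versus AP trade-off shows up at read time during a partition. In this minimal sketch (the `Node` class, its `mode` flag and `PartitionedError` are hypothetical, not any database's API), a node that cannot reach its peers either refuses to answer (CP) or answers with the possibly stale value it has (AP).

```python
class PartitionedError(Exception):
    """Raised when a CP-style node refuses to answer during a partition."""


class Node:
    """A toy replica that may be cut off from its peers (hypothetical model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None         # last value this node has seen
        self.partitioned = False  # True while it cannot reach the other replicas

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: give up availability rather than risk returning stale data.
            raise PartitionedError("cannot confirm the latest value")
        # AP (or no partition): always answer, possibly with a stale value.
        return self.value


if __name__ == "__main__":
    cp, ap = Node("CP"), Node("AP")
    for node in (cp, ap):
        node.value = "v1"
        node.partitioned = True  # imagine "v2" was written on the far side
    print(ap.read())  # v1: available, but possibly stale
    try:
        cp.read()
    except PartitionedError as err:
        print("CP node unavailable:", err)
```

When there is no partition both nodes behave the same; the distinction only matters while messages between nodes are being lost.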
  6. Key-Value (Table: Customer)
     • Key _id: 1001 → Value { "_id": 1001, "customer_id": 98234 }
     • Key _id: 1002 → Value { "_id": 1002, "customer_id": 98311, "discount": "Y" }
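A minimal in-memory sketch of the key-value model, using a Python dict and the two Customer records above. The `put`/`get` names are hypothetical rather than a particular product's API. The store only indexes the key; the value is an opaque serialized blob, so records need not share a structure (only 1002 has `discount`).

```python
import json

# Key -> serialized value; the store never looks inside the value.
store = {}


def put(key, value):
    store[key] = json.dumps(value)


def get(key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


put(1001, {"_id": 1001, "customer_id": 98234})
put(1002, {"_id": 1002, "customer_id": 98311, "discount": "Y"})

if __name__ == "__main__":
    print(get(1002)["discount"])      # Y
    print(get(1001).get("discount"))  # None: field absent, no schema change needed
```

Adding `discount` to one record required no ALTER TABLE; the cost is that finding "all customers with a discount" means scanning and decoding every value.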
  7. Document (Table: Customer)
     • Key _id: 1001 → Document { "_id": 1001, "customer_id": 7231, "line_items": [ {"product_id": 4555, "quantity": 8}, {"product_id": 7655, "quantity": 4}, {"product_id": 8755, "quantity": 3} ] }
     • Key _id: 1002 → Document { "_id": 1002, "customer_id": 98311, "line_items": [ {"product_id": 4555, "quantity": 3}, {"product_id": 2155, "quantity": 4} ], "discount": "Y" }
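Unlike a plain key-value store, a document store can query fields inside the stored value. The sketch below uses Python lists and dicts to stand in for the two documents above; the helper functions are hypothetical and are not MongoDB's query API. Because the line items are embedded in each order, answering these questions needs no join against a separate line-item table.

```python
# The two documents from the slide; line items are embedded in each order.
orders = [
    {"_id": 1001, "customer_id": 7231,
     "line_items": [{"product_id": 4555, "quantity": 8},
                    {"product_id": 7655, "quantity": 4},
                    {"product_id": 8755, "quantity": 3}]},
    {"_id": 1002, "customer_id": 98311,
     "line_items": [{"product_id": 4555, "quantity": 3},
                    {"product_id": 2155, "quantity": 4}],
     "discount": "Y"},
]


def orders_containing(product_id):
    """Return the _id of every order that has a line item for product_id."""
    return [o["_id"] for o in orders
            if any(item["product_id"] == product_id for item in o["line_items"])]


def total_quantity(order):
    """Sum the embedded line-item quantities of a single order."""
    return sum(item["quantity"] for item in order["line_items"])


if __name__ == "__main__":
    print(orders_containing(4555))                # [1001, 1002]
    print([total_quantity(o) for o in orders])    # [15, 7]
```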
  8. Graph
     [Diagram: an org chart in which people (Connie Ma, Max Cheng, Mahendra Negi, Eva Chen, Jo Ma, Claudia Wu, Terry Huang, Alex Kuo) are connected by "Report To" edges and by "Function" edges to IS and HR nodes]
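A graph model stores relationships as first-class edges and answers questions by traversing them. The exact edges in the diagram are not recoverable, so this sketch uses a hypothetical org chart with placeholder names, stored as (source, relationship, target) triples. It assumes each person has at most one manager and that reporting lines contain no cycles.

```python
# Hypothetical org chart (placeholder names, not the people on the slide).
EDGES = [
    ("dana", "REPORTS_TO", "carol"),
    ("erin", "REPORTS_TO", "carol"),
    ("carol", "REPORTS_TO", "alice"),
    ("bob", "REPORTS_TO", "alice"),
    ("carol", "FUNCTION", "IS"),
    ("dana", "FUNCTION", "IS"),
    ("erin", "FUNCTION", "IS"),
    ("bob", "FUNCTION", "HR"),
]


def neighbors(node, rel):
    """Targets of all outgoing edges of type rel from node."""
    return [dst for src, r, dst in EDGES if src == node and r == rel]


def management_chain(person):
    """Follow REPORTS_TO edges upward until reaching someone with no manager."""
    chain = []
    current = person
    while True:
        bosses = neighbors(current, "REPORTS_TO")
        if not bosses:
            return chain
        current = bosses[0]  # assumes at most one manager per person
        chain.append(current)


def members_of(function):
    """Everyone with a FUNCTION edge pointing at the given function node."""
    return sorted(src for src, r, dst in EDGES if r == "FUNCTION" and dst == function)


if __name__ == "__main__":
    print(management_chain("dana"))  # ['carol', 'alice']
    print(members_of("IS"))          # ['carol', 'dana', 'erin']
```

In a relational schema the reporting chain would need one self-join per level; here it is a loop over edges. A real graph database would index edges by node instead of scanning the whole list.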
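  10. MongoDB
      • JSON-like document store (BSON)
      • Embedded and referenced data models
      • Sharding: partitions data horizontally across different physical nodes
      • Replica set: keeps data synchronized across replicas
      • Mongo Query Language
      • HTTP / REST API
      • GridFS: file storage API
      • MapReduce: for complex aggregation and parallel computation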
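Sharding places each document on exactly one node chosen from its shard key. The sketch below is a generic hash-partitioning illustration with hypothetical shard names; MongoDB's own sharding (ranged or hashed shard keys, balanced across shards by the cluster) works differently and is not modeled here. It uses `hashlib.md5` rather than Python's built-in `hash()`, because string hashing in Python is randomized per process and would route the same key differently across runs.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(shard_key):
    """Map a shard-key value to one shard using a stable hash (modulo placement)."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def distribute(docs, key="_id"):
    """Group document keys by the shard each one is routed to."""
    placement = {name: [] for name in SHARDS}
    for doc in docs:
        placement[shard_for(doc[key])].append(doc[key])
    return placement


if __name__ == "__main__":
    docs = [{"_id": i} for i in range(1000, 1010)]
    for name, ids in distribute(docs).items():
        print(name, ids)
```

Simple modulo placement has a known drawback: changing the number of shards remaps most keys, which is why production systems track key ranges (or use consistent hashing) so that only some data moves when a node is added.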