
SEE: The Hidden Cloud Tax Breaker — Schema-Aware Compression Beyond Zstd

Schema-aware compression that keeps JSON searchable while compressed.
Compressed size ≈ 19.5% of raw, lookup p50 ≈ 0.18 ms, skip rate ≈ 99%.
SEE cuts I/O + CPU cost, saving $7B+/year with ROI ≈ 11,000%.
Learn more → https://github.com/kodomonocch1/see_proto

kodomonocch1

October 11, 2025

Transcript

  1. SEE: Schema-Aware Compression Beyond Zstd. The Hidden Cloud Tax Breaker.

    From Bytes to Balance Sheets. github.com/kodomonocch1/see_proto
  2. Problem: The Hidden Cloud Tax

    [Diagram: Cloud Storage, Data Flow, $] Egress = $0.05/GB. Every byte leaving the cloud is taxed: 1 EB/month = $50M/month.
  3. FinOps Pressure

    Cloud Cost = Egress (Data Transfer) + CPU (Decompression). Compression reduces storage, not compute.
  4. Why gzip/Zstd Are Not Enough

    Storage savings ≠ query performance. Query path: Decompress, Parse, Filter, Aggregate. High CPU and I/O overhead at every query.
  5. Breakthrough: Schema-Aware Compression

    SEE understands structure, not just bytes. Components: JSON Schema, Δ Encoding, Zstd, PageDir, Bloom Filter, Mini-Index. Structure × Δ × Zstd + Bloom = Searchable Compression. Partial Decode / Direct Access.
  6. Architecture Overview

    Traditional Compression: Monolithic Compressed Block. SEE Compression: Schema Layer, Δ Codec, Bloom Filter, PageDir, Zstd Dictionary. Selective access replaces full decode.
  7. Performance Metrics: Real-World Performance (GitHub Dataset)

    Metric            SEE     Zstd
    Combined ratio    0.194   0.137
    Lookup p50 (ms)   0.18    n/a
    Skip rate         0.99    0

    SEE trades 5–10% size for 90% fewer I/O operations.
  8. Economic Impact: From Technical Gain to Financial Impact

    Payback < 4 days • ROI ≈ 11,000% • 100 EB/month scale. SEE pays back faster than cloud billing cycles.
  9. Market Necessity

    SEE is Not Optional: It's Inevitable. FinOps Optimization, Cloud API Integration, Sustainability Shield. Adoption risk > Implementation cost.
  10. Demo and Reproducibility

    $ python samples/quick_demo.py
    Latency Performance: p50 0.18 ms, p95 0.42 ms, p99 0.89 ms. Scan for GitHub Repository Access. Reproducible on any environment in 10 minutes.
  11. From Bytes to Balance Sheets

    [Graphic: SEE, $] SEE transforms compression from a technical layer into an economic engine. Available under NDA; request access: github.com/kodomonocch1/see_proto (Request Access Form).
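
Slides 5 and 6 describe selective access: a page directory plus per-page Bloom filters let a lookup skip most pages instead of decoding the whole block. The sketch below illustrates that general idea only; it is not the SEE format or its API. All names (BloomFilter, Page, build_pages, lookup) are hypothetical, zlib from the Python standard library stands in for Zstd, and the Δ encoding and Zstd dictionary layers are left out.

```python
import hashlib
import json
import zlib
from dataclasses import dataclass


class BloomFilter:
    """Tiny Bloom filter: a few bit positions per item, derived from SHA-256."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


@dataclass
class Page:
    bloom: BloomFilter  # summary of the key values stored in this page
    payload: bytes      # compressed JSON lines (zlib here, an assumption)


def build_pages(records, key, page_size=100):
    """Split records into pages; compress each page and index its keys."""
    pages = []
    for start in range(0, len(records), page_size):
        chunk = records[start:start + page_size]
        bloom = BloomFilter()
        for rec in chunk:
            bloom.add(str(rec[key]))
        raw = "\n".join(json.dumps(r) for r in chunk).encode("utf-8")
        pages.append(Page(bloom, zlib.compress(raw)))
    return pages


def lookup(pages, key, value):
    """Return matching records and the number of pages actually decoded."""
    hits, decoded = [], 0
    for page in pages:
        if not page.bloom.might_contain(str(value)):
            continue  # page skipped without decompression
        decoded += 1
        text = zlib.decompress(page.payload).decode("utf-8")
        for line in text.splitlines():
            rec = json.loads(line)
            if rec[key] == value:
                hits.append(rec)
    return hits, decoded


# Synthetic data for illustration only, not the GitHub dataset from slide 7.
records = [{"id": i, "repo": f"repo-{i % 50}"} for i in range(10_000)]
pages = build_pages(records, key="id")
hits, decoded = lookup(pages, "id", 4242)
skip_rate = 1 - decoded / len(pages)
print(hits, f"decoded {decoded} of {len(pages)} pages, skip rate {skip_rate:.2f}")
```

A Bloom filter never yields false negatives, so the page holding the key is always decoded; false positives only cost an occasional extra page decode. The skip rate this toy prints depends on the filter size and the synthetic data, and says nothing about the figures reported in the deck.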