Breaking ALASKA

Presents the Binghamton team's contributions to the ALASKA steganalysis challenge.
This talk was given at IH&MMSec 2019.

Yassine Yousfi

July 04, 2019

Transcript

  1. Breaking ALASKA: Color Separation for Steganalysis in JPEG Domain

    Yassine Yousfi, Jan Butora, Jessica Fridrich, and Quentin Giboulot
  2. ALASKA challenges, the way we saw them

    Color JPEGs
    Variable payload
    Multiple stego schemes
    Variable image size
    JPEG QFs 60–100
    Ordering images instead of hard decisions
  3. Commitments and prior knowledge

    SRNet as a leading CNN architecture [Boroumand TIFS’18], adapted to color using a 3-channel first convolutional layer
    Training a “Tile Detector” and using it as a feature extractor to steganalyze arbitrary size images [Fuji Tsang EI’18]
    Multiclass detectors perform the best when dealing with diversified stego sources [Butora EI’19]
    Training detectors for each JPEG quality factor
  4. Datasets

    For each JPEG quality factor:
      256×256 “tiles”
      Arbitrary sized images
      Base payload and double payload
      TRN / VAL / TST: 42,000 / 3,500 / 3,500
    mixTST: 3,500 images, our “replica” of the final test set
  5. Winning architecture, QF ≤ 98

    [Figure: architecture diagram. Recoverable labels:
      Inputs after DCT⁻¹: YCrCb, Y, Cr, Cb, CrCb
      256x256 Tile Detector: Multiclass SRNet
      Arbitrary Size Detector: Global average (512), Global variance (512), Global minimum (512), Global maximum (512); Concatenate; Concatenated Features (512x4x5); Hidden Layer (2x512x4x5); Hidden Layer (2x512x4x5)]
  6. Color separation is the main trick

    Table: Color separation boost for QF95 ARBITRARYbase

      Architecture             MD5     PE
      YCrCb-SRNet              48.13   24.51
      Color separated SRNet    38.31   19.25

    Merging colors in 1st layer using YCrCb-SRNet appears sub-optimal
    Incorrect spread of payload among Y and Cr, Cb in the embedding script may have affected the boost
    Color separation provides an even bigger boost when using the correct payload spread [Taburet IWDW’19]
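As a rough illustration of what "color separation" could look like at the input level, the sketch below splits a decompressed YCrCb image into channel groups that are each handed to their own detector, instead of merging all three channels in one first layer. The five groups (YCrCb, Y, Cr, Cb, CrCb) are read off the labels of the architecture slide and are an assumption about the wiring; `separate_colors` and `CHANNEL_GROUPS` are hypothetical names, not taken from the talk.

```python
# Channel groups as read from the architecture slide labels (assumed wiring).
CHANNEL_GROUPS = (
    ("Y", "Cr", "Cb"),
    ("Y",),
    ("Cr",),
    ("Cb",),
    ("Cr", "Cb"),
)


def separate_colors(channels):
    """Build one input per channel group.

    `channels` maps "Y", "Cr", "Cb" to 2-D planes (lists of rows).
    Returns a dict from group name (e.g. "CrCb") to the list of planes
    that a detector dedicated to that group would receive.
    """
    return {
        "".join(group): [channels[name] for name in group]
        for group in CHANNEL_GROUPS
    }


# Toy 2x2 planes standing in for a decompressed image.
image = {
    "Y": [[16.0, 17.0], [18.0, 19.0]],
    "Cr": [[128.0, 128.5], [127.5, 128.0]],
    "Cb": [[120.0, 121.0], [122.0, 123.0]],
}
inputs = separate_colors(image)
print(list(inputs))
```

Each group would then be processed by a separately trained network; how the per-group outputs are combined is not covered by this sketch.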
  7. Multiclass vs binary results confirmed for JPEG domain

    [Butora EI’19] results on diversified stego source are extended to JPEG domain steganography
    Bigger batch size gives better performance when facing diversified sources (batch size 64)

    Table: YCrCb-SRNet trained as binary and multi-class for QF75 on TILEbase

                    MD5     PE
      Binary        11.41   8.10
      Multiclass    9.60    7.13
  8. Arbitrary sized images are steganalyzed using SRNets as feature extractors

    [Fuji Tsang EI’18] results using modified YeNet [Ye TIFS’17] are extended to SRNet for JPEG domain steganography
    4 “moments”: Mean, Variance, Minimum, and Maximum
    2 hidden layers MLP with size (2 x input, 2 x input): non-linear decisions
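The four "moments" can be sketched as a pooling step that turns feature maps of any spatial size into a fixed-length vector, which is what lets a tile-trained network feed a small MLP on arbitrary size images. This is a minimal pure-Python sketch: `global_moments` and `mlp_hidden_sizes` are hypothetical helper names, and the use of the population variance is an assumption, since the slide does not say which variance estimator was used.

```python
from statistics import fmean, pvariance


def global_moments(feature_maps):
    """Pool each feature map (a 2-D list of floats) into four scalars.

    Returns [means..., variances..., minima..., maxima...], so n maps
    always yield 4 * n values, whatever their height and width.
    """
    flat = [[v for row in fmap for v in row] for fmap in feature_maps]
    means = [fmean(f) for f in flat]
    variances = [pvariance(f) for f in flat]  # population variance (assumed)
    minima = [min(f) for f in flat]
    maxima = [max(f) for f in flat]
    return means + variances + minima + maxima


def mlp_hidden_sizes(n_inputs):
    """Two hidden layers, each twice the input width, as on the slide."""
    return (2 * n_inputs, 2 * n_inputs)


# One feature map from a small image and one from a larger image.
small = [[[1.0, 2.0], [3.0, 4.0]]]
large = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]

print(global_moments(small))       # same length as for `large`
print(len(global_moments(large)))
print(mlp_hidden_sizes(512 * 4))
```

With 512 feature maps per network, the pooled vector has 512 x 4 entries per network, which matches the "(512)" labels attached to each moment in the architecture slides.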
  9. Winning architecture, QF 99: The reverse JPEG compatibility attack

    [Figure: architecture diagram. Recoverable labels:
      Input: DCT⁻¹(Y) − [DCT⁻¹(Y)]
      256x256 Tile Detector: Binary SRNet
      Arbitrary Size Detector: Global average (512)]
  10. The reverse JPEG compatibility attack

    Rounding error $e_{ij} = z_{ij} - [z_{ij}]$ follows $\mathcal{N}(0, s)$ “folded” to the interval $[-1/2, 1/2]$:

    $$\nu(x; s) = \frac{1}{\sqrt{2\pi s}} \sum_{n \in \mathbb{Z}} \exp\left(-\frac{(x + n)^2}{2s}\right). \tag{1}$$

    For QF100, the variance $s = 1/12$
    Steganographic embedding increases $s$
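The folded Gaussian ν(x; s) on this slide is easy to evaluate numerically, which makes the effect of a growing variance visible. The sketch below truncates the infinite sum to |n| ≤ `n_terms` (an approximation that is harmless for the variances quoted in the talk) and also computes the rounding error of a single non-rounded pixel value; the function names are illustrative only.

```python
import math


def folded_gaussian(x, s, n_terms=20):
    """nu(x; s): density of N(0, s) wrapped onto [-1/2, 1/2].

    The sum over all integers n is truncated to |n| <= n_terms.
    """
    norm = 1.0 / math.sqrt(2.0 * math.pi * s)
    return norm * sum(
        math.exp(-((x + n) ** 2) / (2.0 * s))
        for n in range(-n_terms, n_terms + 1)
    )


def rounding_error(z):
    """e = z - [z], where [.] rounds to the nearest integer."""
    return z - math.floor(z + 0.5)


# Peak-to-edge spread of the density for increasing variance s.
variances = [1 / 12, 0.1, 0.15, 0.2]
spreads = [folded_gaussian(0.0, s) - folded_gaussian(0.5, s) for s in variances]
for s, spread in zip(variances, spreads):
    print(f"s = {s:.4f}: nu(0) - nu(1/2) = {spread:.4f}")
```

The spread shrinks quickly as s grows, which is the flattening toward a uniform distribution that the attack exploits: embedding raises s, so the rounding errors of a stego image look more uniform than those of a cover.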
  11. The reverse JPEG compatibility attack

    Figure: Folded Gaussian distribution ν(x; s) for noise variance in the DCT domain s = 1/12, 0.1, 0.15, 0.2. Note how rapidly ν(x; s) converges to a uniform distribution with increased s
  12. The reverse JPEG compatibility attack

    Near perfect detector for QF100 and very good performance for QF99
    Helped us confirm that ALASKArank had only 10% of stego images
    Robust to different JPEG compressors
    Can detect arbitrary steganography and small payloads
    More details in J. Butora, J. Fridrich: “The Reverse JPEG Compatibility Attack” IEEE TIFS, 2019, under review

    Table: Reverse JPEG compatibility attack performance on ARBITRARYbase

                PE
      QF100     1.00
      QF99      6.00
  13. Ordering ps computed using different detectors: Calibration

    Comparing soft-outputs from different detectors should be done with extreme caution
    Essential property of a probability estimate: being a representative of the true correctness likelihood (calibration)
    Soft-outputs from deep nets, such as SRNet, often lack this property [1]
    Shallow networks (MLP) typically well calibrated
    Arbitrary size SRNet is essentially a shallow network trained on a set of features: well calibrated

    [1] Guo, Chuan, et al. "On calibration of modern neural networks." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
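One way to check whether a detector's soft-outputs are calibrated is to bin them and compare, per bin, the mean predicted probability with the observed fraction of stego images. The sketch below does this for a binary cover/stego setting. It is a simplified variant of the binned analysis in Guo et al., applied to the stego-class probability rather than to the confidence of the predicted class, and the function names and the choice of 10 equal-width bins are assumptions.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Group predictions into equal-width probability bins.

    `probs` are predicted stego probabilities in [0, 1], `labels` are
    1 for stego and 0 for cover. Returns, for each non-empty bin,
    (mean predicted probability, observed stego fraction, count).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    summary = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac = sum(y for _, y in members) / len(members)
            summary.append((mean_p, frac, len(members)))
    return summary


def calibration_gap(probs, labels, n_bins=10):
    """Count-weighted average of |mean probability - observed fraction|."""
    total = len(probs)
    return sum(
        count * abs(mean_p - frac)
        for mean_p, frac, count in reliability_bins(probs, labels, n_bins)
    ) / total


# A calibrated toy detector: outputs of 0.25 are right 25% of the time, etc.
calibrated = calibration_gap([0.25] * 4 + [0.75] * 4, [1, 0, 0, 0, 1, 1, 1, 0])
# An overconfident one: outputs of 0.95 are right only half of the time.
overconfident = calibration_gap([0.95] * 4, [1, 1, 0, 0])
print(calibrated, overconfident)
```

A gap near zero means the soft-outputs can be read as probabilities, which is the property needed before outputs of different detectors are merged into one ordering.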
  14. Ordering ps computed using different detectors: Calibration

    Figure: Calibration plot for the tile detector and the arbitrary size detector for JPEG quality 95
  15. Final results

    Cover source mismatch impact: 400 double JPEG compressed images ([Cogranne IH’19])

    Table: Final scores on mixTST and ALASKArank, and the second competitor's scores

                        MD5     PE      FA50
      mixTST            18.55   11.50   0.09
      ALASKArank        25.2    14.48   0.71
      2nd competitor    51.60   25.20   5.86
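The slides do not define the three scores. The sketch below assumes the definitions commonly associated with the ALASKA challenge: MD5 is the missed-detection rate at a 5% false-alarm rate, PE is the minimum average of false-alarm and missed-detection rates over all thresholds, and FA50 is the false-alarm rate at a 50% missed-detection rate. It returns fractions, whereas the table reports percentages, and all function names are illustrative.

```python
def roc_points(cover_scores, stego_scores):
    """(false-alarm rate, missed-detection rate) for every threshold.

    An image is declared stego when its score is >= the threshold.
    """
    thresholds = sorted(set(cover_scores) | set(stego_scores))
    thresholds.append(float("inf"))  # "never declare stego"
    points = []
    for t in thresholds:
        fa = sum(s >= t for s in cover_scores) / len(cover_scores)
        md = sum(s < t for s in stego_scores) / len(stego_scores)
        points.append((fa, md))
    return points


def pe(points):
    """Minimum total error under equal priors."""
    return min((fa + md) / 2 for fa, md in points)


def md_at_fa(points, max_fa):
    """Smallest missed-detection rate with false alarms <= max_fa."""
    return min(md for fa, md in points if fa <= max_fa)


def fa_at_md(points, max_md):
    """Smallest false-alarm rate with missed detections <= max_md."""
    return min(fa for fa, md in points if md <= max_md)


# Toy soft-outputs; higher means "more likely stego".
cover = [0.1, 0.2, 0.3, 0.4]
stego = [0.35, 0.6, 0.7, 0.8]
points = roc_points(cover, stego)
print(pe(points), md_at_fa(points, 0.05), fa_at_md(points, 0.5))
```

Because all three scores depend only on how images are ordered by score, they fit a challenge that asks for an ordering instead of hard decisions.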
  16. Bag of tricks: Curriculum Learning

    Two types of CL have been used in Alaska
      Payload CL
      JPEG quality factor CL ([Butora EI’19])

    Table: Tile detector performance on TILEbase with and without payload curriculum learning (payload) for quality factor 75 and 95

                Without CL        With CL
                MD5     PE        MD5     PE
      QF75      15.69   10.09     9.60    7.12
      QF95      95.00   50.00     13.80   9.29
  17. Bag of tricks, cont’d

    Warm start learning rate
      Using a small learning rate (10⁻⁴) for a few iterations (around 20,000) before the original learning rate schedule described in [Boroumand TIFS’18]
      Stabilizes training and helps convergence
    Image augmentation at prediction
      Augmenting each test image with its rotations and flips and averaging the soft-outputs of the detector over all transformations
      Boosts by 1-2% in MD5
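Both tricks on this slide are simple enough to sketch in a few lines. The code below is illustrative only: it assumes "rotations and flips" means the eight views obtained from the four 90-degree rotations of the image and of its mirror image, and it assumes the original schedule restarts from its own iteration 0 once the warm start ends. `detector` and `base_schedule` are hypothetical callables supplied by the caller; the original schedule itself is not reproduced here.

```python
def rotate90(img):
    """Rotate a 2-D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2-D list of pixels left to right."""
    return [row[::-1] for row in img]


def dihedral_views(img):
    """Four rotations of the image and of its mirror image (8 views)."""
    views = []
    for start in (img, flip_horizontal(img)):
        view = [list(row) for row in start]
        for _ in range(4):
            views.append(view)
            view = rotate90(view)
    return views


def predict_with_augmentation(detector, img):
    """Average the detector's soft-output over all eight views."""
    outputs = [detector(view) for view in dihedral_views(img)]
    return sum(outputs) / len(outputs)


def warm_start_lr(iteration, base_schedule, warmup_iters=20_000, warmup_lr=1e-4):
    """Small constant rate first, then the original schedule (offset assumed)."""
    if iteration < warmup_iters:
        return warmup_lr
    return base_schedule(iteration - warmup_iters)


# Toy usage: a "detector" that just reads the top-left pixel.
img = [[1, 2], [3, 4]]
print(predict_with_augmentation(lambda view: view[0][0], img))
print(warm_start_lr(0, lambda i: 1e-3), warm_start_lr(20_000, lambda i: 1e-3))
```

Averaging over views costs eight forward passes per test image, which is the price of the reported 1-2% gain in MD5.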
  18. Low false alarm performance of SRNet
  19. Looking back

    Alaska was very challenging and gave birth to interesting research and discoveries
    “Wilder” than BOSS but still far from the real world
    Unrealistically noisy images (due to excessive sharpening and micro contrast ...)
    > Alaska v2?
  20. Future research and open questions

    Is there a better way to perform color steganalysis (or more generally, multi-channel steganalysis)?
    How to make our approach more scalable to
      Unseen stego schemes
      Unseen cover processing, double JPEG compression, custom JPEG quantization, etc.?