Slide 1

Automated Regression Test Environment for Multiple Kernels
27 Oct, 2011
Yoshitake Kobayashi
Advanced Software Technology Group, Corporate Software Engineering Center, TOSHIBA CORPORATION
Copyright 2011, Toshiba Corporation.

Slide 2

Serious pain in the neck
There are many products that need to be supplied for more than 10 years....

Slide 3

What actually happens
A simple example for a product: a software stack of hardware, kernel, libraries, applications, and test cases.

Slide 4

Got a problem
Product life cycle: more than 10 years. Some years later the same stack (kernel, libraries, applications, test cases) must keep running, but the original hardware is discontinued. Oh! No!

Slide 5

What can we do?
To overcome the hardware discontinuation: is introducing new hardware enough? No. We need to check which hardware the current kernel supports, but quite new hardware is not supported by such an old kernel. (In approximately 5 years, about 20 kernel versions are released.)

Slide 6

To adapt to brand-new hardware
There are two possible approaches:
- Backport upstream kernel drivers to the old kernel
- Change the current product's kernel to a newer one
Which is better? I'm not sure. The correct answer depends on:
- Hardware specifications
- User's (or programmer's) requirements
This presentation focuses on migrating a troubling but actually important product to a newer kernel.

Slide 7

Goal
- Pick more than one suitable kernel version for migration
- Estimate the side effects of kernel migration (this presentation covers only the estimation)

Slide 8

To pick a suitable kernel, we need tests!
- API level: system calls
- Performance level: throughput and/or QoS
The correct metric is difficult to define; it depends on the applications. High performance is not usually required, but the required performance must be ensured.
Each regression test takes a long time: a couple of hours to weeks. If multiple tests are needed, the idle time between one test and the next is a quite BIG overhead.

Slide 9

The BIGGEST overhead
Forgetting to check the result even though the test has already finished. Forgetfulness is something everyone experiences...

Slide 10

Possible solutions
- Keep watching? I need sleep.
- Set a reminder? The finish time is difficult to estimate.
An automated test is a good way to avoid wasting time.

Slide 11

Test environment (simple case)
The host and the target are directly connected by a cross cable (private network); the host also reaches the public repositories.
Repositories: 1. Linux kernels  2. Kernel configurations  3. Test tools (need to be installed before beginning)
Host installed software: 1. Compilers  2. SSH and rsync (a key pair is created with an empty passphrase)  3. Test tools (server side)
Target installed software: 1. Compilers  2. SSH and rsync (the host's public key is registered)  3. Test tools (client side)

Slide 12

Test environment (reasonable case)
Same host setup as the simple case, but the private target side now contains both the current hardware and a candidate machine.

Slide 13

Test environment (actual case)
The host reaches the public repositories and, on the private side, the current hardware plus Candidate1, Candidate2, and Candidate3.

Slide 14

Test environment
Recap of the reasonable case: the host with the public repositories on one side, and the private target side with the current hardware and a candidate machine on the other.

Slide 15

Assumptions for the testing environment
- Userland software has to be the same; only the configuration is allowed to change (e.g. hda -> sda)
- All kernels are already configured for the current and the new hardware
- All test cases need to be compiled with the same tools
- Each test case has its own setup instructions
Stacks under test: kernel 2.6.18 (etch) on the current hardware, kernel 2.6.26 (lenny) on the new hardware, and kernel 2.6.32 (squeeze) on the new hardware, each with its libraries and test cases.

Slide 16

Simple rules for automation
A test scenario includes the following definitions:
- Kernels to be tested (we need to define which kernel runs on the current hardware)
- Test cases
Each test case has the following phases:
1. Setup (only if needed)
2. Execution
3. Collection of test results
4. Analysis of test results

Slide 17

Test scenario
Host side: start; check out and deploy the kernel source and configuration; build and install the kernel; then, for each test, build and install the test tools, activate a collaboration process on the target, and collect the test results; repeat for the next test and the next kernel until all are done.
Target side: check out, build, and install the test tools, run them together with the host, and hand the results back.
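
Condensed into code, the flow above (together with the four phases from the previous slide) looks roughly like the following host-side loop. This is only a sketch: every subroutine name is a hypothetical stand-in, not one of the environment's actual scripts.

#!/usr/bin/perl
use strict;
use warnings;

my @kernels = ('2.6.18', '2.6.26', '2.6.32');
my @tests   = ('ltp', 'fstest');

for my $kernel (@kernels) {
    deploy_kernel($kernel);            # check out and deploy source + configuration
    build_and_install($kernel);       # build and install the kernel on the target
    for my $test (@tests) {
        run_one_test($kernel, $test);      # setup (if needed), then execution
        collect_results($kernel, $test);   # pull the logs back to the host
    }
    analyze_results($kernel);         # compare against the reference kernel
}

# Stubs so the sketch runs; the real work lives in the setup/run/copy/analysis scripts.
sub deploy_kernel     { print "deploy kernel $_[0]\n" }
sub build_and_install { print "build and install $_[0]\n" }
sub run_one_test      { print "run $_[1] on $_[0]\n" }
sub collect_results   { print "collect $_[1] results for $_[0]\n" }
sub analyze_results   { print "analyze results for $_[0]\n" }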

Slide 18

Supported tests (currently planned)
- API test: LTP
- Performance test: iozone
- QoS test: cyclictest with workload (cyclictest-wl)
- File system data reliability test (fs-test)
- Network latency test (acceleration test is not supported yet)

Slide 19

Architecture of the testing environment
Directory layout: "common" holds config.pm and the default setup, run, copy-result, and analysis scripts; "kernels" holds the kernel trees with their setup scripts; "tests" holds one directory per test case (LTP, fs-test, ...), each with its own setup, run, copy-result, and analysis scripts. autotest.pl drives the whole thing.

Slide 20

Common subroutines
Located in "common". They define the default behavior for the following steps:
- Checking out kernel and test case source code from a git repository
- Test case preparation
- Test case execution
- Copying the test results
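
As an illustration, the default checkout step could look roughly like this; the subroutine name and the error handling are assumptions, not the environment's actual code.

use strict;
use warnings;

# Hypothetical default: clone a git repository once, then check out the wanted branch.
sub checkout_source {
    my ($url, $branch, $destdir) = @_;
    unless (-d "$destdir/.git") {
        system('git', 'clone', $url, $destdir) == 0
            or die "git clone $url failed: $?";
    }
    chdir $destdir or die "cannot chdir to $destdir: $!";
    system('git', 'checkout', $branch) == 0
        or die "git checkout $branch failed: $?";
}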

Slide 21

Example of configuration file (config.pm)
Defines all kernels and test cases.
Kernel part:

%kernel_data = (
    '2.6.18' => {
        name           => '2.6.18',
        repository_url => 'PATH/TO/kernel/linux-dev.git',
        branch         => 'v2.6.18-stable',
        config         => 'myconf-stable.cfg',
        target_host    => 'USER@TARGET',
    },
    '2.6.26' => {
        name           => '2.6.26',
        repository_url => 'PATH/TO/kernel/linux-dev2.git',
        branch         => 'v2.6.26-dev',
        config         => 'myconf-dev.cfg',
        target_host    => 'USER@TARGET',
    },
    ....
);

Slide 22

Example of configuration file (config.pm)
Test part:

%test_data = (
    # LTP
    ltp => {
        name           => 'ltp',
        repository_url => 'PATH/TO/ltp.git',
        branch         => 'master',
    },
    fstest => {
        name           => 'fstest',
        repository_url => 'PATH/TO/fstest.git',
        branch         => 'master',
    },
    .....
);
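
Assuming config.pm simply fills these two package hashes and ends with a true value (1;), a caller could walk the whole scenario like this sketch:

use strict;
use warnings;

our (%kernel_data, %test_data);
require './config.pm';   # assumed to populate the two hashes above

for my $kver (sort keys %kernel_data) {
    my $k = $kernel_data{$kver};
    print "kernel $k->{name}: $k->{repository_url} (branch $k->{branch})\n";
    for my $tname (sort keys %test_data) {
        my $t = $test_data{$tname};
        print "  test $t->{name}: $t->{repository_url} (branch $t->{branch})\n";
    }
}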

Slide 23

autotest.pl
This script invokes the scripts defined in the configuration file.
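
One plausible way for autotest.pl to resolve which script to invoke for a given phase: prefer a test-specific script and fall back to the default in "common". The directory names and the phase list here are assumptions for illustration.

use strict;
use warnings;

# Prefer tests/<name>/<phase>, fall back to the default common/<phase>.
sub script_for {
    my ($test_name, $phase) = @_;   # phase: 'setup', 'run', 'copy', or 'analysis'
    my $specific = "tests/$test_name/$phase";
    return -x $specific ? $specific : "common/$phase";
}

# Example: run LTP's setup phase.
my $script = script_for('ltp', 'setup');
system($script) == 0 or die "$script failed: $?";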

Slide 24

Setup script
The setup script has to do the following:
For the kernel:
- Apply patches if needed
- Build and install the kernel
For the test case:
- Apply patches if needed
- Build and install the test case
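
For the kernel side, that could boil down to something like the sketch below; the make targets, the job count, and the use of the config file from config.pm are assumptions.

use strict;
use warnings;
use File::Copy qw(copy);

# Sketch: configure, build, and install one kernel tree.
sub setup_kernel {
    my ($srcdir, $config) = @_;   # $config comes from config.pm's 'config' field
    chdir $srcdir or die "cannot chdir to $srcdir: $!";
    copy($config, '.config') or die "cannot copy $config: $!";
    for my $cmd ('make oldconfig', 'make -j4', 'make modules_install', 'make install') {
        system($cmd) == 0 or die "'$cmd' failed: $?";
    }
}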

Slide 25

Run script
The run.sh script does the following:
- Execute a test case
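
Because the target is reachable over SSH with a passphrase-less key, the host side of this step can be very small. A sketch, with a hypothetical target host and test directory layout:

use strict;
use warnings;

# Execute a test case's run.sh on the target over SSH.
sub run_on_target {
    my ($target_host, $test_name) = @_;
    system('ssh', $target_host, "cd tests/$test_name && ./run.sh") == 0
        or die "run.sh for $test_name failed on $target_host: $?";
}

run_on_target('USER@TARGET', 'ltp');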

Slide 26

Copy results script
Copies a test case's output (LOG_T) from the target side to the host side. On the host side there may also be some other results (LOG_H), depending on the test case; together they form the complete log (LOG).
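
Since rsync is installed on both sides, pulling LOG_T back could look like this; the directory names are assumptions:

use strict;
use warnings;

# Pull the target-side log directory (LOG_T) into the host-side result tree.
sub copy_results {
    my ($target_host, $test_name, $result_dir) = @_;
    system('rsync', '-az',
           "$target_host:tests/$test_name/log/",
           "$result_dir/$test_name/target/") == 0
        or die "rsync from $target_host failed: $?";
}

copy_results('USER@TARGET', 'ltp', 'results/2.6.26');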

Slide 27

Overview of the file system data reliability test (fs-test)
N writer processes on the target write to target files through the write() system call. Each writer process:
- writes to text files (e.g. 100 files)
- sends a progress log to the logger on the host
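
A single writer process could be approximated as follows; the file count, the payload, and the use of stdout for the progress log (instead of a network connection to the host logger) are assumptions for illustration:

use strict;
use warnings;

# Sketch of one writer: write text files, report progress after each one.
my $nfiles = 100;
for my $i (1 .. $nfiles) {
    open my $fh, '>', "data_$i.txt" or die "open data_$i.txt: $!";
    print {$fh} "payload line $_\n" for 1 .. 1000;
    close $fh or die "close data_$i.txt: $!";
    print "PROGRESS: data_$i.txt written\n";   # consumed by the host-side logger
}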

Slide 28

Analysis script
Compares the test results between two kernel versions. The simple case just uses the "diff" command.
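
In that simple case the analysis script can be little more than a diff wrapper; the result file paths here are hypothetical:

use strict;
use warnings;

# Compare the result summaries of two kernel versions with diff.
sub compare_results {
    my ($old, $new) = @_;
    my $diff = `diff -u $old $new`;
    print length($diff) ? "differences found:\n$diff" : "results identical\n";
}

compare_results('results/2.6.18/ltp.log', 'results/2.6.26/ltp.log');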

Slide 29

Test results
The following slides describe the actual differences that we found in the test cases.

Slide 30

LTP results
Userland: Debian 4.0 (etch)
Kernel 1: 2.6.18-etch. Error count: 1 (cron02)
Kernel 2: 2.6.26-lenny. Error count: 3 (getcpu01, stime01, cron02)
Details of the difference:
- getcpu01: only runs on kernels newer than 2.6.20 and needs NUMA support
- stime01: time() returns stime()-1; a bug fix is available in 2.6.27.13, so this is easy to fix

Slide 31

QoS verification
Metrics:
- Scheduling: scheduling latency
- File system: data error ratio
- Networking: packet loss ratio
- I/O: throughput

Slide 32

Latency test (cycle 300 µs, CPU and memory load)
[Histograms: latency (µs) vs. counts for kernels 2.6.31.12 and 2.6.31.12-RT at CPU loads of 0%, 50%, and 100%.]

Slide 33

Latency test (cycle 300 µs, CPU load only)
[Histograms: latency (µs) vs. counts for kernels 2.6.31.12 and 2.6.31.12-RT at CPU loads of 0%, 50%, and 100%.]

Slide 34

Why does this happen?
It probably depends on the hardware. We tried to find the bottlenecks with ftrace, but the latency problem happens at random points in the kernel, and when the same test runs on other machines nothing happens. Probably SMIs (System Management Interrupts) caused the problem. In this case, just throw away the hardware.

Slide 35

Results of data reliability tests
[Bar charts: file size mismatch rate and data mismatch rate (error rate, %) per file system (EXT3-ORDERED, EXT3-JOURNAL, EXT4-JOURNAL, EXT4-ORDERED, EXT4-WRITEBACK, JFS, XFS, BTRFS) for kernels 2.6.18, 2.6.31, and 2.6.33; labeled outliers range from 13.3% up to 84.7%.]
Point 1: a file system has different characteristics on different kernels.
Point 2: 2.6.33 has a high error rate in ordered and writeback modes.
Point 3: Ext4-journal and Btrfs have good results.

Slide 36

Conclusion
Kernel migration is one of the major problems for long-term product supply.
Automated testing is an effective way:
- to estimate which kernel to pick
- to avoid wasting time
We still need to work a lot on this problem. Who wants to have it?

Slide 37

Thank you