
SHADE with Iterative Local Search for Large-Scale Global Optimization

dmolina
July 10, 2018


Global optimization is a very important topic in research due to its wide applications in many real-world problems in science and engineering. Among optimization problems, dimensionality is one of the most crucial issues that increases the difficulty of the optimization process. Thus, Large-Scale Global Optimization, optimization with a great number of variables, arises as a field that is attracting increasing interest. In this paper, we propose a new hybrid algorithm especially designed to tackle this type of optimization problems. The proposal combines, in an iterative way, a modern Differential Evolution algorithm with one local search method chosen from a set of different search methods. The selection of the local search method is dynamic and takes into account the improvement obtained by each of them in the previous intensification phase, to identify the most adequate one in each case for the problem. Experiments are carried out using the CEC'2013 Large-Scale Global Optimization benchmark, and the proposal is compared with other state-of-the-art algorithms, showing that the synergy among the different components of our proposal leads to better and more robust results than more complex algorithms. In particular, it improves the results of the current winner of previous Large-Scale Global Optimization competitions, Multiple Offspring Sampling (MOS), obtaining very good results, especially in the most difficult problems.


Transcript

  1. SHADE with Iterative Local Search for Large-Scale Global Optimization
     Daniel Molina¹, Antonio LaTorre², Francisco Herrera¹
     ¹ University of Granada, Spain; ² Universidad Politécnica de Madrid, Spain
  2. Large-Scale Optimization Problems
     Global Optimization: find x* such that f(x*) ≤ f(x) ∀x ∈ Domain.
     Real-parameter optimization: Domain ⊆ R^D, x* = [x_1, x_2, ..., x_D].
     Large-Scale Global Optimization (LSGO): when D ≥ 1000.
     The search domain grows exponentially with the dimension.
  3. Interest of LSGO Problems
     Real-world problems: optimization of many parameters.
     Scalability of algorithms: can existing algorithms scale? Are specific scalable optimization algorithms needed?
     Study of variable separability: some variables can have a stronger influence; completely separable or completely non-separable problems are unusual; there are different degrees of separability.
  4. MOS: Current State of the Art
     Features: combines several evolutionary algorithms and local searches; applies each one with an adaptive probability; includes specific local search methods for high dimensionality.
     Disadvantages: complex (9 components); several components are only useful for a few functions; many parameters, so it requires a lot of tuning.
     Citation: Antonio LaTorre, Santiago Muelas, José María Peña: A comprehensive comparison of large scale global optimizers. Inf. Sci. 316: 517-549 (2015)
  5. Previous Proposal: IHDELS
     Previous proposal (CEC'2015): Iterative Hybridization of DE with LS (IHDELS).
     Reference: Daniel Molina, Francisco Herrera: Iterative hybridization of DE with Local Search for the CEC'2015 special session on large scale global optimization. Proceedings of IEEE CEC 2015: 1974-1978 (2015)
  6. IHDELS Problems
     Problems of IHDELS: complex self-adaptation; a restart mechanism that was useless in practice.
     We designed the new proposal to simplify the algorithm while improving results at the same time.
  7. SHADE-ILS: Memetic Algorithm
     Evolutionary Algorithm (DE) for diversity; Local Search to enforce exploitation.
     Evolutionary Algorithm: explores efficiently; few parameters.
     Local Search: particularly good for high-dimensional problems; robust.
  8. Components: Evolutionary Algorithm
     Differential Evolution: SHADE.
     Parameters F and CR are adapted: F and CR values are sampled from a distribution around a stored mean; the means are self-adapted according to the fitness improvement obtained.
     Movement is guided by the best solutions. Scalable, exploring the search domain well. No population reduction: exploitation is enforced through the LS.
     Changes from IHDELS: IHDELS used SaDE; different mutation operator; different parameter adaptation.
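A minimal sketch of that parameter sampling, not the authors' code: the slide only says F and CR come from "a distribution and a mean" stored in a memory; the reference SHADE draws CR from a normal distribution and F from a Cauchy distribution, which is the assumption made here (the names memory_f, memory_cr, sample_f_cr are illustrative).

```python
import numpy as np

def sample_f_cr(memory_f, memory_cr, rng):
    """Sample F and CR for one individual from the historical memory.

    A random memory slot gives the means; CR is drawn from a normal
    distribution and F from a Cauchy distribution (Cauchy for F follows
    the reference SHADE and is an assumption here).
    """
    k = rng.integers(len(memory_f))                      # pick a random memory slot
    cr = float(np.clip(rng.normal(memory_cr[k], 0.1), 0.0, 1.0))
    f = 0.0
    while f <= 0.0:                                      # resample until F is positive
        f = memory_f[k] + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
    return min(f, 1.0), cr

# Example usage with a memory of size 5 initialised to 0.5:
rng = np.random.default_rng(42)
print(sample_f_cr([0.5] * 5, [0.5] * 5, rng))
```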
  9. Components: Local Search Method
     Local search methods: MTS-LS1, a specific method for large-scale optimization; the well-known L-BFGS-B, for more robustness.
     Combination: both are complementary; each iteration applies only one of them; an adaptive method selects which one to apply in each iteration.
     Changes from IHDELS: the selection method is completely different.
  10. Why These Two? They Are Complementary
      MTS-LS1: explores dimension by dimension; quick; very sensitive to the coordinate system.
      L-BFGS-B: guided by a gradient estimation; not sensitive to the coordinate system.
  11. Global Scheme
      Flowchart of one iteration:
      1. Init the population and get the best solution.
      2. Apply SHADE for FE_DE evaluations.
      3. Choose the LS with the best last improvement ratio and apply it to the best solution for FE_LS evaluations.
      4. Update the improvement ratio of the applied LS.
      5. If a restart is required, perform the restart step (restart the population).
      6. Repeat until MaxEvals is reached.
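A minimal sketch of that iterative scheme, not the authors' code. The callables shade_step and the entries of local_searches are supplied by the caller and stand in for SHADE, MTS-LS1 and L-BFGS-B, which are not implemented here.

```python
import numpy as np

def shade_ils(fitness, dim, shade_step, local_searches, bounds=(-100.0, 100.0),
              max_evals=3_000_000, fe_de=25_000, fe_ls=25_000,
              popsize=100, restart_min=0.01, max_no_improvement=3, seed=None):
    """Illustrative sketch of the global scheme (hypothetical helper signatures).

    shade_step(fitness, population, best, evals) -> (population, best);
    local_searches: dict name -> callable(fitness, best, evals) -> best.
    """
    rng = np.random.default_rng(seed)
    population = rng.uniform(bounds[0], bounds[1], size=(popsize, dim))
    best = min(population, key=fitness)
    ratios = {name: np.inf for name in local_searches}    # force trying every LS once
    evals = no_improvement = 0
    while evals < max_evals:
        start_fitness = fitness(best)
        population, best = shade_step(fitness, population, best, fe_de)   # exploration
        ls_name = max(ratios, key=ratios.get)              # LS with best last improvement ratio
        before_ls = fitness(best)
        best = local_searches[ls_name](fitness, best, fe_ls)              # exploitation
        ratios[ls_name] = (before_ls - fitness(best)) / before_ls         # update its ratio
        evals += fe_de + fe_ls
        iter_ratio = (start_fitness - fitness(best)) / start_fitness
        no_improvement = 0 if iter_ratio >= restart_min else no_improvement + 1
        if no_improvement >= max_no_improvement:           # restart step
            population = rng.uniform(bounds[0], bounds[1], size=(popsize, dim))
            no_improvement = 0
    return best
```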
  12. Component: SHADE
      [Same global-scheme flowchart, with the SHADE step highlighted.]
  13. SHADE: Mutation (Initialization → Mutation → Crossover → Selection)
      Mutation strategy: u_i = x_i + F_i · (x_pbest − x_i) + F_i · (x_r1 − a_r2)
      x_pbest: random individual chosen among the pbest best individuals of the population.
      x_r1: random solution from the population.
      a_r2: random solution from Population ∪ A.
      F_i: randomly obtained following a normal distribution with mean F_meanK.
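An illustrative sketch of that current-to-pbest mutation, not the authors' code; the function and argument names are hypothetical.

```python
import numpy as np

def current_to_pbest_mutation(pop, archive, fitness_vals, i, f_i, p=0.1, rng=None):
    """Build the mutant v_i = x_i + F_i*(x_pbest - x_i) + F_i*(x_r1 - a_r2)."""
    rng = rng or np.random.default_rng()
    n = len(pop)
    # x_pbest: one of the ceil(p * n) best individuals, chosen at random
    n_pbest = max(1, int(round(p * n)))
    pbest_idx = rng.choice(np.argsort(fitness_vals)[:n_pbest])
    # x_r1 from the population, a_r2 from population + archive, distinct indices
    r1 = rng.choice([j for j in range(n) if j != i and j != pbest_idx])
    pool = np.vstack([pop, archive]) if len(archive) else pop
    r2 = rng.choice([j for j in range(len(pool)) if j != i and j != r1])
    return pop[i] + f_i * (pop[pbest_idx] - pop[i]) + f_i * (pop[r1] - pool[r2])
```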
  14. SHADE: Crossover (Initialization → Mutation → Crossover → Selection)
      Crossover (per component): u_i = v_i if rand[0,1] < CR_i, x_i otherwise.
      CR_i ← distribution with mean CR_meank; CR_meank is randomly obtained from the memory.
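A small sketch of that binomial crossover; forcing at least one mutant component (j_rand) is a standard DE detail assumed here.

```python
import numpy as np

def binomial_crossover(x, v, cr, rng=None):
    """Take each mutant component with probability CR, parent component otherwise."""
    rng = rng or np.random.default_rng()
    mask = rng.random(len(x)) < cr
    mask[rng.integers(len(x))] = True   # guarantee at least one mutant component
    return np.where(mask, v, x)
```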
  15. SHADE: Selection (Initialization → Mutation → Crossover → Selection)
      Selection: x_i^{t+1} = u_i if u_i improves x_i^t, otherwise x_i^t.
      Update of the CR and F means: the F and CR means are updated periodically; several means are stored in a memory (more diversity).
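A sketch of the selection plus the periodic memory update. The slide only says the means are updated from the fitness obtained; the aggregation below (Lehmer mean for F, arithmetic mean for CR, unweighted) is a simplifying assumption, and all names are illustrative.

```python
import numpy as np

def selection_and_memory_update(pop, trials, fit_pop, fit_trials,
                                used_f, used_cr, memory_f, memory_cr, k):
    """Greedy selection, then store the means of the successful F/CR in slot k."""
    improved = fit_trials < fit_pop
    pop[improved] = trials[improved]
    fit_pop[improved] = fit_trials[improved]
    if improved.any():
        sf, scr = used_f[improved], used_cr[improved]
        memory_f[k] = np.sum(sf ** 2) / np.sum(sf)     # Lehmer mean of successful F (assumption)
        memory_cr[k] = np.mean(scr)                    # mean of successful CR (assumption)
        k = (k + 1) % len(memory_f)                    # advance the memory slot cyclically
    return pop, fit_pop, memory_f, memory_cr, k
```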
  16. LS Method
      [Same global-scheme flowchart, with the LS step highlighted.]
  17. Method of Local Search
      MTS-LS1:
      Define a random group C of variables to improve.
      Define SR = 10% (local search step ratio).
      Update the solution x during Istr evaluations, ∀i ∈ C:
        1. x'_i ← x_i + SR.
        2. If fitness(x') ≤ fitness(x), x ← x', go to step 1.
        3. Otherwise, x'_i ← x_i − 0.5 · SR.
        4. If fitness(x') ≤ fitness(x), x ← x', go to step 1.
        5. Reduce SR (SR ← SR/2) if x did not improve.
      L-BFGS-B: well-known mathematical method; approximates the gradient at each point.
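A simplified sketch that follows the steps listed on the slide; the full MTS-LS1 keeps a step size per dimension and other details not reproduced here, and the bounds and names are illustrative.

```python
import numpy as np

def mts_ls1(fitness, x, max_evals, sr_ratio=0.1, lower=-100.0, upper=100.0, rng=None):
    """Dimension-by-dimension local search, single shared step size SR (simplification)."""
    rng = rng or np.random.default_rng()
    x = x.copy()
    fx = fitness(x)
    sr = sr_ratio * (upper - lower)
    evals = 0
    dims = rng.permutation(len(x))          # random order of the variables to improve
    while evals < max_evals:
        improved = False
        for i in dims:
            old = x[i]
            x[i] = old + sr                  # try a positive step
            fnew = fitness(x); evals += 1
            if fnew <= fx:
                fx, improved = fnew, True
                continue
            x[i] = old - 0.5 * sr            # otherwise try a smaller negative step
            fnew = fitness(x); evals += 1
            if fnew <= fx:
                fx, improved = fnew, True
            else:
                x[i] = old                   # restore the component
        if not improved:
            sr /= 2.0                        # shrink the step when nothing improved
    return x, fx
```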
  18. Selection of the Local Search
      [Same global-scheme flowchart, with the LS selection step highlighted.]
  19. LS Selection
      IHDELS applied an adaptive probability.
      Initial probability: P_{LS_M} = 1 / |LS|, ∀M ∈ LS.
      Update (after Freq_LS evaluations): P_{LS_M} = I_{LS_M} / Σ_{m ∈ LS} I_{LS_m}, with I_{LS_M} = Σ_{i=1}^{Freq_LS} Improvement_{LS_M}.
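A small sketch of that probability update (the IHDELS model); the names and the uniform fallback when no LS improved are illustrative assumptions.

```python
import numpy as np

def ihdels_probabilities(improvements):
    """Probability of applying each LS, proportional to its accumulated improvement I_LS."""
    total = sum(improvements.values())
    if total == 0:                       # no improvement at all: fall back to uniform
        return {name: 1.0 / len(improvements) for name in improvements}
    return {name: value / total for name, value in improvements.items()}

# Example: pick the next LS according to those probabilities.
probs = ihdels_probabilities({"mts-ls1": 0.7, "l-bfgs-b": 0.3})
rng = np.random.default_rng(0)
next_ls = rng.choice(list(probs), p=list(probs.values()))
```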
  20. Local Search Selection
      Problems of the previous model: complex to compute; minimum probabilities must be handled with care (to avoid always choosing the same LS).
      SHADE-ILS model:
        1. Each time an LS is applied, obtain its improvement ratio.
        2. Store that ratio for each LS method: Ratio_{LS_M} = (Fitness_old − Fitness_new) / Fitness_old.
        3. In each iteration, apply the LS with the best last improvement ratio.
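A minimal sketch of the SHADE-ILS selection rule; the data structure (a dict of last ratios) is an illustrative choice.

```python
def choose_ls(last_ratios):
    """Pick the LS with the best improvement ratio from its last application."""
    return max(last_ratios, key=last_ratios.get)

def update_ratio(last_ratios, name, f_old, f_new):
    """Store Ratio_LS = (Fitness_old - Fitness_new) / Fitness_old for that LS."""
    last_ratios[name] = (f_old - f_new) / f_old
    return last_ratios

# Example matching the next slide: after MTS-LS1 scores 0.78 and L-BFGS-B 0.8,
# the next iteration applies L-BFGS-B.
print(choose_ls({"mts-ls1": 0.78, "l-bfgs-b": 0.80}))
```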
  21. Example
      1. Apply MTS-LS1 ⇒ Ratio_MTS = 0.9.
      2. Apply L-BFGS-B ⇒ Ratio_bfgs = 0.8.
      3. Apply MTS-LS1 ⇒ Ratio_MTS = 0.85.
      4. Apply MTS-LS1 ⇒ Ratio_MTS = 0.78.
      5. Apply L-BFGS-B ⇒ Ratio_bfgs = ...
      6. ...
      Advantages: much simpler; better results.
  22. Restart Mechanism
      IHDELS model: restart when there is no improvement in an iteration. It was almost never applied and never improved the results.
      SHADE-ILS model: it counts the iterations that do not achieve a minimum improvement ratio (1%), and restarts when 3 consecutive iterations fail to improve by that 1%.
      Partial restarts in each iteration: LS not improving ⇒ restart the LS parameters (SR); SHADE not improving ⇒ randomly restart the population.
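A small sketch of the restart bookkeeping described above; the history list is an illustrative structure.

```python
def should_restart(history, min_ratio=0.01, max_iters=3):
    """True when the last `max_iters` per-iteration improvement ratios are all below `min_ratio`."""
    if len(history) < max_iters:
        return False
    return all(ratio < min_ratio for ratio in history[-max_iters:])

# Example: three iterations in a row below 1% trigger a restart.
print(should_restart([0.20, 0.005, 0.003, 0.009]))   # True
```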
  23. Benchmark
      About the benchmark: 15 functions with dimension 1000.
      Different groups of variables:
        Fully separable: F1 - F3.
        Partially separable: F4 - F11.
        Overlapping functions: F12 - F14.
        Non-separable: F15.
      Runs of 3 · 10^6 evaluations.
      Results measured at different evaluation counts: 1.25 · 10^5 (4%), 6 · 10^5 (20%), 3 · 10^6 (100%).
  24. SHADEILS Parameters
      Parameter       Description                        Value
      popsize_DE      Population size                    100
      FE_DE           Evaluations for each DE run        25000
      FE_LS           Evaluations for each LS run        25000
      MTS_SR          Initial step size for MTS-LS1      10%
      Restart_min     Minimum improvement                1%
      Iter_noimprov   Iterations without improvement     3
  25. Influence of the Different Components
      Func.   SHADE+New Restart   SaDE+New Restart   SHADE+Old Restart   IHDELS
      F1      2.69e-24            1.21e-24           1.76e-28            4.80e-29
      F2      1.00e+03            1.26e+03           1.40e+03            1.27e+03
      F3      2.01e+01            2.01e+01           2.01e+01            2.00e+01
      F4      1.48e+08            1.58e+08           2.99e+08            3.09e+08
      F5      1.39e+06            3.07e+06           1.76e+06            9.68e+06
      F6      1.02e+06            1.03e+06           1.03e+06            1.03e+06
      F7      7.41e+01            8.35e+01           2.44e+02            3.18e+04
      F8      3.17e+11            3.59e+11           8.55e+11            1.36e+12
      F9      1.64e+08            2.48e+08           2.09e+08            7.12e+08
      F10     9.18e+07            9.19e+07           9.25e+07            9.19e+07
      F11     5.11e+05            4.76e+05           5.20e+05            9.87e+06
      F12     6.18e+01            1.10e+02           3.42e+02            5.16e+02
      F13     1.00e+05            1.34e+05           9.61e+05            4.02e+06
      F14     5.76e+06            6.14e+06           7.40e+06            1.48e+07
      F15     6.25e+05            8.69e+05           1.01e+06            3.13e+06
      Better  12                  1                  0                   2
  26. Influence of the Different Components
      Differences due to the new restart mechanism (using TACO).
      [Convergence plots of mean error vs. evaluations, new restart vs. old restart, for functions F4, F9, F5 and F12.]
  27. Comparison against MOS, MaxEvals = 1.25 · 10^5 (4%)
      Functions   SHADEILS   MOS
      F1          6.10e+04   2.71e+07
      F2          2.65e+03   2.64e+03
      F3          2.03e+01   7.85e+00
      F4          3.13e+10   3.47e+10
      F5          2.50e+06   6.96e+06
      F6          1.05e+06   3.11e+05
      F7          3.95e+08   3.46e+08
      F8          2.12e+14   3.72e+14
      F9          2.88e+08   4.29e+08
      F10         9.43e+07   1.16e+06
      F11         6.55e+09   3.13e+09
      F12         2.67e+03   1.16e+04
      F13         1.29e+10   8.37e+09
      F14         1.62e+11   4.61e+10
      F15         9.12e+07   1.45e+07
      Best        6          9
  28. Comparison against MOS, MaxEvals = 6 · 10^5 (20%)
      Functions   SHADEILS   MOS
      F1          3.71e-23   3.48e+00
      F2          1.80e+03   1.78e+03
      F3          2.01e+01   1.33e-10
      F4          1.54e+09   2.56e+09
      F5          2.29e+06   6.95e+06
      F6          1.04e+06   1.48e+05
      F7          9.25e+05   8.19e+06
      F8          6.93e+12   8.41e+13
      F9          2.50e+08   3.84e+08
      F10         9.29e+07   9.03e+05
      F11         1.37e+08   8.05e+08
      F12         1.28e+03   2.20e+03
      F13         5.68e+07   8.10e+08
      F14         6.97e+07   2.03e+08
      F15         1.22e+07   6.26e+06
      Best        10         5
  29. Comparison against MOS, MaxEvals = 3 · 10^6 (100%)
      Functions   SHADEILS   MOS
      F1          2.69e-24   0.00e+00
      F2          1.00e+03   8.32e+02
      F3          2.01e+01   9.17e-13
      F4          1.48e+08   1.74e+08
      F5          1.39e+06   6.94e+06
      F6          1.02e+06   1.48e+05
      F7          7.41e+01   1.62e+04
      F8          3.17e+11   8.00e+12
      F9          1.64e+08   3.83e+08
      F10         9.18e+07   9.02e+05
      F11         5.11e+05   5.22e+07
      F12         6.18e+01   2.47e+02
      F13         1.00e+05   3.40e+06
      F14         5.76e+06   2.56e+07
      F15         6.25e+05   2.35e+06
      Best        10         5
  30. Comparison against MOS, Considering the Number of Evaluations
      Alg         4%   20%   100%
      MOS         9    5     5
      SHADEILS    6    10    10
      Conclusions about SHADEILS: from 20% of the evaluations onwards it obtains better results; better in the more complex functions (worse in the separable ones); very competitive in overlapping/non-separable functions.
  31. Improvement in the More Complex Functions
      Ratio of improvement:
      Fun   MOS ⇒ SHADEILS      4%       20%      100%
      F7    1.6e+4 ⇒ 7.4e+1     -12.7%   88.6%    99.4%
      F8    8e+12 ⇒ 3.1e+11     53.8%    91.8%    96.0%
      F9    3.8e+8 ⇒ 1.6e+8     23.3%    34.9%    57.2%
      F10   9.0e+5 ⇒ 9.2e+7     -95.4%   -99.0%   -99.0%
      F11   5.2e+7 ⇒ 5.1e+5     -45.8%   83.0%    99.0%
      F12   2.5e+2 ⇒ 6.2e+1     77.6%    41.8%    74.9%
      F13   3.4e+6 ⇒ 1.0e+5     -38.8%   93.0%    97.1%
      F14   2.6e+7 ⇒ 5.8e+6     -61.7%   65.5%    77.3%
      F15   2.3e+6 ⇒ 6.2e+5     -87.7%   -48.7%   73.2%
      Ratio = 100 · (Error_MOS − Error_SHADEILS) / max(Error_MOS, Error_SHADEILS)
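The ratio formula above can be checked directly; a short worked example using the errors reported in the tables:

```python
def improvement_ratio(error_mos, error_shadeils):
    """Ratio = 100 * (Error_MOS - Error_SHADEILS) / max(Error_MOS, Error_SHADEILS).

    Positive values mean SHADE-ILS improves over MOS; negative values the opposite.
    """
    return 100.0 * (error_mos - error_shadeils) / max(error_mos, error_shadeils)

# Example with the F10 errors at 100% of the evaluations (slide 29):
print(round(improvement_ratio(9.02e+05, 9.18e+07), 1))   # -99.0, as in the table
```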
  32. Comparing with the CEC'2013 Criterion
      For each function: algorithms are sorted by their fitness/error; algorithms are ranked in that order; each algorithm receives points based on its ranking (more points for the best ones).
      Global result: the scores over all functions are accumulated.
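A hypothetical sketch of that scoring scheme. The slide does not give the point table; Formula-1 style points, as used in the CEC LSGO comparisons, are assumed here, and all names are illustrative.

```python
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]   # assumed point table

def competition_scores(errors_per_function):
    """Accumulate points per algorithm: rank by error on each function, award POINTS in order."""
    totals = {}
    for errors in errors_per_function:
        ranking = sorted(errors, key=errors.get)           # best (lowest error) first
        for position, alg in enumerate(ranking):
            points = POINTS[position] if position < len(POINTS) else 0
            totals[alg] = totals.get(alg, 0) + points
    return totals

# Example with two functions and three algorithms:
print(competition_scores([
    {"SHADE-ILS": 1e2, "MOS": 3e2, "VMODE": 9e5},
    {"SHADE-ILS": 5e5, "MOS": 2e3, "VMODE": 8e6},
]))
```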
  33. Comparison with the CEC'2013 Criterion (accuracy: 1.200e+05 evaluations)
      [Bar chart of accumulated scores for IHDELS_2015, MOS, SHADE-ILS and VMODE, broken down by category: unimodal, functions with a separable subcomponent, functions with no separable subcomponents, overlapping functions, non-separable functions.]
  34. Comparison with the CEC'2013 Criterion (accuracy: 6.000e+05 evaluations)
      [Same bar chart of accumulated scores per algorithm and function category, at 6 · 10^5 evaluations.]
  35. Comparison with the CEC'2013 Criterion (accuracy: 3.000e+06 evaluations)
      [Same bar chart of accumulated scores per algorithm and function category, at 3 · 10^6 evaluations.]
  36. Conclusions
      We have proposed a new algorithm for LSGO: SHADE-ILS.
      It iteratively applies DE + LS, with SHADE as the DE algorithm, and in each iteration it selects the LS with the best last improvement ratio.
      Results: SHADE-ILS is more competitive, especially in the more complex functions, and beats the previous state of the art, MOS.