Quick statement of the σ_odd problem (and its variant ς_odd problem) with an algorithm to check it. Benchmarks of parallel implementations in multi-threads, Open MPI and OpenCL.
Université Libre de Bruxelles
Computer Science Department
INFO-Y100 (4004940ENR) Parallel systems
Project
Parallel Numerical Verification of the σ_odd problem
Presentation
Olivier Pirson — [email protected]
orcid.org/0000-0001-6296-9659
December 15, 2017
(Last modifications: September 11, 2019)
https://speakerdeck.com/opimedia/parallel-numerical-verification-of-the-s-odd-problem
1 The problem
2 Computation
Simple algorithm
Better algorithm
3 Parallel implementations
Multi-threads
Message-passing (Open MPI)
GPU (OpenCL)
4 Results
Speedup
Efficiency
Overhead
Benchmarks tables
Parallel Numerical Verification of the σodd problem 2 / 41
The σ_odd and ς_odd functions
σ(n) = sum of all divisors of n (sigma)
σ_odd(n) = sum of odd divisors of n (sigma odd)
All divisors of 18: {1, 2, 3, 6, 9, 18}
Only odd divisors: {1, 3, 9} so σ_odd(18) = 13
All divisors of 19: {1, 19}
Only odd divisors: {1, 19} so σ_odd(19) = 20
ς_odd(n) = σ_odd(n) divided by 2 until it is odd (varsigma odd)
ς_odd(18) = 13
ς_odd(19) = 5
n         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 …
σ(n)      1  3  4  7  6 12  8 15 13 18 12 28 14 24 24 31 18 39 20 42 …
σ_odd(n)  1  1  4  1  6  4  8  1 13  6 12  4 14  8 24  1 18 13 20  6 …
ς_odd(n)  1  1  1  1  3  1  1  1 13  3  3  1  7  1  3  1  9 13  5  3 …
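The two definitions above can be sketched directly in Python. This is only a naive illustration (it tests every odd candidate divisor), not the method used in the project; the function names `sigma_odd` and `varsigma_odd` are chosen here for the example.

```python
def sigma_odd(n):
    """Sum of the odd divisors of n (naive: tests every odd candidate)."""
    return sum(d for d in range(1, n + 1, 2) if n % d == 0)


def varsigma_odd(n):
    """sigma_odd(n) divided by 2 until the result is odd."""
    s = sigma_odd(n)
    while s % 2 == 0:
        s //= 2
    return s


if __name__ == "__main__":
    # Reproduces the two worked examples: 18 and 19.
    for n in (18, 19):
        print(n, sigma_odd(n), varsigma_odd(n))
```

Running it on 18 and 19 gives the values 13, 13 and 20, 5 stated above.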
The σ_odd problem: an iteration problem
We iterate the ς_odd (or equivalently σ_odd) function
and we observe that we always reach 1.
Numbers in orange are square numbers.
For every odd square number n ≠ 1:
ς_odd(n) = σ_odd(n) > n
But we observe that for almost all other odd numbers n:
ς_odd(n) < n
Note that even numbers are not interesting for this problem, because
σ_odd(2n) = σ_odd(n) and ς_odd(2n) = ς_odd(n).
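As a small illustration of this iteration, the sketch below follows the successive values of ς_odd from a starting number until 1 is reached. The helper name `trajectory` and its `max_steps` safety limit are assumptions made for this example, and the divisor sum is again the naive one.

```python
def sigma_odd(n):
    """Sum of the odd divisors of n (naive)."""
    return sum(d for d in range(1, n + 1, 2) if n % d == 0)


def varsigma_odd(n):
    """sigma_odd(n) divided by 2 until the result is odd."""
    s = sigma_odd(n)
    while s % 2 == 0:
        s //= 2
    return s


def trajectory(n, max_steps=100):
    """Successive values n, varsigma_odd(n), ... until 1 (or max_steps)."""
    path = [n]
    while path[-1] != 1 and len(path) <= max_steps:
        path.append(varsigma_odd(path[-1]))
    return path


if __name__ == "__main__":
    print(trajectory(9))   # a square: the first step goes up
    print(trajectory(81))  # another square, with two increasing steps
```

For the square 81 the path is 81, 121, 133, 5, 3, 1: the values first grow (121 is itself a square) before falling below the starting number.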
[Figure: graph of the ς_odd iteration on the odd numbers from 1 to 85]
The σ_odd problem: an iteration problem
The point in the middle of this picture is the number 1.
[Figure: graph of the ς_odd iteration on the odd numbers from 1 to 1001]
The σ_odd problem is a conjecture
Does the iteration always reach 1?
The σ_odd problem is the conjecture that it always does,
whatever the starting number (an integer ≥ 1).
Successfully checked for every n up to 1.1 × 10^11 ≃ 1.6 × 2^36
with the programs developed for this work.
The previously known bound was 2^30.
Moreover, n ≤ 10^11 ⟹ ς_odd^15(n) = 1
(that is, at most 15 iterations are needed).
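On a very small scale, the same kind of check can be sketched as follows. This is a toy illustration with a naive divisor sum, far from the programs used for the bound above; the names `steps_to_one` and `check_up_to`, and the `max_steps` limit, are assumptions made for this example.

```python
def varsigma_odd(n):
    """Sum of the odd divisors of n, divided by 2 until odd (naive)."""
    s = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while s % 2 == 0:
        s //= 2
    return s


def steps_to_one(n, max_steps=1000):
    """Number of applications of varsigma_odd needed to reach 1, or None."""
    steps = 0
    while n != 1:
        if steps == max_steps:
            return None  # gave up: no conclusion for this n
        n = varsigma_odd(n)
        steps += 1
    return steps


def check_up_to(limit):
    """Largest step count over the odd n <= limit; raises if some n fails."""
    worst = 0
    for n in range(1, limit + 1, 2):
        steps = steps_to_one(n)
        if steps is None:
            raise RuntimeError(f"no conclusion for n = {n}")
        worst = max(worst, steps)
    return worst


if __name__ == "__main__":
    print(check_up_to(2001))
```

Only odd starting numbers are visited, since even numbers have the same image as their odd part.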
Numerical verification by the simple direct algorithm
For each odd number:
Algorithm 1 first_check_varsigma_odd(first_n, last_n)
def first_check_varsigma_odd(first_n, last_n):
    for n = first_n to last_n step 2
        lower_n, length = first_iterate_varsigma_odd_until_lower(n)
        if length > 1 then
            print n, lower_n, length
Numerical verification by the simple direct algorithm
Simply iterate ς_odd until a smaller number is reached:
Algorithm 2 first_iterate_varsigma_odd_until_lower(n)
def first_iterate_varsigma_odd_until_lower(start_n):
    n = start_n
    length = 0
    do
        length = length + 1
        n = ς_odd(n)
        if n > MAX_POSSIBLE_N then
            print "! Impossible to check", start_n, length, n
            exit
    while n > start_n

    if n = start_n then
        print "! Found not trivial cycle", start_n, length
        exit

    return n, length
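Algorithms 1 and 2 can be transcribed into runnable Python roughly as follows. This is a sketch under stated assumptions: the value of `MAX_POSSIBLE_N` is hypothetical (the slides do not give it), `exit` is replaced by exceptions, the exceptions found are returned as a list instead of printed, and ς_odd is computed by a naive divisor sum.

```python
MAX_POSSIBLE_N = 2**64 - 1  # hypothetical bound; not given in the slides


def varsigma_odd(n):
    """Sum of the odd divisors of n, divided by 2 until odd (naive)."""
    s = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while s % 2 == 0:
        s //= 2
    return s


def first_iterate_varsigma_odd_until_lower(start_n):
    """Iterate varsigma_odd from start_n until a smaller value is reached.

    Returns (lower_n, length) where length is the number of iterations.
    """
    n = start_n
    length = 0
    while True:  # do ... while n > start_n
        length += 1
        n = varsigma_odd(n)
        if n > MAX_POSSIBLE_N:
            raise OverflowError(f"impossible to check {start_n} ({length}, {n})")
        if n <= start_n:
            break
    if n == start_n:
        raise RuntimeError(f"found non-trivial cycle {start_n} ({length})")
    return n, length


def first_check_varsigma_odd(first_n, last_n):
    """Check every odd n in [first_n, last_n]; first_n must be odd and >= 3.

    Returns the (n, lower_n, length) triples that needed more than one step.
    """
    found = []
    for n in range(first_n, last_n + 1, 2):
        lower_n, length = first_iterate_varsigma_odd_until_lower(n)
        if length > 1:
            found.append((n, lower_n, length))
    return found


if __name__ == "__main__":
    for triple in first_check_varsigma_odd(3, 99):
        print(*triple)
```

Between 3 and 99, only the squares 9, 25, 49 and 81 need more than one step.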
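Computation of σ_odd(n)
Assume n odd:
\[ n = p_1^{\alpha_1} \times p_2^{\alpha_2} \times p_3^{\alpha_3} \times \cdots \times p_k^{\alpha_k} \]
with p_i distinct prime numbers
\[ \sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1} - 1}{p_1 - 1} \times \frac{p_2^{\alpha_2+1} - 1}{p_2 - 1} \times \frac{p_3^{\alpha_3+1} - 1}{p_3 - 1} \times \cdots \times \frac{p_k^{\alpha_k+1} - 1}{p_k - 1} \]
Thus, to verify the conjecture we must factorize
(other ways are less efficient).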
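The product formula can be sketched in Python as below. The factorization here is plain trial division, chosen only for brevity and not claimed to be what the project uses; the names `factorize` and `sigma_odd` are assumptions for this example. Even inputs are handled by ignoring the prime 2, which relies on σ_odd(2n) = σ_odd(n).

```python
def factorize(n):
    """Prime factorization by trial division: list of (prime, exponent)."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            alpha = 0
            while n % p == 0:
                n //= p
                alpha += 1
            factors.append((p, alpha))
        p += 1
    if n > 1:
        factors.append((n, 1))  # remaining prime factor
    return factors


def sigma_odd(n):
    """Product of (p^(alpha+1) - 1) / (p - 1) over the odd primes p of n."""
    result = 1
    for p, alpha in factorize(n):
        if p != 2:  # the factor 2 does not contribute any odd divisor
            result *= (p ** (alpha + 1) - 1) // (p - 1)
    return result


if __name__ == "__main__":
    print(factorize(81), sigma_odd(81))
    print(factorize(18), sigma_odd(18))
```

Each quotient is an exact integer, since (p^(α+1) − 1)/(p − 1) = 1 + p + ⋯ + p^α.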
Use properties to avoid a lot of computations
For each n, we want to check that there exists k such that σ_odd^k(n) = 1.
For n > 1, it is equivalent to check that there exists k such that ς_odd^k(n) < n
(every smaller number having already been checked).
This shortens the path that has to be computed.
Only odd numbers must be checked (50%).
Other numbers can be skipped as well (≃ 33% remain).
Almost all numbers reach a smaller number in only one step!
Exceptions identified before computation: square numbers.
The other exceptions (called bad numbers) are very rare.
So instead of iterating, we compute only one step
and keep the exceptions, which are checked separately (very fast).
ς_odd(ab) ≤ ς_odd(a) ς_odd(b)
→ shortcut in the factorization (the heaviest part of the work)
(using previously known bad numbers
or a general upper bound).
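The one-step idea can be illustrated with the sketch below. It assumes that a "bad number" is an odd non-square n with ς_odd(n) > n, which is how the list above reads; the function name `one_step_exceptions` is hypothetical, and the factorization shortcut is not implemented (the divisor sum is naive).

```python
from math import isqrt


def varsigma_odd(n):
    """Sum of the odd divisors of n, divided by 2 until odd (naive)."""
    s = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while s % 2 == 0:
        s //= 2
    return s


def one_step_exceptions(first_n, last_n):
    """Classify the odd n in [first_n, last_n] that need separate checking.

    Squares are recognized without computing varsigma_odd; the other
    exceptions are the numbers whose single step does not go below n.
    Returns (squares, bad_numbers); first_n must be odd and >= 3.
    """
    squares, bad_numbers = [], []
    for n in range(first_n, last_n + 1, 2):
        if isqrt(n) ** 2 == n:
            squares.append(n)      # known exception, no computation needed
        elif varsigma_odd(n) > n:
            bad_numbers.append(n)  # rare exception found by the single step
    return squares, bad_numbers


if __name__ == "__main__":
    squares, bad_numbers = one_step_exceptions(3, 3001)
    print(len(squares), bad_numbers)
```

Up to 3001 the only non-square exception found this way is 2205 = 3² × 5 × 7², for which σ_odd(2205) = 4446 and ς_odd(2205) = 2223.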
Transformed problem
With these properties, instead of computing the complete iteration of σ_odd
(and thus the complete factorization) for each number,
we use an algorithm that is both improved and simpler
(relative to other possible optimizations):
compute only one (possibly partial) iteration of ς_odd,
for only some numbers.
“The cheapest, fastest and most reliable components of a computer system
are those that aren’t there.”
— Gordon Bell
Transformed problem
(progs/src/sequential/sequential/sequential.hpp)
Algorithm 3 sequential_check_gentle_varsigma_odd(first_n, last_n)
// Preconditions: 3 ≤ first_n odd ≤ last_n ≤ MAX_POSSIBLE_N
define sequential_check_gentle_varsigma_odd(first_n, last_n):
  bad_table = ∅
  for n = first_n to last_n step 2
    if not (3, 7, 31 or 127 \\ n) then
      if not (n is square number) then
        if not sequential_is_varsigma_odd_lower(n,
                                                bad_table, first_n) then
          bad_table = bad_table ∪ {n}
          print n
  return bad_table
// Postcondition:
// If all numbers < first_n respect the conjecture
// and all square numbers ≤ last_n respect the conjecture
// and all odd bad numbers ≤ last_n respect the conjecture
// then all numbers ≤ last_n respect the conjecture.
// Print all odd bad numbers between first_n and last_n (included)
// and return the set.
d \ n means that d is a divisor of n.
d \\ n means that d is a divisor of n, but d² is not.
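The two cheap tests of Algorithm 3 can be sketched in C++ as follows. This is only an illustration: the names `nat_type`, `exactly_divides`, `is_square`, `needs_full_check` and `candidates` are hypothetical and are not taken from sequential.hpp. The sketch lists the odd n that pass both tests and would then be handed to sequential_is_varsigma_odd_lower.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

using nat_type = std::uint64_t;  // hypothetical name for the integer type

// d \\ n: d divides n, but d*d does not.
bool exactly_divides(nat_type d, nat_type n) {
  return (n % d == 0) && (n % (d * d) != 0);
}

// True if n is a perfect square.
bool is_square(nat_type n) {
  nat_type r = static_cast<nat_type>(std::sqrt(static_cast<double>(n)));
  while (r * r > n) --r;               // correct a possible rounding error
  while ((r + 1) * (r + 1) <= n) ++r;
  return r * r == n;
}

// True if Algorithm 3 would pass n to the costly partial factorization.
bool needs_full_check(nat_type n) {
  assert(n >= 3 && n % 2 == 1);
  const nat_type skipped[] = {3, 7, 31, 127};
  for (nat_type d : skipped) {
    if (exactly_divides(d, n)) return false;
  }
  return !is_square(n);
}

// Odd numbers of [first_n, last_n] that survive both cheap tests.
std::vector<nat_type> candidates(nat_type first_n, nat_type last_n) {
  assert(first_n >= 3 && first_n % 2 == 1 && first_n <= last_n);
  std::vector<nat_type> result;
  for (nat_type n = first_n; n <= last_n; n += 2) {
    if (needs_full_check(n)) result.push_back(n);
  }
  return result;
}
```

For example, between 3 and 31 only 5, 11, 13, 17, 19, 23, 27 and 29 remain: 9 and 25 are squares, and the other odd numbers are exactly divisible by 3, 7 or 31.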
Transformed problem
Computes (possibly partially) ςodd(n) from the factorization of n
and returns True if and only if ςodd(n) < n.
Algorithm 4 sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n)
// Preconditions: 3 ≤ n odd ≤ MAX_POSSIBLE_N
//   bad_table contains all odd bad numbers
//   between bad_first_n (included) and n (excluded)
define sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n):
  n_divided = n
  varsigma_odd = 1
  for p odd prime ≤ ⌊√n_divided⌋
    α = 0
    while p \ n_divided
      n_divided = n_divided / p
      α = α + 1

    if α > 0 then // p^α is a factor of n
      varsigma_odd = varsigma_odd ∗ Odd((p^α − 1)/(p − 1) + p^α)
      if (varsigma_odd
          ∗ sequential_sigma_odd_upper_bound(n_divided,
                                             bad_table, bad_first_n)) < n then
        return True

  if n_divided > 1 then // n_divided is prime
    varsigma_odd = varsigma_odd ∗ Odd(n_divided + 1)

  return (varsigma_odd < n)
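A minimal C++ sketch of the core of Algorithm 4, under simplifying assumptions: it uses trial division by every odd number instead of a prime table, and it omits the early exit based on sequential_sigma_odd_upper_bound, so it always computes ςodd(n) completely. The names `nat_type`, `odd_part`, `varsigma_odd` and `is_varsigma_odd_lower` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using nat_type = std::uint64_t;  // hypothetical name for the integer type

// Odd(n): divide n by 2 until the result is odd.
nat_type odd_part(nat_type n) {
  assert(n != 0);
  while (n % 2 == 0) n /= 2;
  return n;
}

// varsigma_odd(n) for odd n, computed from the factorization of n.
nat_type varsigma_odd(nat_type n) {
  assert(n % 2 == 1);
  nat_type n_divided = n;
  nat_type result = 1;
  // Composite p never divide n_divided here,
  // because their prime factors were already removed.
  for (nat_type p = 3; p * p <= n_divided; p += 2) {
    nat_type p_alpha = 1;  // p^alpha
    while (n_divided % p == 0) {
      n_divided /= p;
      p_alpha *= p;
    }
    if (p_alpha > 1) {
      // sigma(p^alpha) = (p^alpha - 1)/(p - 1) + p^alpha
      result *= odd_part((p_alpha - 1) / (p - 1) + p_alpha);
    }
  }
  if (n_divided > 1) {  // the remaining factor is prime
    result *= odd_part(n_divided + 1);
  }
  return result;
}

// True if and only if varsigma_odd(n) < n.
bool is_varsigma_odd_lower(nat_type n) { return varsigma_odd(n) < n; }
```

For instance, this gives ςodd(15) = Odd(4) × Odd(6) = 3 < 15, whereas the square 9 gives ςodd(9) = 13 > 9, which illustrates why square numbers are handled separately.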
Factorization shortcut
When a prime factor has been found,
it may be possible to shortcut the complete factorization.
For example, with a first prime factor $p_1$ of $n$:
$n = p_1^{\alpha_1} n'$
$\sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1} - 1}{p_1 - 1} \times \sigma_{\mathrm{odd}}(n')$
$\sigma_{\mathrm{odd}}(n) \le \frac{p_1^{\alpha_1+1} - 1}{p_1 - 1} \times (\text{upper bound of } \sigma_{\mathrm{odd}}(n')) < n$? If yes, then stop.
An upper bound that always holds:
$\sigma_{\mathrm{odd}}(n') \le 2 n' \sqrt[8]{n'}$
The same applies to the $\varsigma_{\mathrm{odd}}$ function, with some additional division(s) by 2.
And if $n'$ is gentle (odd, but neither square nor bad):
$\varsigma_{\mathrm{odd}}(n') < n'$ (so the shortcut is possible “often”).
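As a worked illustration of the shortcut (the number 715 is chosen here only as an example and does not come from the slides), take n = 715, whose first prime factor is 5, and assume that n′ = 143 is already known to be gentle:

```latex
\begin{align*}
n &= 715 = 5^1 \times 143, \qquad p_1 = 5,\ \alpha_1 = 1,\ n' = 143 \\
\varsigma_{\mathrm{odd}}(715)
  &= \mathrm{Odd}\!\left(\frac{5^2 - 1}{5 - 1}\right) \times \varsigma_{\mathrm{odd}}(143)
   = \mathrm{Odd}(6) \times \varsigma_{\mathrm{odd}}(143)
   = 3 \times \varsigma_{\mathrm{odd}}(143) \\
  &< 3 \times 143 = 429 < 715
\end{align*}
```

So the factorization can stop after the factor 5. The complete computation confirms it: σ(143) = 12 × 14 = 168 = 2³ × 21, hence ςodd(143) = 21 and ςodd(715) = 3 × 21 = 63.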
Stop and restart
Note that the program can be stopped (or run up to some last_n)
and restarted from the last value checked.
In fact, different ranges of numbers can be computed separately (at the
same time or not).
If all required numbers up to N have been checked (with the odd square
numbers and the bad numbers checked separately, for example the naive way,
which is fast for these rare numbers), then the conclusion is that for all
n ≤ N, the iteration of σodd (and ςodd) from n reaches 1 (which is what we
wanted to establish).
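Computing ranges separately can be sketched as below. This is a hypothetical helper, not code from the project: `split_odd_ranges` cuts the odd numbers of [first_n, last_n] into consecutive ranges of at most `nb_odd` odd numbers each, which could then be checked at different times or by different workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using nat_type = std::uint64_t;                    // hypothetical integer type
using range_type = std::pair<nat_type, nat_type>;  // [first, last], both odd

// Splits the odd numbers of [first_n, last_n] into consecutive ranges
// containing at most nb_odd odd numbers each.
std::vector<range_type> split_odd_ranges(nat_type first_n, nat_type last_n,
                                         nat_type nb_odd) {
  assert(first_n % 2 == 1 && first_n <= last_n && nb_odd > 0);
  std::vector<range_type> ranges;
  for (nat_type first = first_n; first <= last_n; first += 2 * nb_odd) {
    nat_type last = std::min(first + 2 * (nb_odd - 1), last_n);
    if (last % 2 == 0) --last;  // keep an odd upper end
    ranges.emplace_back(first, last);
  }
  return ranges;
}
```

With first_n = 3, last_n = 21 and nb_odd = 4, the ranges are [3, 9], [11, 17] and [19, 21]. Restarting after a stop then amounts to resuming with the first range that was not completed.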
1 The problem
2 Computation
Simple algorithm
Better algorithm
3 Parallel implementations
Multi-threads
Message-passing (Open MPI)
GPU (OpenCL)
4 Results
Speedup
Efficiency
Overhead
Benchmarks tables
Performance with one thread/process
First, a comparison between the sequential implementation, the three
multi-threading implementations and the two message-passing implementations,
each run with only one thread/process.
Numbers between 1 and 20,000,001 are checked,
on a personal computer with 4 cores and 2 threads per core.
[Figure: running time in seconds (vertical axis, 6 to 7) for each implementation (horizontal axis, 0 to 5).]
0: sequential;
one thread (1: one by one, 2: by range, 3: dynamic);
one MPI process (4: one by one, 5: dynamic).
Multi-threads (C++11 threads)
3 different implementations: (progs/src/threads/threads/threads.hpp)
One by one
Each slave independently computes one number and sends a boolean to the
master. The master also computes one number, then waits for everybody. And
so forth with the next numbers.
Silly implementation, just to try. Very inefficient: the barrier is a big
limitation because each number has a different factorization time.
By range
Like one by one, but each slave receives a range of numbers (given by its
endpoints), computes it and returns the (very small) set of bad numbers
found. The master computes a smaller range, then waits for everybody. And so
forth with the next ranges.
Much better, because the computation is better balanced: the factorization
times average out over a range.
“Dynamic”
Like by range, but the master does not wait: it gives a new range as soon as
a slave is free, and computes too the rest of the time.
Very good occupation of each thread (see the graphs in the following slides).
All threads share the same prime number table.
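The idea behind the “dynamic” variant (no barrier, a free thread immediately takes the next range) can be sketched with C++11 threads as follows. This is not the implementation of threads.hpp: instead of a master handing out ranges, this simplified sketch lets every thread take the next range from a shared atomic counter, and it ignores the sharing of the bad table during the computation. The names `dynamic_check` and `is_ok` are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

using nat_type = std::uint64_t;  // hypothetical name for the integer type

// Checks the odd n of [first_n, last_n] with nb_thread threads and returns,
// sorted, the n for which is_ok(n) is false.
std::vector<nat_type> dynamic_check(
    nat_type first_n, nat_type last_n, nat_type range_size, unsigned nb_thread,
    const std::function<bool(nat_type)>& is_ok) {
  assert(first_n % 2 == 1 && range_size > 0 && range_size % 2 == 0);
  assert(nb_thread > 0);
  std::atomic<nat_type> next_first(first_n);  // start of the next free range
  std::mutex bad_mutex;
  std::vector<nat_type> bad;

  auto worker = [&]() {
    for (;;) {
      // Atomically take the next range: no barrier between ranges.
      const nat_type first = next_first.fetch_add(range_size);
      if (first > last_n) break;
      const nat_type last = std::min(first + range_size - 1, last_n);
      std::vector<nat_type> local_bad;
      for (nat_type n = first; n <= last; n += 2) {
        if (!is_ok(n)) local_bad.push_back(n);
      }
      if (!local_bad.empty()) {  // rare, so the lock is rarely taken
        std::lock_guard<std::mutex> lock(bad_mutex);
        bad.insert(bad.end(), local_bad.begin(), local_bad.end());
      }
    }
  };

  std::vector<std::thread> threads;
  for (unsigned i = 1; i < nb_thread; ++i) threads.emplace_back(worker);
  worker();  // the calling thread computes too
  for (std::thread& t : threads) t.join();
  std::sort(bad.begin(), bad.end());
  return bad;
}
```

Because each thread takes a new range as soon as it finishes the previous one, a range with slow factorizations delays only the thread that holds it, which is the load-balancing effect described above.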
Multi-threads — one by one
[Figure: running time in seconds (vertical axis, 0 to 80) versus number of threads (horizontal axis, 1 to 8).]
Multi-threads — by range
[Figure: running time in seconds (vertical axis, 0 to 7) versus number of threads (horizontal axis, 1 to 8).]
Multi-threads — “dynamic”
[Figure: running time in seconds (vertical axis, 0 to 7) versus number of threads (horizontal axis, 1 to 8).]
Message-passing (Open MPI)
2 implementations: (progs/src/mpi/mpi/mpi.hpp)
One by one
One element at a time, with a barrier. Very inefficient; just to try.
“Dynamic”
By range, and the master does not wait.
Same algorithms as for multi-threading,
but information is exchanged by messages. (These could travel between
different machines, but these results were computed on a single computer.)
The impact is small if the range is large compared to the small amount of
information exchanged.
Messages from the master to each slave:
The single number or the endpoints of the range, and the new (rare) bad
numbers found by the other processes.
Messages from each slave to the master:
A boolean or an array of the new (rare) bad numbers found.
Main differences from multi-threading: the exchanges between processes,
and each process has its own prime number table.
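The content of a master-to-slave message can be sketched as a flat buffer. The layout below (first endpoint, last endpoint, then the bad numbers) and the names `RangeMessage`, `pack` and `unpack` are assumptions made for illustration; the actual format used in mpi.hpp is not shown in the slides. A single contiguous buffer of this kind is the sort of payload that a call such as MPI_Send can transmit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using nat_type = std::uint64_t;  // hypothetical name for the integer type

// Hypothetical content of a message from the master to a slave.
struct RangeMessage {
  nat_type first_n;               // first endpoint of the range
  nat_type last_n;                // last endpoint of the range
  std::vector<nat_type> new_bad;  // bad numbers found by other processes
};

// Flattens a message into one contiguous buffer:
// first_n, last_n, then the bad numbers.
std::vector<nat_type> pack(const RangeMessage& msg) {
  std::vector<nat_type> buffer{msg.first_n, msg.last_n};
  buffer.insert(buffer.end(), msg.new_bad.begin(), msg.new_bad.end());
  return buffer;
}

// Rebuilds the message on the receiving side.
RangeMessage unpack(const std::vector<nat_type>& buffer) {
  assert(buffer.size() >= 2);
  return RangeMessage{buffer[0], buffer[1],
                      std::vector<nat_type>(buffer.begin() + 2, buffer.end())};
}
```

Since bad numbers are rare, such a buffer usually holds only the two endpoints, which is consistent with the remark that the cost of the messages is small compared with the computation of a large range.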
Message-passing — “dynamic”
[Figure: measured times in seconds; axis ticks 0 to 7 and 1 to 6, the remaining axis label is unreadable in the source.]
GPU (OpenCL)
Only one implementation: (progs/src/opencl/opencl/opencl.hpp)
By list of numbers
The CPU selects a list of numbers to be checked
and sends them to the GPU.
The GPU computes ς(n) completely for each n received (without using
a list of bad numbers and without shortcutting the factorization).
Then the GPU returns the corresponding list of booleans to the CPU.
And so forth.
Instead of computing ς(n) directly during the factorization,
this implementation first collects all prime factors of n.
That makes the parallel work easier.
The important improvement of the algorithm (the shortcut of the
factorization) was also removed, because it did not give better results,
due to the more complex branching.
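As a rough illustration of the "collect the prime factors first, then compute ς(n)" scheme described above, here is a minimal sequential C++ sketch of what one work-item would do for one number, plus the list-in, booleans-out round trip. It assumes that ς(n) is the sum of the divisors of an odd n with all factors 2 removed, and that the returned boolean means ς(n) < n; both are a reading of the problem statement, not something taken from opencl.hpp. The names collect_factors, varsigma_odd and check_list are hypothetical.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// (prime, exponent) pairs of an odd n >= 1, found by plain trial division.
std::vector<std::pair<std::uint64_t, unsigned>> collect_factors(std::uint64_t n) {
    std::vector<std::pair<std::uint64_t, unsigned>> factors;
    for (std::uint64_t p = 3; p * p <= n; p += 2) {
        unsigned exponent = 0;
        while (n % p == 0) {
            n /= p;
            ++exponent;
        }
        if (exponent > 0) factors.emplace_back(p, exponent);
    }
    if (n > 1) factors.emplace_back(n, 1u);  // remaining prime factor
    return factors;
}

// Assumed definition: sigma(n) divided by 2 until it is odd, for odd n.
std::uint64_t varsigma_odd(std::uint64_t n) {
    std::uint64_t sigma = 1;
    for (const auto& factor : collect_factors(n)) {
        std::uint64_t sum = 1, power = 1;
        for (unsigned i = 0; i < factor.second; ++i) {
            power *= factor.first;
            sum += power;  // 1 + p + ... + p^exponent
        }
        sigma *= sum;
    }
    while (sigma % 2 == 0) sigma /= 2;
    return sigma;
}

// One round trip: a list of odd numbers in, a list of booleans out.
// Assumed meaning of each boolean: varsigma_odd(n) < n.
std::vector<bool> check_list(const std::vector<std::uint64_t>& numbers) {
    std::vector<bool> is_lower;
    is_lower.reserve(numbers.size());
    for (std::uint64_t n : numbers) is_lower.push_back(varsigma_odd(n) < n);
    return is_lower;
}
```

Separating the factorization from the evaluation keeps each phase a simple loop, at the price of giving up the early exit that the shortcut provided.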
GPU (OpenCL): explanation of the poor results
The computation is massively parallel (when the list of numbers is big).
But the efficiency is limited because the factorization process differs
for each number. By the nature of the computation by factorization, the
algorithm is more or less a random succession of conditional
branches, and GPU-style parallel computation loses a lot
of power on such divergent branches.
The bigger the list of numbers, the closer the computation is to ideally
parallel. But the bigger the list, the more the computation of each
number disturbs the progress of the others.
Moreover, all numbers that are quickly factorized wait for the end of the others.
Also, GPUs give the best of their power on floating-point computations,
whereas this problem is an integer problem.
A completely different approach could be better.
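The effect of quickly factorized numbers waiting for the others can be made concrete with a toy cost model. The C++ sketch below counts trial-division steps per number and compares the total useful work with the cost of a group that advances in lockstep, where every member is charged the step count of the slowest one. This is a deliberately simplified model of SIMD-style execution, not a measurement of the OpenCL kernel; the function names and the grouping are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Number of trial divisions by odd candidates needed to factorize an odd n.
std::uint64_t trial_division_steps(std::uint64_t n) {
    std::uint64_t steps = 0;
    for (std::uint64_t p = 3; p * p <= n; p += 2) {
        ++steps;
        while (n % p == 0) n /= p;
    }
    return steps;
}

struct GroupCost {
    std::uint64_t useful;    // sum of the steps each number really needs
    std::uint64_t lockstep;  // group size times the slowest member's steps
};

// Cost of one group whose members all run until the slowest one finishes.
GroupCost lockstep_cost(const std::vector<std::uint64_t>& group) {
    GroupCost cost{0, 0};
    std::uint64_t slowest = 0;
    for (std::uint64_t n : group) {
        const std::uint64_t steps = trial_division_steps(n);
        cost.useful += steps;
        slowest = std::max(slowest, steps);
    }
    cost.lockstep = slowest * group.size();
    return cost;
}
```

For the group {3, 9, 97} this model gives 5 useful steps against 12 lockstep steps: the prime 97 keeps the two small numbers waiting.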
GPU (OpenCL): old GPU used during tests
The poor performance of the OpenCL implementation
is also due to the old GPU used:
an NVIDIA Quadro FX 1800 graphics card with 768 MiB of memory.
This GPU has no cache for the global memory,
and the main loop iterates over prime numbers stored in this global memory.
A more modern GPU could use the native OpenCL function ctz (instead of a
loop).
Nevertheless, with the largest list of numbers possible for this GPU, the
OpenCL implementation shows a small (disappointing) gain in performance
compared to the sequential implementation.
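To illustrate the remark about ctz: removing all factors 2 from a number can be done with a loop, or with a single count-trailing-zeros operation followed by a shift. The sketch below shows a host-side C++ equivalent that uses the GCC/Clang builtin __builtin_ctzll as a stand-in for the OpenCL ctz function. Which loop of the kernel this would replace is an assumption, and the function names are hypothetical.

```cpp
#include <cstdint>

// Odd part of n (n > 0), dividing by 2 in a loop.
std::uint64_t odd_part_loop(std::uint64_t n) {
    while ((n & 1u) == 0) n >>= 1;
    return n;
}

// Same result with one count-trailing-zeros operation and one shift.
// __builtin_ctzll is undefined for 0, so n must be non-zero.
std::uint64_t odd_part_ctz(std::uint64_t n) {
    return n >> __builtin_ctzll(n);
}
```

The second form has no data-dependent loop, which is the property that matters when many work-items run in lockstep.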
GPU (OpenCL) — by list of numbers
[Figure: running time in seconds (0 to 100) against a logarithmic axis from 100 to 100000; the axis label is unreadable in the source.]
1 The problem
2 Computation
Simple algorithm
Better algorithm
3 Parallel implementations
Multi-threads
Message-passing (Open MPI)
GPU (OpenCL)
4 Results
Speedup
Efficiency
Overhead
Benchmarks tables
Results
The results were produced on a computer with only 4 cores, which explains the
drop in gains from 5 threads or processes onward.
The results with Open MPI are a little strange, because for some parameters
they are better than the sequential implementation. It is as if running the
sequential program under mpirun made it faster.
Theoretically the overhead of the MPI implementation should be bigger
than that of the multi-thread implementation, due to the communication between
processes (but the tests were made on a single computer).
The implementation is almost identical to the multi-thread version and all
computation results are identical, so it should be correct.
Maybe the GCC compiler required with Open MPI optimizes this
code better than the clang compiler used for the sequential and multi-thread versions.
Or maybe it is due to a small imprecision in the measurements.
The two best implementations (the “dynamic” algorithm with threads and
with Open MPI) are both pretty close to the ideal.
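The Speedup, Efficiency and Overhead slides rely on three derived quantities. A minimal C++ sketch of them follows, assuming the usual textbook definitions: speedup S = T_seq / T_p, efficiency E = S / p, and overhead p·T_p − T_seq. The slides do not restate the formulas at this point, so the exact overhead definition in particular is an assumption, and the names Metrics and parallel_metrics are hypothetical.

```cpp
struct Metrics {
    double speedup;     // T_seq / T_p
    double efficiency;  // speedup / p
    double overhead;    // p * T_p - T_seq
};

// t_seq: sequential time, t_par: parallel time with p threads or processes.
Metrics parallel_metrics(double t_seq, double t_par, unsigned p) {
    Metrics m;
    m.speedup = t_seq / t_par;
    m.efficiency = m.speedup / p;
    m.overhead = p * t_par - t_seq;
    return m;
}
```

With these definitions, an efficiency above 1 means a parallel run did better than the sequential timing divided by p, which is the kind of anomaly noted above for Open MPI.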
Speedup
[Figure: speedup (0 to 5) against the number of threads/processes (0 to 8). Curves: identity, sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]
Speedup with OpenCL
[Figure: speedup (0 to 5) against the number of threads/processes or the size of the list of numbers (logarithmic scale, 1 to 10^6). Curves: identity, sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
Efficiency
[Figure: efficiency (0 to 1.2) against the number of threads/processes (0 to 8). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]
Efficiency with OpenCL
[Figure: efficiency (0 to 1) against the number of threads/processes or the size of the list of numbers (logarithmic scale, 1 to 10^6). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
Overhead
[Figure: overhead (0 to 5000) against the number of threads/processes (0 to 8); the legend is garbled in the source.]
Overhead, up to 4 cores only
[Figure: overhead (0 to 1200) against the number of threads/processes. Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]
Overhead with OpenCL
[Figure: overhead (logarithmic scale) plotted against # threads/processes or size of the list of numbers (logarithmic scale). Series: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
Benchmarks table: sequential, threads & message passing
Technology  Algorithm   # threads/process  Time in s  Speedup  Efficiency    Overhead
sequential                              1      6.853    1.000     1.00000       0.000
threads     one by one                  1      6.873    0.997     0.99720      19.213
threads     one by one                  2     57.101    0.120     0.06001  107348.026
threads     one by one                  3     54.349    0.126     0.04203  156192.179
threads     one by one                  4     54.395    0.126     0.03150  210726.330
threads     one by one                  5     53.782    0.127     0.02549  262057.964
threads     one by one                  6     72.986    0.094     0.01565  431064.690
threads     one by one                  7     79.897    0.086     0.01225  552426.255
threads     one by one                  8     81.665    0.084     0.01049  646469.398
threads     by range                    1      6.858    0.999     0.99931       4.764
threads     by range                    2      3.980    1.722     0.86105    1105.961
threads     by range                    3      2.674    2.563     0.85420    1169.809
threads     by range                    4      1.935    3.542     0.88541     886.991
threads     by range                    5      2.224    3.081     0.61622    4268.383
threads     by range                    6      1.900    3.608     0.60132    4543.861
threads     by range                    7      1.641    4.176     0.59653    4635.448
threads     by range                    8      1.452    4.722     0.59019    4758.860
threads     dynamic                     1      6.862    0.999     0.99879       8.274
threads     dynamic                     2      3.652    1.876     0.93823     451.194
threads     dynamic                     3      2.432    2.818     0.93918     443.806
threads     dynamic                     4      1.820    3.765     0.94116     428.429
threads     dynamic                     5      1.676    4.090     0.81804    1524.452
threads     dynamic                     6      1.541    4.447     0.74122    2392.667
threads     dynamic                     7      1.427    4.804     0.68625    3133.355
threads     dynamic                     8      1.328    5.161     0.64514    3769.762
MPI         one by one                  1      6.385    1.073     1.07329    -467.966
MPI         one by one                  2     13.981    0.490     0.24509   21109.499
MPI         one by one                  3     14.496    0.473     0.15760   36633.994
MPI         one by one                  4     14.819    0.462     0.11562   52422.147
MPI         one by one                  5     17.613    0.389     0.07782   81212.792
MPI         one by one                  6     17.994    0.381     0.06348  101108.177
MPI         dynamic                     1      6.350    1.079     1.07924    -503.202
MPI         dynamic                     2      3.373    2.032     1.01581    -106.693
MPI         dynamic                     3      2.253    3.042     1.01410     -95.266
MPI         dynamic                     4      1.677    4.088     1.02196    -147.274
MPI         dynamic                     5      1.560    4.393     0.87862     946.749
MPI         dynamic                     6      1.440    4.760     0.79339    1784.713
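
The Speedup, Efficiency and Overhead columns look consistent with the usual definitions: speedup = T_seq / T_p, efficiency = speedup / p, and overhead = p * T_p - T_seq. The overhead formula and its unit (milliseconds) are inferred from the printed values, not stated on this slide, so the Python sketch below is only a plausibility check under that assumption. The function name `metrics` is hypothetical.

```python
def metrics(t_seq, t_par, p):
    """Return (speedup, efficiency, overhead_ms) for one benchmark row.

    t_seq: sequential time in seconds
    t_par: parallel time in seconds with p threads/processes
    The overhead formula and its millisecond unit are assumptions
    inferred from the table, not taken from the slides.
    """
    speedup = t_seq / t_par
    efficiency = speedup / p
    overhead_ms = (p * t_par - t_seq) * 1000.0
    return speedup, efficiency, overhead_ms


# Row "threads / by range / 2": 3.980 s, against 6.853 s sequential.
speedup, efficiency, overhead_ms = metrics(6.853, 3.980, 2)
print(f"{speedup:.3f} {efficiency:.5f} {overhead_ms:.3f}")
```

The values come out close to the printed row (1.722, 0.86105, 1105.961) but not identical, presumably because the table was computed from unrounded times. Under the same formula, a negative overhead such as the one in the MPI rows simply means the measured run was faster than the sequential reference.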
Benchmarks table: OpenCL
List size  Time in s  Speedup  Efficiency        Overhead
       64    119.918    0.057     0.00089     7667879.541
      128     65.360    0.105     0.00082     8359216.723
      256     36.547    0.188     0.00073     9349189.863
      512     18.983    0.361     0.00071     9712571.781
     1024     10.184    0.673     0.00066    10422012.896
     2048      9.033    0.759     0.00037    18493143.642
     4096      8.203    0.835     0.00020    33593316.851
     8192      7.407    0.925     0.00011    60668647.350
    16384      6.490    1.056     0.00006   106333100.097
    32768      5.589    1.226     0.00004   183148027.592
    65536      5.141    1.333     0.00002   336897704.640
   131072      5.208    1.316     0.00001   682584188.174
   262144      4.885    1.403     0.00001  1280678502.121
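
One way to read this table is to look for the list size at which the OpenCL version starts to beat the sequential time of 6.853 s reported in the previous benchmarks table. The Python sketch below performs that lookup on the rows above. The names `OPENCL_TIMES`, `break_even_size` and `best_speedup` are hypothetical and only illustrate one reading of the data.

```python
T_SEQ = 6.853  # sequential time in seconds (threads and MPI benchmarks table)

# List size -> measured time in seconds, copied from the OpenCL table.
OPENCL_TIMES = {
    64: 119.918, 128: 65.360, 256: 36.547, 512: 18.983,
    1024: 10.184, 2048: 9.033, 4096: 8.203, 8192: 7.407,
    16384: 6.490, 32768: 5.589, 65536: 5.141,
    131072: 5.208, 262144: 4.885,
}


def break_even_size(times, t_seq):
    """Smallest list size whose time is below t_seq, or None if there is none."""
    for size in sorted(times):
        if times[size] < t_seq:
            return size
    return None


def best_speedup(times, t_seq):
    """Return (list size, speedup) for the fastest run."""
    size = min(times, key=times.get)
    return size, t_seq / times[size]


print(break_even_size(OPENCL_TIMES, T_SEQ))
print(best_speedup(OPENCL_TIMES, T_SEQ))
```

On these rows the break-even list size is 16384, and the best speedup, about 1.40, is reached at the largest list size, 262144.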
1 The problem
2 Computation
    Simple algorithm
    Better algorithm
3 Parallel implementations
    Multi-threads
    Message-passing (Open MPI)
    GPU (OpenCL)
4 Results
    Speedup
    Efficiency
    Overhead
    Benchmarks tables
The end
All results, documents, C++/OpenCL and LaTeX sources, and references are available on Bitbucket:
https://bitbucket.org/OPiMedia/parallel-sigma_odd-problem