The idea is quite old: Todd Veldhuizen, Expression templates, C++ Report, 1995 First (?) implementation: Blitz++ is a C++ class library for scientific computing which provides performance on par with Fortran 77/90. Today: std::valarray, Boost.uBLAS, MTL, Eigen, Armadillo, etc. How does it work?
a = x + y + z; lazy_v1.cpp:38:15: error: invalid operands to binary expression (’const vsum’ and ’vector’) a = x + y + z; ~~~~~ ^ ~ lazy_v1.cpp:12:12: note: candidate function not viable: no known conversion from ’const vsum’ to ’const vector’ for 1st argument const vsum operator+(const vector &a, const vector &b) { ^ 1 error generated. Sidenote: Clang error messages are awesome!
1 a = x + y − z; ... is of type: binary op< binary op< vector, plus, vector > , minus , vector > − z + x y for(size t i = 0; i < n; ++i) a[ i ] = [ i ]
1 a = x + y − z; ... is of type: binary op< binary op< vector, plus, vector > , minus , vector > − z + x y for(size t i = 0; i < n; ++i) a[ i ] = [ i ] [ i ]
1 a = x + y − z; ... is of type: binary op< binary op< vector, plus, vector > , minus , vector > − z + x y for(size t i = 0; i < n; ++i) a[ i ] = [ i ] [ i ] [ i ]
1 a = x + y − z; ... is of type: binary op< binary op< vector, plus, vector > , minus , vector > − z + x y #pragma omp parallel for for(size t i = 0; i < n; ++i) a[ i ] = [ i ] [ i ] [ i ]
v = a ∗ x + b ∗ y; 2 3 double c = (x + y)[42]; ... and its as effective as: 1 for(size t i = 0; i < n; ++i) 2 v[i ] = a[i] ∗ x[i ] + b[i] ∗ y[i ]; 3 4 double c = x[42] + y[42]; No temporaries involved. Optimizing compiler is able to inline everything.
at runtime from C99 source. 2 Kernel parameters are set through API calls. 3 Kernel is launched on a compute device. Source may be read from a file, or stored in a static string, or generated.
1 a = x + y − z; . . . to result in this kernel: 1 kernel void vexcl vector kernel( 2 ulong n, 3 global double ∗ res, 4 global double ∗ prm1, 5 global double ∗ prm2, 6 global double ∗ prm3 7 ) 8 { 9 for(size t idx = get global id(0); idx < n; idx += get global size(0)) { 10 res[idx] = ( ( prm1[idx] + prm2[idx] ) − prm3[idx] ); 11 } 12 } − z + x y
implementation is a bit more complicated: There are unary, binary, ternary expressions. There are special terminals requiring either global or local preambles. There are builtin and user-defined functions. . . . Boost.Proto is used to keep the code mess contained. But you got the general idea.
how to use it by specializing several struct templates: Does it need global preamble? What kernel parameters does it need? Does it need local preamble? What contribution does it make to expression string? How to set kernel parameters?