
cpp11 - welding R and C++

Jim Hester
November 14, 2020


R and S have a long history of interacting with compiled languages. The Rcpp package has been widely successful; however, over the years a number of issues have come up and newer C++ features have become available. Fixing or adding these in Rcpp would take a great deal of work, and in some cases would be impossible without severely breaking backwards compatibility. cpp11 is a ground-up rewrite of C++ bindings to R with different design trade-offs and features. This talk focuses on the new features of cpp11, including:

- Enforcing copy-on-write semantics.
- Improving the safety of using the R API from C++ code.
- Supporting ALTREP objects.
- Using UTF-8 strings everywhere.
- Applying newer C++11 features.



Transcript

  1. C++: very fast, statically typed, lots of libraries, limited safety net. (Photo by Filipa Saldanha on Unsplash)
  2. C++11: move semantics, auto, pointers, type traits, initializer lists, variadic templates, user-defined literals, user-defined attributes. (Photo by Markus Spiske on Unsplash)

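For readers less familiar with C++11, the sketch below exercises several of the features listed on this slide in plain C++: a variadic template, a user-defined literal, move semantics, a type trait, the `[[...]]` attribute syntax, initializer lists and `auto`. All of the names (`sum_all`, `_kb`, `take_ownership`, `fail`, `demo_total`) are hypothetical and made up for illustration; none of them are part of cpp11.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Variadic template (hypothetical helper): add up any number of arguments
inline int sum_all() { return 0; }

template <typename T, typename... Rest>
int sum_all(T first, Rest... rest) {
  return static_cast<int>(first) + sum_all(rest...);
}

// User-defined literal (hypothetical): 2_kb is 2 * 1024
constexpr unsigned long long operator"" _kb(unsigned long long n) {
  return n * 1024;
}

// Move semantics: hand over the vector's buffer instead of copying it
inline std::vector<int> take_ownership(std::vector<int>&& v) {
  return std::move(v);
}

// Type traits: inspect types at compile time
static_assert(std::is_same<decltype(2_kb), unsigned long long>::value,
              "the _kb literal produces an unsigned long long");

// Attribute syntax: the same [[...]] form cpp11 uses for [[cpp11::register]]
[[noreturn]] inline void fail() { throw 1; }

inline int demo_total() {
  std::vector<int> v = {1, 2, 3};             // initializer list
  auto moved = take_ownership(std::move(v));  // auto type deduction
  int total = 0;
  for (auto val : moved) total += val;        // range-based for
  return total;
}
```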
  3. Simpler implementation:

      | Package | # Files | Lines of code |
      |---|---|---|
      | Rcpp (1.0.4) | 379 | 74,658 |
      | cpp11 (1.0.0) | 19 | 1,734 |

  4. Rcpp features not in cpp11:

      - ❌ No modules
      - ❌ No sugar
      - ❌ Fewer attributes
      - ❌ No automatic restoration of the random number generator state
      - ❌ No roxygen2 comment documentation
      - ❌ No interfaces

  5. Cheaper compilation:

      | Package | Rcpp compile time | cpp11 compile time | Rcpp peak memory | cpp11 peak memory |
      |---|---|---|---|---|
      | haven | 17.42 s | 7.13 s | 428 MB | 204 MB |
      | readr | 124.13 s | 81.08 s | 969 MB | 684 MB |
      | roxygen2 | 17.34 s | 4.24 s | 371 MB | 109 MB |
      | tidyr | 14.25 s | 3.34 s | 363 MB | 83 MB |

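The table is easier to read as ratios. The small C++ sketch below only divides each Rcpp figure by the matching cpp11 figure; the numbers are copied from the slide, and `build_stats`, `ratio()` and `slide_table()` are hypothetical helpers introduced for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// One row of the slide's table: compile time (seconds) and peak memory (MB)
struct build_stats {
  std::string package;
  double rcpp_seconds, cpp11_seconds;
  double rcpp_mb, cpp11_mb;
};

// How many times larger the Rcpp figure is than the cpp11 figure
inline double ratio(double rcpp, double cpp11) { return rcpp / cpp11; }

// Values copied from the slide
inline std::vector<build_stats> slide_table() {
  return {
      {"haven", 17.42, 7.13, 428, 204},
      {"readr", 124.13, 81.08, 969, 684},
      {"roxygen2", 17.34, 4.24, 371, 109},
      {"tidyr", 14.25, 3.34, 363, 83},
  };
}

inline void print_ratios() {
  for (const auto& row : slide_table()) {
    std::printf("%-9s time x%.2f  memory x%.2f\n", row.package.c_str(),
                ratio(row.rcpp_seconds, row.cpp11_seconds),
                ratio(row.rcpp_mb, row.cpp11_mb));
  }
}
```

Worked out, compilation is roughly 1.5x (readr) to 4.3x (tidyr) faster with cpp11, and peak memory is roughly 1.4x to 4.4x lower.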
  6. Header only: install pkgA with Rcpp x.y.z, later upgrade Rcpp, and pkgA segfaults (tidyverse/dplyr#2335). (Photo by Yogendra Singh on Unsplash)
  7. Vendor-able: `cpp_vendor()` copies cpp11 into your package to ensure stability (at the cost of missing out on updates). (Photo by Humphrey Muleba on Unsplash)
  8. “The semantics of invoking a function in R argument (sic) are call-by-value. Changing the value of a supplied argument within a function will not affect the value of the variable in the calling frame.” (R Language Definition, Version 4.0.3)
  9. A plain R function:

      ```r
      sum_vec <- function(x) {
        sum <- 0
        for (val in x) {
          sum <- sum + val
        }
        sum
      }
      ```

  10. The same function with Rcpp:

      ```r
      Rcpp::sourceCpp(code = '
      #include "Rcpp.h"
      using namespace Rcpp;

      // [[Rcpp::export]]
      int sum_vec_rcpp(IntegerVector x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }')
      ```

  11. Call-by-value:

      ```r
      x <- 1:3
      sum_vec(x)
      #> [1] 6
      x
      #> [1] 1 2 3
      sum_vec_rcpp(x)
      #> [1] 6
      x
      #> [1] 1 2 3
      ```

  12. Call-by-value? An Rcpp function that modifies its argument:

      ```r
      Rcpp::sourceCpp(code = '
      #include "Rcpp.h"
      using namespace Rcpp;

      // [[Rcpp::export]]
      IntegerVector add_first_rcpp(IntegerVector x) {
        x[0] = x[0] + 1;
        return x;
      }')
      ```

  13. Call-by-reference: the R version leaves `x` unchanged, but `add_first_rcpp()` modifies the caller's `x`:

      ```r
      x <- 1:3
      add_first(x)
      #> [1] 2 2 3
      x
      #> [1] 1 2 3
      add_first_rcpp(x)
      #> [1] 2 2 3
      x
      #> [1] 2 2 3
      ```

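Why does the Rcpp version change `x`? The output suggests the `IntegerVector` argument is a handle onto the caller's R vector rather than a copy of its data, so writing through it writes into `x`. The plain C++ sketch below models that with a hypothetical `shared_ints` handle built on `std::shared_ptr`; it is an analogy for the behaviour on the slide, not Rcpp's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle type: every copy of a shared_ints points at the same
// buffer, much like two SEXPs referring to one R vector.
class shared_ints {
 public:
  explicit shared_ints(std::vector<int> values)
      : data_(std::make_shared<std::vector<int>>(std::move(values))) {}

  int& operator[](std::size_t i) { return (*data_)[i]; }
  int operator[](std::size_t i) const { return (*data_)[i]; }
  std::size_t size() const { return data_->size(); }

 private:
  std::shared_ptr<std::vector<int>> data_;
};

// "Passed by value", but only the handle is copied, not the elements
inline shared_ints add_first(shared_ints x) {
  x[0] = x[0] + 1;
  return x;
}
```

As on the slide, the caller's vector reads 2 2 3 after the call, because both handles share one buffer.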
  15. Call-by-reference: `add_first_rcpp()` (the code from slide 12) writes directly into the caller's vector.
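  17. The cpp11 versions: read-only `integers` for the sum, `writable::integers` for the function that modifies its input:

      ```r
      cpp11::cpp_function(code = '
      int sum_vec_cpp11(integers x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }

      [[cpp11::register]]
      integers add_one_cpp11(writable::integers x) {
        x[0] = x[0] + 1;
        return x;
      }
      ')
      ```
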
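cpp11 separates read-only vectors (`integers`) from writable ones (`writable::integers`); creating a writable vector from an existing R vector duplicates it, which is why the caller's `x` survives `add_one_cpp11()`. The sketch below models that split in plain C++ with hypothetical `ro_ints` and `rw_ints` classes; they stand in for the real cpp11 types and only illustrate the copy-on-write-at-the-boundary idea.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical read-only view: shares the caller's data, offers no way to modify it
class ro_ints {
 public:
  explicit ro_ints(std::shared_ptr<const std::vector<int>> data)
      : data_(std::move(data)) {}
  int operator[](std::size_t i) const { return (*data_)[i]; }  // returns a copy
  std::size_t size() const { return data_->size(); }

 private:
  std::shared_ptr<const std::vector<int>> data_;
};

// Hypothetical writable vector: building it from a read-only view copies the
// elements, so later modifications never reach the original
class rw_ints {
 public:
  explicit rw_ints(const ro_ints& src) {
    data_.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) data_.push_back(src[i]);
  }
  int& operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<int> data_;
};

inline rw_ints add_one(const ro_ints& x) {
  rw_ints out(x);  // the copy happens here, like the writable::integers argument
  out[0] = out[0] + 1;
  return out;
}
```

The design choice is that mutation is only possible on a type that owns its own copy, so call-by-value semantics hold without copying every read-only argument.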
  20. Modifying a read-only `integers` fails to compile:

      ```r
      cpp11::cpp_function(code = '
      integers add_one_cpp11(integers x) {
        x[0] = x[0] + 1;
        return x;
      }', quiet = FALSE)
      #> error: expression is not assignable
      #>   x[0] = x[0] + 1;
      #>   ~~~~ ^
      ```

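"Expression is not assignable" is the error a compiler gives when assigning to a value rather than to a reference, which is what you get if a read-only vector's `operator[]` returns elements by value. The sketch below reproduces that with hypothetical `readonly_ints` and `writable_ints` classes and checks it at compile time with `std::is_assignable`; it illustrates the mechanism, not cpp11's exact class definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical read-only vector: operator[] returns a copy of the element
class readonly_ints {
 public:
  explicit readonly_ints(std::vector<int> v) : data_(std::move(v)) {}
  int operator[](std::size_t i) const { return data_[i]; }

 private:
  std::vector<int> data_;
};

// Hypothetical writable vector: operator[] returns a reference to the element
class writable_ints {
 public:
  explicit writable_ints(std::vector<int> v) : data_(std::move(v)) {}
  int& operator[](std::size_t i) { return data_[i]; }

 private:
  std::vector<int> data_;
};

// `ro[0] = 1` would not compile: the returned int is a temporary value.
static_assert(!std::is_assignable<decltype(std::declval<readonly_ints&>()[0]), int>::value,
              "read-only elements are not assignable");
static_assert(std::is_assignable<decltype(std::declval<writable_ints&>()[0]), int>::value,
              "writable elements are assignable");
```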
  21. Call-by-value:

      ```r
      x <- 1:3
      sum_vec_cpp11(x)
      #> [1] 6
      x
      #> [1] 1 2 3
      add_one_cpp11(x)
      #> [1] 2 2 3
      x
      #> [1] 1 2 3
      ```

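  23. Safety: memory leak

      ```cpp
      #include <string>
      #include <Rinternals.h>

      SEXP fun() {
        std::string str("foo");
        // If either R API call raises an R error, R longjmps out of fun()
        // and str's destructor never runs
        SEXP x = PROTECT(Rf_allocVector(STRSXP, 1));  // keep x alive while Rf_mkChar() allocates
        SET_STRING_ELT(x, 0, Rf_mkChar(str.c_str()));
        UNPROTECT(1);
        return x;
      }
      ```
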
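  26. No memory leak with `cpp11::unwind_protect()`:

      ```cpp
      #include <string>
      #include "cpp11.hpp"

      SEXP fun() {
        std::string str("foo");
        // R errors inside the lambda become C++ exceptions, so str is still destroyed
        SEXP x = cpp11::unwind_protect([&] {
          SEXP out = PROTECT(Rf_allocVector(STRSXP, 1));
          SET_STRING_ELT(out, 0, Rf_mkChar(str.c_str()));
          UNPROTECT(1);
          return out;
        });
        return x;
      }
      ```
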
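Conceptually, `unwind_protect()` catches the `longjmp()` that an R error performs and rethrows it as a C++ exception, so ordinary stack unwinding runs destructors such as `str`'s. The standalone sketch below imitates that with `setjmp()`/`longjmp()`. `fake_r_alloc()`, `protect_sketch()` and `tracked` are hypothetical stand-ins; the real cpp11 code builds on R's `R_UnwindProtect()` rather than this simplified global jump buffer.

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>
#include <string>

// Jump target for the fake R API below (hypothetical; R keeps its own contexts)
static std::jmp_buf* g_error_target = nullptr;

// Hypothetical stand-in for an R API call: on error it longjmps, as R does,
// instead of throwing a C++ exception.
inline int fake_r_alloc(int n) {
  if (n < 0) std::longjmp(*g_error_target, 1);
  return n;
}

struct unwind_exception : std::runtime_error {
  unwind_exception() : std::runtime_error("R error") {}
};

// Run `code`; if it longjmps, turn the jump into a C++ exception.
// Only trivially destructible objects live between setjmp() and longjmp(),
// which keeps the jump well-defined.
template <typename Fun>
int protect_sketch(Fun&& code) {
  std::jmp_buf buf;
  std::jmp_buf* previous = g_error_target;
  g_error_target = &buf;
  if (setjmp(buf)) {
    g_error_target = previous;
    throw unwind_exception();  // from here on, normal C++ unwinding takes over
  }
  int result = code();
  g_error_target = previous;
  return result;
}

// Counts live objects so we can check that destructors ran
static int g_live = 0;
struct tracked {
  std::string s;
  explicit tracked(const char* text) : s(text) { ++g_live; }
  ~tracked() { --g_live; }
};

inline int fun(int n) {
  tracked str("foo");  // destroyed on both the normal and the error path
  return protect_sketch([&] { return fake_r_alloc(n); });
}
```

Without the wrapper, the `longjmp()` would skip `fun()`'s frame entirely and `str` would never be destroyed.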
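  28. No memory leak with `cpp11::safe`:

      ```cpp
      #include <string>
      #include "cpp11.hpp"

      SEXP fun() {
        std::string str("foo");
        // cpp11::safe[f](args...) calls f inside cpp11::unwind_protect()
        SEXP x = PROTECT(cpp11::safe[Rf_allocVector](STRSXP, 1));
        SET_STRING_ELT(x, 0, cpp11::safe[Rf_mkChar](str.c_str()));
        UNPROTECT(1);
        return x;
      }
      ```
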
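The `safe[f](args...)` spelling is an ordinary C++ technique: an object whose `operator[]` takes a function pointer and returns a callable that runs the function inside a protection wrapper. The self-contained sketch below shows that pattern; `safe_sketch`, `guarded`, `run_protected`, `fake_alloc` and `r_error` are hypothetical names, and the protection step is a simplified `setjmp()`-to-exception bridge that assumes a non-void return type.

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>

// Jump target used by the hypothetical fake R API
static std::jmp_buf* g_jump_target = nullptr;

struct r_error : std::runtime_error {
  r_error() : std::runtime_error("R error") {}
};

// Simplified protection: a longjmp out of fn becomes a C++ exception
template <typename Ret, typename... Args>
Ret run_protected(Ret (*fn)(Args...), Args... args) {
  std::jmp_buf buf;
  std::jmp_buf* previous = g_jump_target;
  g_jump_target = &buf;
  if (setjmp(buf)) {
    g_jump_target = previous;
    throw r_error();
  }
  Ret result = fn(args...);
  g_jump_target = previous;
  return result;
}

// Callable returned by safe_sketch[fn]
template <typename Ret, typename... Args>
struct guarded {
  Ret (*fn)(Args...);
  Ret operator()(Args... args) const { return run_protected(fn, args...); }
};

// operator[] captures the function pointer; the call happens later with (...)
struct safe_sketch_t {
  template <typename Ret, typename... Args>
  guarded<Ret, Args...> operator[](Ret (*fn)(Args...)) const {
    return guarded<Ret, Args...>{fn};
  }
};
constexpr safe_sketch_t safe_sketch{};

// Hypothetical stand-in for an R API function that errors via longjmp
inline int fake_alloc(int n) {
  if (n < 0) std::longjmp(*g_jump_target, 1);
  return n;
}
```

Usage mirrors the slide: `safe_sketch[fake_alloc](3)` returns 3, while `safe_sketch[fake_alloc](-1)` throws `r_error` instead of jumping past C++ destructors.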
  31. “(UTF-8) should be the default choice of encoding for storing text strings in memory or on disk, for communication and all other uses. (this) improves performance, reduces complexity of software and helps prevent many Unicode-related bugs.” (utf8everywhere.org, Pavel Radzivilovsky, Yakov Galka and Slava Novgorodov)

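One practical consequence of keeping text as UTF-8 in a `std::string` is that the string stores bytes, not characters. The sketch below counts code points by skipping UTF-8 continuation bytes (bit pattern `10xxxxxx`); `count_code_points()` is a helper written for this example and assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a valid UTF-8 string: every byte that is not
// a continuation byte (10xxxxxx) starts a new code point.
inline std::size_t count_code_points(const std::string& utf8) {
  std::size_t count = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80) ++count;
  }
  return count;
}
```

For example, `"h\xC3\xA9llo"` ("héllo") has a `size()` of 6 bytes but contains 5 code points, because "é" takes two bytes in UTF-8.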
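  32. To C++:

      ```cpp
      SEXP f(SEXP x) {
        std::string str;
        str.reserve(Rf_xlength(STRING_ELT(x, 0)));
        // Rf_translateCharUTF8() may allocate with R_alloc(); vmaxset() releases it
        void* vmax = vmaxget();
        str.assign(Rf_translateCharUTF8(STRING_ELT(x, 0)));
        vmaxset(vmax);
        api.do_stuff(str);
        return R_NilValue;
      }
      ```
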
  33. From C++:

      ```cpp
      SEXP f(SEXP x) {
        std::string data = api.get_data();
        // Mark the bytes as UTF-8 when creating the R string
        return Rf_ScalarString(Rf_mkCharLenCE(data.c_str(), data.size(), CE_UTF8));
      }
      ```

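Marking bytes as `CE_UTF8` only makes sense if they really are UTF-8. If a C++ library's output is not guaranteed to be, a check along these lines could run before the string is handed to `Rf_mkCharLenCE()`. This is a simplified, hypothetical validator (`looks_like_utf8()` is made up for the example): it checks lead and continuation byte structure but, for brevity, does not reject overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified UTF-8 structure check (does not reject overlong forms or surrogates)
inline bool looks_like_utf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra;
    if (lead < 0x80) extra = 0;                 // 0xxxxxxx: ASCII
    else if ((lead & 0xE0) == 0xC0) extra = 1;  // 110xxxxx: 2-byte sequence
    else if ((lead & 0xF0) == 0xE0) extra = 2;  // 1110xxxx: 3-byte sequence
    else if ((lead & 0xF8) == 0xF0) extra = 3;  // 11110xxx: 4-byte sequence
    else return false;                          // stray continuation or invalid byte

    if (s.size() - i - 1 < extra) return false;  // sequence cut off at the end
    for (std::size_t k = 1; k <= extra; ++k) {
      unsigned char cont = static_cast<unsigned char>(s[i + k]);
      if ((cont & 0xC0) != 0x80) return false;   // must be 10xxxxxx
    }
    i += extra + 1;
  }
  return true;
}
```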
  34. ALTREP: alternative representation (R 3.5+). Where's the data? Used in base R. Used in … (Photo by Danilo Santos on Unsplash)
  35. ALTREP aware:

      ```r
      sum_vec <- function(x) {
        sum <- 0
        for (val in x) {
          sum <- sum + val
        }
        sum
      }
      x <- 1:10000
      sum_vec(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @105a00cf8 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
      ```

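The `(compact)` in the `inspect()` output means `1:10000` is stored as just its length, first value and step, with elements computed on request; asking for a raw data pointer forces R to allocate and fill the whole vector, which is what "(expanded)" reports on the following slides. The hypothetical `compact_seq` class below models that behaviour in plain C++: `elt()` computes a value on the fly, while `dataptr()` materialises the full vector the first time it is called. It is a model of the idea, not R's ALTREP API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a compact integer sequence such as 1:n
class compact_seq {
 public:
  compact_seq(int first, std::size_t length) : first_(first), length_(length) {}

  std::size_t size() const { return length_; }

  // Element access without materialising anything (compare INTEGER_ELT())
  int elt(std::size_t i) const { return first_ + static_cast<int>(i); }

  // Raw pointer access forces the full vector into memory (compare INTEGER())
  const int* dataptr() {
    if (expanded_.empty() && length_ > 0) {
      expanded_.reserve(length_);
      for (std::size_t i = 0; i < length_; ++i) expanded_.push_back(elt(i));
    }
    return expanded_.data();
  }

  bool is_expanded() const { return !expanded_.empty(); }

 private:
  int first_;
  std::size_t length_;
  std::vector<int> expanded_;  // stays empty until dataptr() is called
};
```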
  38. Not ALTREP aware:

      ```cpp
      // [[Rcpp::export]]
      int sum_vec_rcpp(IntegerVector x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }
      ```

      ```r
      x <- 1:10000
      sum_vec_rcpp(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @10600fe40 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (expanded)
      lobstr::obj_sizes(x)
      #> 40,728 B
      ```

  42. Not ALTREP aware:

      ```c
      #include <Rinternals.h>

      SEXP sum_vec_c(SEXP x) {
        int sum = 0;
        // INTEGER() asks for a pointer to the full data, expanding a compact sequence
        int* p = INTEGER(x);
        for (R_xlen_t i = 0; i < Rf_xlength(x); ++i) {
          sum = sum + p[i];
        }
        return Rf_ScalarInteger(sum);
      }
      ```

      ```r
      x <- 1:10000
      sum_vec_c(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @1351f7858 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (expanded)
      lobstr::obj_sizes(x)
      #> 40,728 B
      ```

  46. ALTREP aware:

      ```c
      #include <Rinternals.h>

      SEXP sum_vec_c2(SEXP x) {
        int sum = 0;
        // INTEGER_ELT() fetches one element at a time, so x stays compact
        for (R_xlen_t i = 0; i < Rf_xlength(x); ++i) {
          sum = sum + INTEGER_ELT(x, i);
        }
        return Rf_ScalarInteger(sum);
      }
      ```

      ```r
      x <- 1:10000
      sum_vec_c2(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @103e50f80 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
      ```

  49. ALTREP aware:

      ```c
      #include <Rinternals.h>

      SEXP sum_vec_c3(SEXP x) {
        int sum = 0;
        int buf[1024];
        R_xlen_t i = 0;
        R_xlen_t n = Rf_xlength(x);
        // Copy full 1024-element chunks into buf without expanding x
        while (i < n - 1024) {
          INTEGER_GET_REGION(x, i, 1024, buf);
          for (R_xlen_t j = 0; j < 1024; ++j) {
            sum = sum + buf[j];
          }
          i += 1024;
        }
        // Copy the remaining (at most 1024) elements
        R_xlen_t extra = n - i;
        INTEGER_GET_REGION(x, i, extra, buf);
        for (R_xlen_t j = 0; j < extra; ++j) {
          sum = sum + buf[j];
        }
        return Rf_ScalarInteger(sum);
      }
      ```

      ```r
      x <- 1:10000
      sum_vec_c3(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @1342a4be0 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
      ```

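The same chunked loop can be driven by the region call's return value (the number of elements actually copied; `INTEGER_GET_REGION()` returns this as an `R_xlen_t`), which removes the separate tail step. To keep it runnable without R, the sketch below uses a hypothetical `compact_seq::get_region()` in place of `INTEGER_GET_REGION(x, i, n, buf)`, and accumulates into a `long long` so larger vectors do not overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical compact sequence first, first + 1, ... (stand-in for 1:n)
struct compact_seq {
  int first;
  std::ptrdiff_t length;

  // Copy up to n elements starting at i into buf; return how many were copied
  std::ptrdiff_t get_region(std::ptrdiff_t i, std::ptrdiff_t n, int* buf) const {
    std::ptrdiff_t count = std::min(n, length - i);
    for (std::ptrdiff_t j = 0; j < count; ++j) buf[j] = first + static_cast<int>(i + j);
    return count;
  }
};

inline long long sum_by_region(const compact_seq& x) {
  const std::ptrdiff_t chunk = 1024;
  int buf[1024];
  long long sum = 0;
  std::ptrdiff_t i = 0;
  while (i < x.length) {
    std::ptrdiff_t got = x.get_region(i, chunk, buf);  // at most `chunk` elements
    for (std::ptrdiff_t j = 0; j < got; ++j) sum += buf[j];
    i += got;  // the last chunk is simply shorter, so no tail step is needed
  }
  return sum;
}
```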
  53. ALTREP aware:

      ```cpp
      int sum_vec_cpp11(integers x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }
      ```

      ```r
      x <- 1:10000
      sum_vec_cpp11(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @134b695f0 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
      ```

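A range-based `for` loop can stay ALTREP-aware if its iterator pulls elements through a region-copy call into a small buffer instead of asking for a data pointer. The sketch below shows that design with a hypothetical `compact_seq` source and a 64-element buffer; `buffered_view` and its iterator are made-up names, and the code illustrates the general approach rather than reproducing cpp11's own iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical compact sequence (stand-in for an ALTREP 1:n)
struct compact_seq {
  int first;
  std::ptrdiff_t length;

  // Copy up to n elements starting at i into buf; return how many were copied
  std::ptrdiff_t get_region(std::ptrdiff_t i, std::ptrdiff_t n, int* buf) const {
    std::ptrdiff_t count = std::min(n, length - i);
    for (std::ptrdiff_t j = 0; j < count; ++j) buf[j] = first + static_cast<int>(i + j);
    return count;
  }
};

// Hypothetical read-only view whose iterator refills a small buffer
class buffered_view {
 public:
  explicit buffered_view(const compact_seq& src) : src_(&src) {}

  class iterator {
   public:
    iterator(const compact_seq* src, std::ptrdiff_t pos) : src_(src), pos_(pos) { fill(); }
    int operator*() const { return buf_[pos_ - buf_start_]; }
    iterator& operator++() {
      ++pos_;
      if (pos_ - buf_start_ >= buf_len_) fill();  // buffer used up: fetch the next region
      return *this;
    }
    bool operator!=(const iterator& other) const { return pos_ != other.pos_; }

   private:
    enum { kBufSize = 64 };
    void fill() {
      buf_start_ = pos_;
      buf_len_ = pos_ < src_->length ? src_->get_region(pos_, kBufSize, buf_) : 0;
    }
    const compact_seq* src_;
    std::ptrdiff_t pos_;
    std::ptrdiff_t buf_start_ = 0;
    std::ptrdiff_t buf_len_ = 0;
    int buf_[kBufSize] = {};
  };

  iterator begin() const { return iterator(src_, 0); }
  iterator end() const { return iterator(src_, src_->length); }

 private:
  const compact_seq* src_;
};
```

The loop body sees plain `int` values, yet the source is never expanded; the trade-off is one region call per 64 elements instead of direct pointer access.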
  55. Summary: ✅ semantics, ✅ safety, ✅ Unicode, ✅ ALTREP. Plus C++11, a simple(r) implementation, cheaper compilation, appending, protection, header only and vendor-able. cpp11.r-lib.org, @jimhester_