cpp11 - welding R and C++

Jim Hester
November 14, 2020

R and S have a long history of interacting with compiled languages. The Rcpp package has been a widely successful project; however, over the years a number of issues have surfaced and additional C++ features have become available. Modifying Rcpp to fix or include these would require a great deal of work, or in some cases would be impossible without severely breaking backwards compatibility. cpp11 is a ground-up rewrite of C++ bindings to R with different design trade-offs and features. This talk will focus on the new features of cpp11, including:

- Enforcing copy-on-write semantics.
- Improving the safety of using the R API from C++ code.
- Supporting ALTREP objects.
- Using UTF-8 strings everywhere.
- Applying newer C++11 features.
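
The first bullet, enforcing copy-on-write semantics, can be illustrated without R at all. The sketch below is a standalone C++ analogy, not cpp11's implementation: `ReadOnlyInts`, `WritableInts`, and `add_first` are hypothetical names chosen for illustration. It assumes the design described in the talk, where a read-only handle cannot be assigned through and a writable handle works on its own copy of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical read-only handle: shares the underlying data and exposes
// no way to modify it, so a caller's vector cannot change behind its back.
class ReadOnlyInts {
 public:
  explicit ReadOnlyInts(std::vector<int> values)
      : data_(std::make_shared<std::vector<int>>(std::move(values))) {}

  // Returns by value, so `x[0] = 5` on a read-only handle does not compile.
  int operator[](std::size_t i) const { return (*data_)[i]; }
  std::size_t size() const { return data_->size(); }
  const std::vector<int>& values() const { return *data_; }

 private:
  std::shared_ptr<const std::vector<int>> data_;
};

// Hypothetical writable handle: converting from a read-only handle copies
// the data, so every write lands in the copy.
class WritableInts {
 public:
  WritableInts(const ReadOnlyInts& src) : data_(src.values()) {}

  int& operator[](std::size_t i) { return data_[i]; }
  int operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<int> data_;
};

// Asking for a writable argument makes the copy visible in the signature.
WritableInts add_first(WritableInts x) {
  assert(x.size() > 0);
  x[0] = x[0] + 1;
  return x;
}
```

With these types, calling `add_first(x)` on a `ReadOnlyInts x` holding 1, 2, 3 returns 2, 2, 3 and leaves `x` unchanged, which is the call-by-value behaviour an R user expects.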

Transcript

  1. cpp11 - welding R and C++ @jimhester_ Photo by Christopher Burns on Unsplash
  2. C++: very fast, statically typed, lots of libraries, limited safety net. Photo by Filipa Saldanha on Unsplash
  3. None
  4. Then why cpp11? Photo by Russ Ward on Unsplash
  5. Photo by Micah Tindell on Unsplash
  6. C++11 move semantics auto pointers type traits initializer lists variadic templates user defined literals user defined attributes Photo by Markus Spiske on Unsplash
  7. Simpler implementation
     Package        # Files  Lines of code
     Rcpp (1.0.4)   379      74,658
     cpp11 (1.0.0)  19       1,734
  8. Rcpp features not in cpp11
     ❌ No modules
     ❌ No sugar
     ❌ Fewer attributes
     ❌ No automatic random number restoration
     ❌ No roxygen2 comment documentation
     ❌ No interfaces
  9. Cheaper compilation
     package   Rcpp compile time  cpp11 compile time  Rcpp peak memory  cpp11 peak memory
     haven     17.42s             7.13s               428MB             204MB
     readr     124.13s            81.08s              969MB             684MB
     roxygen2  17.34s             4.24s               371MB             109MB
     tidyr     14.25s             3.34s               363MB             83MB
  10. header only: install pkgA with Rcpp x.y.z; later upgrade Rcpp; pkgA segfaults (tidyverse/dplyr#2335). Photo by Yogendra Singh on Unsplash
  11. vendor-able bring cpp11 to your ensure stability (miss out on updates) cpp_vendor() Photo by Humphrey Muleba on Unsplash
  12. Protection cost

  13. Vector growth

  14. ✅ Semantics Safety Unicode ALTREP

  15. –R Language Definition (Version 4.0.3) “The semantics of invoking a function in R argument are call-by-value. (sic). Changing the value of a supplied argument within a function will not affect the value of the variable in the calling frame.”
  16. sum_vec <- function(x) {
        sum <- 0
        for (val in x) {
          sum <- sum + val
        }
        sum
      }
  17. Rcpp::sourceCpp(code = '#include "Rcpp.h"
      using namespace Rcpp;
      // [[Rcpp::export]]
      int sum_vec_rcpp(IntegerVector x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }')
  18. Call-by-value
      x <- 1:3
      sum_vec(x)
      #> [1] 6
      x
      #> [1] 1 2 3
      sum_vec_rcpp(x)
      #> [1] 6
      x
      #> [1] 1 2 3
  19. Call-by-value
      add_first <- function(x) {
        x[[1]] <- x[[1]] + 1
        x
      }
  20. Call-by-value?
      Rcpp::sourceCpp(code = '#include "Rcpp.h"
      using namespace Rcpp;
      // [[Rcpp::export]]
      IntegerVector add_first_rcpp(IntegerVector x) {
        x[0] = x[0] + 1;
        return x;
      }')
  21. Call-by-reference
      x <- 1:3
      add_first(x)
      #> [1] 2 2 3
      x
      #> [1] 1 2 3
      add_first_rcpp(x)
      #> [1] 2 2 3
      x
      #> [1] 2 2 3
  23. Call-by-reference
      Rcpp::sourceCpp(code = '#include "Rcpp.h"
      using namespace Rcpp;
      // [[Rcpp::export]]
      IntegerVector add_first_rcpp(IntegerVector x) {
        x[0] = x[0] + 1;
        return x;
      }')
  25. Pinnacle of success Photo by lovely shots on Unsplash

  26. Pit of success Photo by Eric Muhr on Unsplash

  27. cpp11::cpp_function(code = '
      int sum_vec_cpp11(integers x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }
      [[cpp11::register]]
      integers add_one_cpp11(writable::integers x) {
        x[0] = x[0] + 1;
        return x;
      }
      ')
  30. cpp11::cpp_function(code = '
      integers add_one_cpp11(integers x) {
        x[0] = x[0] + 1;
        return x;
      }', quiet = FALSE)
      // error: expression is not assignable
      //   x[0] = x[0] + 1;
      //   ~~~~ ^
  31. Call-by-value
      x <- 1:3
      sum_vec_cpp11(x)
      #> [1] 6
      x
      #> [1] 1 2 3
      add_one_cpp11(x)
      #> [1] 2 2 3
      x
      #> [1] 1 2 3
  33. Safety Photo by Rob Lambert on Unsplash

  34. Memory leak?
      void fun() {
        std::string str("foo");
        Rf_error("something went wrong");
      }
  35. Memory leak
      void fun() {
        char* str = (char*) malloc(1024);
        Rf_error("something went wrong");
        free(str);
      }
  36. No memory leak
      void fun() {
        std::string str("foo");
        throw std::runtime_error("something went wrong");
      }
  37. Memory leak
      Rcpp::sourceCpp(code = '#include "Rcpp.h"
      std::string fun() {
        std::string str("foo");
        return str;
      }')
  38. Safety: Memory leak
      SEXP fun() {
        std::string str("foo");
        SEXP x = Rf_allocVector(STRSXP, 1);
        SET_STRING_ELT(x, 0, Rf_mkChar(str.c_str()));
        return x;
      }
  41. –Jim Hester “All calls to R API functions can potentially error.”
  42. Pit of success Photo by Kelly Sikkema on Unsplash

  43. No memory leak
      SEXP fun() {
        std::string str("foo");
        SEXP x;
        cpp11::unwind_protect([&] {
          x = Rf_allocVector(STRSXP, 1);
          SET_STRING_ELT(x, 0, Rf_mkChar(str.c_str()));
        });
        return x;
      }
  45. No memory leak
      SEXP fun() {
        std::string str("foo");
        SEXP x = cpp11::safe[Rf_allocVector](STRSXP, 1);
        SET_STRING_ELT(x, 0, cpp11::safe[Rf_mkChar](str.c_str()));
        return x;
      }
  48. Unicode Photo by Henry & Co. on Unsplash

  49. Getting unicode right is HARD!

  53. Rcpp::sourceCpp(code = '
      #include "Rcpp.h"
      // [[Rcpp::export]]
      std::string do_nothing_rcpp(std::string x) {
        return x;
      }')
  54. x <- "fa\xE7ile"
      Encoding(x) <- "latin1"
      do_nothing_rcpp(x)
      #> [1] "façile"
  55. x <- "fa\U00E7ile"
      do_nothing_rcpp(x)
      #> [1] "façile"

  56. Pit of success Photo by Tim Gouw on Unsplash

  57. – utf8everywhere.org, Pavel Radzivilovsky, Yakov Galka and Slava Novgorodov “(UTF-8) should be the default choice of encoding for storing text strings in memory or on disk, for communication and all other uses. (this) improves performance, reduces complexity of software and helps prevent many Unicode-related bugs.”
  58. – https://cpp11.r-lib.org/articles/motivations.html “text coming from R is always translated into UTF-8. text coming from C++ is always marked as UTF-8.”
  59. cpp11::cpp_source(code = '
      #include <cpp11.hpp>
      [[cpp11::register]]
      std::string do_nothing_cpp(std::string x) {
        return x;
      }')
  60. To C++
      SEXP f(SEXP x) {
        std::string str;
        str.reserve(Rf_xlength(STRING_ELT(x, 0)));
        void* vmax = vmaxget();
        str.assign(Rf_translateCharUTF8(STRING_ELT(x, 0)));
        vmaxset(vmax);
        api.do_stuff(str);
      }
  61. From C++
      SEXP f(SEXP x) {
        std::string data = api.get_data();
        return Rf_ScalarString(Rf_mkCharLenCE(data.c_str(), data.size(), CE_UTF8));
      }
  62. x <- "fa\xE7ile"
      Encoding(x) <- "latin1"
      do_nothing_cpp(x)
      #> [1] "façile"
  63. x <- "fa\U00E7ile"
      do_nothing_cpp(x)
      #> [1] "façile"

  64. ALTREP Alternative representation R 3.5+ Where's the data? Used in base R Used in Photo by Danilo Santos on Unsplash
  65. None
  66. ALTREP aware
      sum_vec <- function(x) {
        sum <- 0
        for (val in x) {
          sum <- sum + val
        }
        sum
      }
      x <- 1:10000
      sum_vec(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @105a00cf8 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
  69. Not ALTREP aware
      // [[Rcpp::export]]
      int sum_vec_rcpp(IntegerVector x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }
      x <- 1:10000
      sum_vec_rcpp(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @10600fe40 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (expanded)
      lobstr::obj_sizes(x)
      #> 40,728 B
  73. Not ALTREP aware
      SEXP sum_vec_c(SEXP x) {
        int sum = 0;
        int* p = INTEGER(x);
        for (R_xlen_t i = 0; i < Rf_xlength(x); ++i) {
          sum = sum + p[i];
        }
        return Rf_ScalarInteger(sum);
      }
      x <- 1:10000
      sum_vec_c(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @1351f7858 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (expanded)
      lobstr::obj_sizes(x)
      #> 40,728 B
  77. ALTREP aware
      SEXP sum_vec_c2(SEXP x) {
        int sum = 0;
        for (R_xlen_t i = 0; i < Rf_xlength(x); ++i) {
          sum = sum + INTEGER_ELT(x, i);
        }
        return Rf_ScalarInteger(sum);
      }
      x <- 1:10000
      sum_vec_c2(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @103e50f80 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
  80. ALTREP aware
      SEXP sum_vec_c3(SEXP x) {
        int sum = 0;
        int buf[1024];
        R_xlen_t i = 0;
        R_xlen_t n = Rf_xlength(x);
        while (i < n - 1024) {
          INTEGER_GET_REGION(x, i, 1024, buf);
          for (R_xlen_t j = 0; j < 1024; ++j) {
            sum = sum + buf[j];
          }
          i += 1024;
        }
        R_xlen_t extra = n - i;
        INTEGER_GET_REGION(x, i, extra, buf);
        for (R_xlen_t j = 0; j < extra; ++j) {
          sum = sum + buf[j];
        }
        return Rf_ScalarInteger(sum);
      }
      x <- 1:10000
      sum_vec_c3(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @1342a4be0 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
  84. function    time
      sum_vec_c   374µs
      sum_vec_c2  625µs
      sum_vec_c3  186µs

  85. Pit of success Photo by Christian Bowen on Unsplash

  86. ALTREP aware
      int sum_vec_cpp11(integers x) {
        int sum = 0;
        for (auto val : x) {
          sum = sum + val;
        }
        return sum;
      }
      x <- 1:10000
      sum_vec_cpp11(x)
      #> [1] 50005000
      .Internal(inspect(x))
      #> @134b695f0 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
      lobstr::obj_sizes(x)
      #> 680 B
  88. ✅ Semantics Safety Unicode ALTREP cpp11.r-lib.org C++11 simple(r) cheaper | compilation | appending | protection header only vendor-able @jimhester_
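
The buffered approach of `sum_vec_c3` (slide 80) can be tried outside R. The sketch below is a standalone C++ analogy that does not use the R API: `CompactSeq` and its `get_region` member are hypothetical stand-ins for a compact ALTREP integer sequence and `INTEGER_GET_REGION`, and `sum_chunked` is an illustrative name. It assumes only that the sequence can copy a bounded slice into a caller-supplied buffer without materializing the whole vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical compact sequence first, first + 1, ..., stored as two numbers
// rather than as `length` materialized integers.
struct CompactSeq {
  int first;
  std::size_t length;

  // Copy up to `count` elements starting at `start` into `buf` and return
  // how many were written (fewer than `count` near the end).
  std::size_t get_region(std::size_t start, std::size_t count, int* buf) const {
    assert(start <= length);
    const std::size_t n = std::min(count, length - start);
    for (std::size_t i = 0; i < n; ++i) {
      buf[i] = first + static_cast<int>(start + i);
    }
    return n;
  }
};

// Sum the sequence through a fixed 1024-element buffer, so memory use stays
// constant no matter how long the sequence is.
long long sum_chunked(const CompactSeq& x) {
  constexpr std::size_t kChunk = 1024;
  int buf[kChunk];
  long long sum = 0;
  for (std::size_t i = 0; i < x.length; i += kChunk) {
    const std::size_t got = x.get_region(i, kChunk, buf);
    for (std::size_t j = 0; j < got; ++j) {
      sum += buf[j];
    }
  }
  return sum;
}
```

In this sketch the final partial chunk is handled inside the same loop, because `get_region` reports how many elements it copied, and the accumulator is 64-bit so longer sequences do not overflow it. Both are design choices of the sketch rather than features of the slide's code.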