Slide 1

Slide 1 text

cpp11 welding R and C++  @jimhester_ Photo by Christopher Burns on Unsplash

Slide 2

Slide 2 text

C++ very fast statically typed lots of libraries limited safety net Photo by Filipa Saldanha on Unsplash

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Then why cpp11? Photo by Russ Ward on Unsplash

Slide 5

Slide 5 text

Photo by Micah Tindell on Unsplash

Slide 6

Slide 6 text

C++11 move semantics auto pointers type traits initializer lists variadic templates user defined literals user defined attributes Photo by Markus Spiske on Unsplash

Slide 7

Slide 7 text

Simpler implementation

                # Files   Lines of code
Rcpp (1.0.4)        379          74,658
cpp11 (1.0.0)        19           1,734

Slide 8

Slide 8 text

Rcpp features not in cpp11
❌ No modules
❌ No sugar
❌ Fewer attributes
❌ No automatic random number restoration
❌ No roxygen2 comment documentation
❌ No interfaces

Slide 9

Slide 9 text

Cheaper compilation

package    Rcpp compile time   cpp11 compile time   Rcpp peak memory   cpp11 peak memory
haven      17.42s              7.13s                428MB              204MB
readr      124.13s             81.08s               969MB              684MB
roxygen2   17.34s              4.24s                371MB              109MB
tidyr      14.25s              3.34s                363MB              83MB

Slide 10

Slide 10 text

header only
Install pkgA with Rcpp x.y.z
Later upgrade Rcpp
pkgA segfaults
tidyverse/dplyr#2335
Photo by Yogendra Singh on Unsplash

Slide 11

Slide 11 text

vendor-able bring cpp11 to your ensure stability (miss out on updates) cpp_vendor() Photo by Humphrey Muleba on Unsplash

Slide 12

Slide 12 text

Protection cost

Slide 13

Slide 13 text

Vector growth

Slide 14

Slide 14 text

✅ Semantics Safety Unicode ALTREP

Slide 15

Slide 15 text

–R Language Definition (Version 4.0.3) “The semantics of invoking a function in R argument are call-by-value. (sic). Changing the value of a supplied argument within a function will not affect the value of the variable in the calling frame.”

Slide 16

Slide 16 text

sum_vec <- function(x) {
  sum <- 0
  for (val in x) {
    sum <- sum + val
  }
  sum
}

Slide 17

Slide 17 text

Rcpp::sourceCpp(code = '#include "Rcpp.h"
using namespace Rcpp;

// [[Rcpp::export]]
int sum_vec_rcpp(IntegerVector x) {
  int sum = 0;
  for (auto val : x) {
    sum = sum + val;
  }
  return sum;
}')

Slide 18

Slide 18 text

Call-by-value
x <- 1:3
sum_vec(x)
#> [1] 6
x
#> [1] 1 2 3
sum_vec_rcpp(x)
#> [1] 6
x
#> [1] 1 2 3

Slide 19

Slide 19 text

add_first <- function(x) {
  x[[1]] <- x[[1]] + 1
  x
}
Call-by-value

Slide 20

Slide 20 text

Rcpp::sourceCpp(code = '#include "Rcpp.h"
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector add_first_rcpp(IntegerVector x) {
  x[0] = x[0] + 1;
  return x;
}')
Call-by-value?

Slide 21

Slide 21 text

x <- 1:3
add_first(x)
#> [1] 2 2 3
x
#> [1] 1 2 3
add_first_rcpp(x)
#> [1] 2 2 3
x
#> [1] 2 2 3
Call-by-reference

Slide 23

Slide 23 text

Rcpp::sourceCpp(code = '#include "Rcpp.h"
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector add_first_rcpp(IntegerVector x) {
  x[0] = x[0] + 1;
  return x;
}')
Call-by-reference

Slide 25

Slide 25 text

Pinnacle of success Photo by lovely shots on Unsplash

Slide 26

Slide 26 text

Pit of success Photo by Eric Muhr on Unsplash

Slide 27

Slide 27 text

cpp11::cpp_function(code = '
int sum_vec_cpp11(integers x) {
  int sum = 0;
  for (auto val : x) {
    sum = sum + val;
  }
  return sum;
}

[[cpp11::register]]
integers add_one_cpp11(writable::integers x) {
  x[0] = x[0] + 1;
  return x;
}
')

Slide 30

Slide 30 text

cpp11::cpp_function(code = '
integers add_one_cpp11(integers x) {
  x[0] = x[0] + 1;
  return x;
}', quiet = FALSE)
// error: expression is not assignable
//   x[0] = x[0] + 1;
//   ~~~~ ^
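
The compile error above comes from the type system: a read-only vector type simply has no assignable element access, while a writable type works on its own copy. The following is a minimal standalone sketch of that idea in plain C++. `ReadOnlyInts` and `WritableInts` are hypothetical stand-ins invented for illustration; they are not cpp11's classes and do not reflect how cpp11 is implemented.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical read-only view: refers to the caller's data and only
// returns elements by value, so `x[0] = ...` does not compile.
class ReadOnlyInts {
 public:
  explicit ReadOnlyInts(const std::vector<int>& data) : data_(&data) {}
  int operator[](std::size_t i) const { return (*data_)[i]; }
  std::size_t size() const { return data_->size(); }
  const std::vector<int>& values() const { return *data_; }

 private:
  const std::vector<int>* data_;
};

// Hypothetical writable vector: converting from a read-only view copies
// the data, so later writes never reach the caller's vector.
class WritableInts {
 public:
  WritableInts(const ReadOnlyInts& src) : data_(src.values()) {}
  int& operator[](std::size_t i) { return data_[i]; }
  const std::vector<int>& values() const { return data_; }

 private:
  std::vector<int> data_;
};

// Reading needs only the read-only view; nothing is copied.
int sum_vec(ReadOnlyInts x) {
  int sum = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum = sum + x[i];
  }
  return sum;
}

// Writing requires the writable type, which operates on its own copy.
std::vector<int> add_one(WritableInts x) {
  x[0] = x[0] + 1;
  return x.values();
}
```

Passing a `ReadOnlyInts` to `add_one` triggers the copying conversion, so the caller's vector keeps its original values, which is the call-by-value behaviour R users expect.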

Slide 31

Slide 31 text

Call-by-value
x <- 1:3
sum_vec_cpp11(x)
#> [1] 6
x
#> [1] 1 2 3
add_one_cpp11(x)
#> [1] 2 2 3
x
#> [1] 1 2 3

Slide 33

Slide 33 text

Safety Photo by Rob Lambert on Unsplash

Slide 34

Slide 34 text

void fun() {
  std::string str("foo");
  Rf_error("something went wrong");
}
Memory leak?

Slide 35

Slide 35 text

void fun() {
  char* str = (char*) malloc(1024);
  Rf_error("something went wrong");
  free(str);
}
Memory leak

Slide 36

Slide 36 text

void fun() {
  std::string str("foo");
  throw std::runtime_error("something went wrong");
}
No memory leak
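
The reason a thrown exception does not leak is stack unwinding: destructors of local objects run before the exception reaches its handler. A C-style long jump, as used by `Rf_error`, skips them. The sketch below demonstrates only the exception case in standalone C++. `Tracker`, `live_objects`, and `fun_cleans_up` are hypothetical names introduced purely for this illustration.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Number of Tracker objects currently alive (illustration only).
static int live_objects = 0;

// Hypothetical resource holder that records construction and destruction.
struct Tracker {
  std::string payload;
  explicit Tracker(std::string p) : payload(std::move(p)) { ++live_objects; }
  ~Tracker() { --live_objects; }
  Tracker(const Tracker&) = delete;
  Tracker& operator=(const Tracker&) = delete;
};

// Same shape as the slide's example: a local object, then an error.
void fun() {
  Tracker t("foo");
  throw std::runtime_error("something went wrong");
}

// Returns true if the local object was destroyed by the time the
// exception is caught, i.e. unwinding ran the destructor.
bool fun_cleans_up() {
  try {
    fun();
  } catch (const std::runtime_error&) {
    return live_objects == 0;
  }
  return false;
}
```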

Slide 37

Slide 37 text

Rcpp::sourceCpp(code = '#include "Rcpp.h"
std::string fun() {
  std::string str("foo");
  return str;
}')
Memory leak

Slide 38

Slide 38 text

SEXP fun() {
  std::string str("foo");
  SEXP x = Rf_allocVector(STRSXP, 1);
  SET_STRING_ELT(x, 0, Rf_mkChar(str.c_str()));
  return x;
}
Safety
Memory leak

Slide 41

Slide 41 text

–Jim Hester “All calls to R API functions can potentially error.”

Slide 42

Slide 42 text

Pit of success Photo by Kelly Sikkema on Unsplash

Slide 43

Slide 43 text

SEXP fun() {
  std::string str("foo");
  SEXP x;
  cpp11::unwind_protect([&] {
    x = Rf_allocVector(STRSXP, 1);
    SET_STRING_ELT(x, 0, Rf_mkChar(str.c_str()));
  });
  return x;
}
No memory leak

Slide 45

Slide 45 text

SEXP fun() {
  std::string str("foo");
  SEXP x = cpp11::safe[Rf_allocVector](STRSXP, 1);
  SET_STRING_ELT(x, 0, cpp11::safe[Rf_mkChar](str.c_str()));
  return x;
}
No memory leak

Slide 48

Slide 48 text

Unicode Photo by Henry & Co. on Unsplash

Slide 49

Slide 49 text

Getting unicode right is HARD!

Slide 53

Slide 53 text

Rcpp::sourceCpp(code = '
#include "Rcpp.h"

// [[Rcpp::export]]
std::string do_nothing_rcpp(std::string x) {
  return x;
}')

Slide 54

Slide 54 text

x <- "fa\xE7ile"
Encoding(x) <- "latin1"
do_nothing_rcpp(x)
#> [1] "façile"

Slide 55

Slide 55 text

x <- "fa\U00E7ile"
do_nothing_rcpp(x)
#> [1] "façile"

Slide 56

Slide 56 text

Pit of success Photo by Tim Gouw on Unsplash

Slide 57

Slide 57 text

- utf8everywhere.org, Pavel Radzivilovsky, Yakov Galka and Slava Novgorodov “(UTF-8) should be the default choice of encoding for storing text strings in memory or on disk, for communication and all other uses. (this) improves performance, reduces complexity of software and helps prevent many Unicode-related bugs.”

Slide 58

Slide 58 text

– https://cpp11.r-lib.org/articles/motivations.html “text coming from R is always translated into UTF-8. text coming from C++ is always marked as UTF-8.”
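
To make "translated into UTF-8" concrete, here is a minimal standalone sketch of what translating a Latin-1 string involves: bytes below 0x80 are copied unchanged, and each higher byte becomes a two-byte UTF-8 sequence. `latin1_to_utf8` is a hypothetical helper written for illustration. In real R code the translation is done by R itself (for example through `Rf_translateCharUTF8`), not by hand.

```cpp
#include <string>

// Hypothetical helper: re-encode a Latin-1 (ISO-8859-1) byte string as UTF-8.
// Latin-1 code points map directly to Unicode code points U+0000..U+00FF.
std::string latin1_to_utf8(const std::string& in) {
  std::string out;
  out.reserve(in.size() * 2);  // worst case: every byte needs two bytes
  for (unsigned char c : in) {
    if (c < 0x80) {
      // ASCII range: identical in both encodings.
      out.push_back(static_cast<char>(c));
    } else {
      // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}
```

For the string used on these slides, the Latin-1 byte `0xE7` (ç) becomes the UTF-8 bytes `0xC3 0xA7`.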

Slide 59

Slide 59 text

cpp11::cpp_source(code = '
#include "cpp11.hpp"

[[cpp11::register]]
std::string do_nothing_cpp(std::string x) {
  return x;
}')

Slide 60

Slide 60 text

To C++
SEXP f(SEXP x) {
  std::string str;
  str.reserve(Rf_xlength(STRING_ELT(x, 0)));
  void* vmax = vmaxget();
  str.assign(Rf_translateCharUTF8(STRING_ELT(x, 0)));
  vmaxset(vmax);
  api.do_stuff(str);
}

Slide 61

Slide 61 text

From C++
SEXP f(SEXP x) {
  std::string data = api.get_data();
  return Rf_ScalarString(Rf_mkCharLenCE(data.c_str(), data.size(), CE_UTF8));
}

Slide 62

Slide 62 text

x <- "fa\xE7ile"
Encoding(x) <- "latin1"
do_nothing_cpp(x)
#> [1] "façile"

Slide 63

Slide 63 text

x <- "fa\U00E7ile"
do_nothing_cpp(x)
#> [1] "façile"

Slide 64

Slide 64 text

ALTREP Alternative representation R 3.5+ Where's the data? Used in base R Used in Photo by Danilo Santos on Unsplash

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

ALTREP aware
sum_vec <- function(x) {
  sum <- 0
  for (val in x) {
    sum <- sum + val
  }
  sum
}

x <- 1:10000
sum_vec(x)
#> [1] 50005000
.Internal(inspect(x))
#> @105a00cf8 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
lobstr::obj_sizes(x)
#> 680 B

Slide 69

Slide 69 text

Not ALTREP aware
// [[Rcpp::export]]
int sum_vec_rcpp(IntegerVector x) {
  int sum = 0;
  for (auto val : x) {
    sum = sum + val;
  }
  return sum;
}

x <- 1:10000
sum_vec_rcpp(x)
#> [1] 50005000
.Internal(inspect(x))
#> @10600fe40 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (expanded)
lobstr::obj_sizes(x)
#> 40,728 B

Slide 73

Slide 73 text

Not ALTREP aware
SEXP sum_vec_c(SEXP x) {
  int sum = 0;
  int* p = INTEGER(x);
  for (R_xlen_t i = 0; i < Rf_xlength(x); ++i) {
    sum = sum + p[i];
  }
  return Rf_ScalarInteger(sum);
}

x <- 1:10000
sum_vec_c(x)
#> [1] 50005000
.Internal(inspect(x))
#> @1351f7858 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (expanded)
lobstr::obj_sizes(x)
#> 40,728 B

Slide 77

Slide 77 text

ALTREP aware
SEXP sum_vec_c2(SEXP x) {
  int sum = 0;
  for (R_xlen_t i = 0; i < Rf_xlength(x); ++i) {
    sum = sum + INTEGER_ELT(x, i);
  }
  return Rf_ScalarInteger(sum);
}

x <- 1:10000
sum_vec_c2(x)
#> [1] 50005000
.Internal(inspect(x))
#> @103e50f80 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
lobstr::obj_sizes(x)
#> 680 B

Slide 80

Slide 80 text

ALTREP aware
SEXP sum_vec_c3(SEXP x) {
  int sum = 0;
  int buf[1024];
  R_xlen_t i = 0;
  R_xlen_t n = Rf_xlength(x);
  while (i < n - 1024) {
    INTEGER_GET_REGION(x, i, 1024, buf);
    for (R_xlen_t j = 0; j < 1024; ++j) {
      sum = sum + buf[j];
    }
    i += 1024;
  }
  R_xlen_t extra = n - i;
  INTEGER_GET_REGION(x, i, extra, buf);
  for (R_xlen_t j = 0; j < extra; ++j) {
    sum = sum + buf[j];
  }
  return Rf_ScalarInteger(sum);
}

x <- 1:10000
sum_vec_c3(x)
#> [1] 50005000
.Internal(inspect(x))
#> @1342a4be0 13 INTSXP g0c0
#> [REF(65535)] 1 : 10000 (compact)
lobstr::obj_sizes(x)
#> 680 B

Slide 84

Slide 84 text

function     time
sum_vec_c    374µs
sum_vec_c2   625µs
sum_vec_c3   186µs
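
The three C variants differ in how they reach the data: through a raw pointer (which forces the compact sequence to be expanded), one element at a time, or in buffered regions. The standalone sketch below mimics the last two access patterns on a toy compact sequence. `CompactSeq`, `elt`, and `get_region` are hypothetical names chosen to echo `INTEGER_ELT` and `INTEGER_GET_REGION`. This is not R's ALTREP implementation and says nothing about the timings in the table.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical compact sequence: stores only the first value and the
// length, and computes elements on demand instead of holding an array.
struct CompactSeq {
  int first;
  std::size_t length;

  // Element-wise access, analogous in spirit to INTEGER_ELT().
  int elt(std::size_t i) const { return first + static_cast<int>(i); }

  // Bulk access into a caller-supplied buffer, analogous in spirit to
  // INTEGER_GET_REGION().
  void get_region(std::size_t start, std::size_t n, int* buf) const {
    for (std::size_t j = 0; j < n; ++j) {
      buf[j] = elt(start + j);
    }
  }
};

// One accessor call per element.
int sum_by_element(const CompactSeq& x) {
  int sum = 0;
  for (std::size_t i = 0; i < x.length; ++i) {
    sum = sum + x.elt(i);
  }
  return sum;
}

// One accessor call per chunk of up to 1024 elements; the inner loop
// then runs over a plain array. The sequence itself is never expanded.
int sum_by_region(const CompactSeq& x) {
  constexpr std::size_t kChunk = 1024;
  int buf[kChunk];
  int sum = 0;
  for (std::size_t i = 0; i < x.length; i += kChunk) {
    const std::size_t n = std::min(kChunk, x.length - i);
    x.get_region(i, n, buf);
    for (std::size_t j = 0; j < n; ++j) {
      sum = sum + buf[j];
    }
  }
  return sum;
}
```

Both functions need only a fixed amount of extra memory regardless of the sequence length, which is the property the "(compact)" output on the slides reflects.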

Slide 85

Slide 85 text

Pit of success Photo by Christian Bowen on Unsplash

Slide 86

Slide 86 text

ALTREP aware
int sum_vec_cpp11(integers x) {
  int sum = 0;
  for (auto val : x) {
    sum = sum + val;
  }
  return sum;
}

x <- 1:10000
sum_vec_cpp11(x)
#> [1] 50005000
.Internal(inspect(x))
#> @134b695f0 13 INTSXP g0c0 [REF(65535)] 1 : 10000 (compact)
lobstr::obj_sizes(x)
#> 680 B

Slide 88

Slide 88 text

✅ Semantics Safety Unicode ALTREP
cpp11.r-lib.org
C++11
simple(r)
cheaper | compilation | appending | protection
header only
vendor-able
@jimhester_