R and C++. Past Present and Future.

Romain
April 05, 2017

R and C++. Past Present and Future. Presented at the MilanoR meetup on 2017-04-05

Transcript

  1. R and C++: Past, Present and Future
     Romain François, Consulting Datactive
     romain@thinkr.fr @romain_francois
  2. None
  3. None
  4. None
  5. Training / Development / Support @thinkR_fr

  6. Rcpp

  7. devtools::revdep( "Rcpp") ABCoptim AbsFilterGSEA acc accelerometry acebayes ACEt acrt AdaptiveSparsity

    ADMMnet AhoCorasickTrie AHR alakazam algstat AlignStat ALKr alphabetr Amelia anytime apcluster arrApply ashr ASPBay aSPU attrCUSUM autovarCore BaBooN BacArena baitmet BalancedSampling BAMBI BAMMtools Barycenter BatchMap batman bayesAB BayesBD BayesComm bayesDP BayesFactor BayesianTools bayesImageS bayesm bayou bcp bcpa bcROCsurface BCSub bea.R beanz BEDMatrix benchr BeviMed beyondWhittle bfa bifactorial bife BIFIEsurvey biganalytics bigFastlm biglasso bigmemory BigQuic bigReg bigtabulate BigVAR bindrcpp bio3d Biocomb BIPOD biwavelet blackbox blockcluster blockmodels blockseg BLPestimatoR bmlm bnnSurvival BNSL bootTimeInference bridgesampling brms bsearchtools btb btf BTLLasso BTR BTYDplus BuyseTest bvarsv bWGR BWStest CARBayes CARBayesST catlearn ccaPP cccp ccdrAlgorithm cda CDF.PSIdekick CDM CFC cgAUC ChannelAttribution chopthin CIDnetworks CircularDDM cIRT cladoRcpp classifierplots classify cleanEHR clere climdex.pcic clogitboost clogitL1 clusrank ClusterR ClusterStability clusteval ClustMMDD ClustVarLV CMF CNull coala CoFRA collUtils combiter CompGLM ConConPiWiFun coneproj contoureR copCAR cord CorReg Countr Coxnet CoxPlus cpgen cpr cqrReg crawl creditr crmPack Crossover cstab ctmcd cubature CVR cxxfunplus cycleRtools Cyclops D3M darch dbmss dbscan ddalpha ddR DDRTree deepboost DeLorean densityClust DEploid DepthProc DescTools DetMCD DetR devtools dfcomb dfmta dfphase1 dfpk diagis DiffNet diffrprojects DiffusionRgqd DiffusionRimp DiffusionRjgqd diffusr dils dina disclapmix discretecdAlgorithm diversitree diveRsity divest dlib DLMtool DNAtools dnc dplyr drgee dslice DStree dtwclust dynamichazard dynsbm easyVerification EBMAforecast ECctmc ecp EditImputeCont eDMA eggCounts eive EloChoice EMbC emil emIRT energy EPGLM erah esaddle ESGtoolkit EstHer ETAS eulerr evolqg EWGoF exif extraDistr factorcpt FactoRizationMachines factorstochvol fastAdaboost FastBandChol fastcmh fasteraster fastGHQuad FastGP FastHCS fastHorseshoe fastJT fastM FastPCS FastRCS FBFsearch 
fbroc FCNN4R fdaMixed fdapace fdasrvf FDGcopulas FDRreg feather FeatureHashing ffstream FIACH fICA filesstrings FisHiCal FIT flam flan flars flexsurv flip flock FLSSS forecast forecastSNSTS forega forestFloor fourierin fourPNO fractional frailtyEM frailtySurv FRESA.CAD FRK fromo FSelectorRcpp FSInteract fst fugeR FunChisq Funclustering futureheatwaves fwsim gamreg gapfill GAS gaselect gaston GauPro gbp gcKrig GCPM GDINA gdm gdpc gdtools gee4 GEEaSPU geiger genie GENLIB GenomicTools geoCount geohash geojsonR GERGM ggdmc ggforce ggraph ggrepel GiRaF gjam gkmSVM glamlasso glcm GLMaSPU glmBfp glmgraph glmmsr GMCM Gmedian Gmisc gmum.r gmwm gMWT gogamer googleway GPareto gpuR GPvam gRain graphicalVAR graphkernels GraphKit graphql grattan gRbase gRim grove growcurves growfunctions GSE gsEasy gsynth GUILDS GUTS gwfa GWmodel GxM h5 hashmap haven hawkes hBayesDM HDPenReg hierarchicalSets hit hkevp HLMdiag hmi HSAR hsphase htdp htmltidy htmltools hts httpuv HUM humaniformat humarray hunspell hyperSpec hypervolume hyphenatr IAPWS95 iBATCGH IBHM ibm ibmcraftr iBST icamix ICAOD iccbeta icd icd9 icenReg icensmis icRSF ICtest ie2misc iemisc IHSEP iLaplace imager imagine immer imputeMulti inarmix inca indelmiss iNextPD inferr inline IntegratedMRF interflex ipft iprior iptools IRTpp IsingSampler isoph ISOpureR IsoSpecR iterpc JacobiEigen JAGUAR jiebaR jmcm jmotif joineRML joinXL JOUSBoost jqr JSM jtGWAS jwutil kamila kdecopula kdevine kergp kernDeepStackNet KernelKnn KernSmoothIRT kmc kmeans.ddR Kmisc KODAMA kohonen KoulMde l0ara LaF LambertW lamW LANDD Langevin largeVis LassoBacktracking lasvmR LatentREGpp lbfgs LBSPR lclGWAS lcopula lexRankr lfl lidR lifecontingencies lm.br lme4 lowmemtkmeans lpme lsbclust lsgl lucr ludic Luminescence MADPop mafs magick MAINT.Data ManifoldOptim mapview marked markophylo markovchain MAT matchingMarkets matchingR MatchItSE mateable MatrixCorrelation MatrixLDA MAVE maxent mbbefd mcemGLM mcga mcIRT mcmcse mcPAFit medfate MediaK MEGENA MESS metafolio 
MetaheuristicFPA meteoland mets mev mgm mice miceadds microclass microseq milr minimaxdesign minqa mirt mirtCAT miscF miscset MiSPU missDeaths MixAll MixedDataImpute mixedMem mixlink mixpack mixR mkde mlmc mlxR mmand ModelMetrics Morpho mousetrap move moveHMM moveWindSpeed mp Mposterior MPTinR mrfDepth mrgsolve MRS MSGARCH msgl MTS multdyn MultiBD multicool multinet MultivariateRandomForest mvabund MVB mvcluster mvnfast mwaved myTAI nabor NAM ndjson ndl NestedCategBayesImpute netcoh netdiffuseR NetRep NetSim NetworkInference neuroim ngspatial NHMM nmfgpu4R NNLM noncompliance nonlinearTseries NPBayesImpute NPflow nprobust nse odbc odeintr oem officer OjaNP olctools OneArmPhaseTwoStudy onlinePCA ontologySimilarity openair OpenImageR OpenMx openxlsx optimization optiSel optmatch opusminer orQA PAC packcircles pacotest padr PAFit palm pander PanelCount partialAR patternplot pcalg pcIRT pdftools pdp pdSpecEst PedCNV pedometrics penalized PenCoxFrail penMSM perccal PerMallows pgee.mixed ph2bayes ph2bye phangorn phonics phybreak phylobase phylocurve PhylogeneticEM phylosignal Pijavski pinbasic pirate plac planar planor PLMIX plogr plotSEMM plyr poisDoubleSamp polyfreqs polywog POUMM PoweR PP PPtreeViz prclust precrec PReMiuM primes ProbitSpatial pROC prodlim ProFit ProNet propagate prophet propr prospectr protolite prototest protViz pryr psd psgp purrr pvar PWD pystr qrencoder QRM quadrupen qualpalr quanteda quantspec QuantTools queuecomputer qVarSel qwraps2 radiomics rags2ridges ragt2ridges ramcmc randomUniformForest ranger Rankcluster rankdist raptr raster Rblpapi Rborist Rcereal Rclusterpp RcppAnnoy RcppAPT RcppArmadillo RcppBDT RcppBlaze RcppCCTZ RcppClassic RcppClassicExamples RcppCNPy RcppDE RcppDL RcppEigen RcppExamples RcppFaddeeva RcppGetconf RcppGSL RcppHMM RcppHoney RcppMLPACK RcppNumerical RcppOctave RcppParallel RcppProgress RcppRedis RcppRoll RcppShark RcppSMC RcppStreams RcppTOML RcppXts RcppZiggurat rcss rdist Rdtq readr readstata13 readxl RealVAMS 
recexcavAAR recmap reconstructr recosystem redist rEDM regsem ReIns relSim rem remote ReorderCluster repolr resemble reshape2 reticulate revealedPrefs rexpokit Rfast rflann rforensicbatwing rFTRLProximal rgam rgeolocate RI2by2 RInside Rip46 ripa rIsing riskRegression rivr rkvo Rlabkey rlas Rlda Rlibeemd RLRsim RLumModel Rmalschains rmgarch Rmixmod RmixmodCombi rmumps rncl RNifti RNiftyReg robCompositions robets robustgam RobustGaSP robustHD robustlmm robustreg rococo roll rollply rootWishart rotations RoughSets roxygen2 rpg Rphylopars rpms rPref RProtoBuf RPtests RQuantLib rrr RSNNS RSNPset Rsomoclu RSpectra RSQLite RSSL rstan rstanarm RStoolbox rstpm2 rtext rtk rtkore Rtsne Ruchardet rucrdtw rugarch Rvcg rvg Rvoterdistance RVowpalWabbit rwfec rwirelesscom Ryacas s2 saeRobust SAMM satellite saturnin sbart sbfc sBIC sbmSDP sbrl SBSA scales SciencesPo scorer scoringRules scrm scrypt scvxclustr sdcMicro sdcTable secure SEERaBomb segmag seismicRoll SelvarMix semver seqHMM sequences sf sgd sglOptim sharpeRratio signalHsmm simFrame simmer simPop SimReg simstudy sirt sitmo skm slfm SLOPE smam SmartSVA SMMA smoof smooth SnakeCharmR snipEM snowboot snplist SocialNetworks SOD SpaCCr spacodiR spaMM SparseFactorAnalysis sparseHessianFD sparseLTSEigen sparsepp sparsereg spass spatgraphs SpatialEpi SpatialTools SpaTimeClus SpatMCA SpatPCA spBayesSurv spduration SpecsVerification spectral sppmix spray spsann SSL starma staTools stdvectors steadyICA StepwiseTest StereoMorph stlplus stm StMoSim stochvol stocks stosim stplanr stpm strat strataG stream stremr striprtf strum supc SuperRanker survAccuracyMeasures surveillance surveybootstrap survSNP svglite SVMMatch synchronicity synlik synthACS systemicrisk tagcloud TAM TAQMNGR TauStar tbart tcR TDA tensorBSS termstrc TESS tesseract testforDEP text2vec textmineR textreg textreuse textTinyR TFMPvalue tibble tidyr tidyxl timma TLMoments tm tmg tmlenet tnam tokenizers TransferEntropy TreeBUGS treeclim treeplyr treescape treespace 
triebeard trustOptim tsBSS tvd tweenr UncerIn2 unmarked unsystation urltools V8 validatejsonr valr valuer varband varbvs VarSelLCM vcfR vdiffr velox VideoComparison VIM vita VNM waffect walkr wand wCorr webreadr wicket wingui Wmisc wordcloud wordspace wrswoR wsrf XBRL xml2 xslt xyz yakmoR yCrypticRNAs yuima zic ziphsmm (1001 packages, 9.63 %)
  8. devtools::revdep( "Rcpp", recursive = TRUE) (7812 packages, 75.23 %)

  9. *.h     95 235
     *.cpp    5 698
     total  100 933
     *.R      3 177
  10. motivation

  11. int add( int a, int b ){
        return a + b ;
      }

      > add( 40, 2 )
      [1] 42
  12. past C/R API

  13. #include <R.h>
      #include <Rinternals.h>

      int add( int a, int b ){
        return a + b ;
      }

      SEXP add_c( SEXP a_, SEXP b_ ){
        int a = INTEGER(a_)[0], b = INTEGER(b_)[0] ;
        int res = add( a, b ) ;
        SEXP result = PROTECT(allocVector(INTSXP, 1) ) ;
        INTEGER(result)[0] = res ;
        UNPROTECT(1) ;
        return result ;
      }
  14. add <- function(a, b){
        .Call( "add_c", a, b )
      }

      > add( 40, 2 )
      Error in add(40, 2) : INTEGER() can only be applied to a 'integer', not a 'double'
      > add( 40L, 2L )
      [1] 42
  15. add <- function(a, b){
        .Call( "add_c", as.integer(a), as.integer(b) )
      }

      > add( 40, 2 )
      [1] 42
      > add( 40L, 2L )
      [1] 42
  16. Tools: SEXP • INTEGER • PROTECT • allocVector • INTSXP • UNPROTECT • .Call • as.integer
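The failure on slide 14 and the fix on slide 15 can be mimicked outside R. The following is only a toy model in standalone C++: `ToyVector`, `strict_integer` and `coerce_integer` are hypothetical names invented for this sketch, standing in loosely for `SEXP`, `INTEGER()` and `as.integer()`. It ignores `NA`, overflow and everything else the real API handles.

```cpp
#include <cassert>   // used by the usage checks shown after the block
#include <stdexcept>
#include <variant>
#include <vector>

// Toy stand-in for an R vector: holds either integers or doubles.
// (Hypothetical type; R's real SEXP is far richer.)
using ToyVector = std::variant<std::vector<int>, std::vector<double>>;

// Models a strict accessor: refuses anything that is not an integer vector.
const std::vector<int>& strict_integer(const ToyVector& v) {
    if (const auto* p = std::get_if<std::vector<int>>(&v)) return *p;
    throw std::invalid_argument("strict_integer() needs an integer vector, not a double vector");
}

// Models coercion: copies integer input, truncates doubles toward zero.
std::vector<int> coerce_integer(const ToyVector& v) {
    if (const auto* p = std::get_if<std::vector<int>>(&v)) return *p;
    const auto& d = std::get<std::vector<double>>(v);
    std::vector<int> out;
    out.reserve(d.size());
    for (double x : d) out.push_back(static_cast<int>(x));
    return out;
}

int add(int a, int b) { return a + b; }

// First attempt: strict access only, so double input is rejected.
int add_strict(const ToyVector& a, const ToyVector& b) {
    return add(strict_integer(a).at(0), strict_integer(b).at(0));
}

// Fixed version: coerce first, then add.
int add_coerced(const ToyVector& a, const ToyVector& b) {
    return add(coerce_integer(a).at(0), coerce_integer(b).at(0));
}
```

With `ToyVector d40 = std::vector<double>{40.0}` and `ToyVector d2 = std::vector<double>{2.0}`, `add_strict(d40, d2)` throws, while `add_coerced(d40, d2)` returns 42, which is the same split the two R sessions show.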
  17. present Rcpp

  18. #include <Rcpp.h>

      // [[Rcpp::export]]
      int add( int a, int b ){
        return a + b ;
      }

      > add( 17L, 25L )
      [1] 42
      > add( 17, 25 )
      [1] 42
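In the Rcpp version both `add( 17L, 25L )` and `add( 17, 25 )` work without any hand-written glue: the conversions follow from the C++ signature. The sketch below illustrates that idea only, in standalone C++17. It is not how Rcpp is implemented, and `ToyValue`, `convert` and `call_wrapped` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

// Toy scalar stand-in for an R value (hypothetical; not R's SEXP).
using ToyValue = std::variant<int, double>;

// Convert a ToyValue to whatever C++ type a parameter asks for.
template <typename T>
T convert(const ToyValue& v) {
    return std::visit([](auto x) { return static_cast<T>(x); }, v);
}

// Expand the argument list: the I-th value is converted to the I-th parameter type.
template <typename R, typename... Args, std::size_t... I>
ToyValue call_impl(R (*fn)(Args...), const std::vector<ToyValue>& args,
                   std::index_sequence<I...>) {
    assert(args.size() == sizeof...(Args));  // guaranteed by call_wrapped
    return ToyValue(fn(convert<Args>(args[I])...));
}

// Generic glue: parameter types are read off the function pointer,
// so the author of `fn` writes no conversion code.
template <typename R, typename... Args>
ToyValue call_wrapped(R (*fn)(Args...), const std::vector<ToyValue>& args) {
    if (args.size() != sizeof...(Args))
        throw std::invalid_argument("wrong number of arguments");
    return call_impl(fn, args, std::index_sequence_for<Args...>{});
}

// The function under test stays plain C++, as on the slide.
int add(int a, int b) { return a + b; }
```

Calling `call_wrapped(add, {ToyValue(17), ToyValue(25)})` and `call_wrapped(add, {ToyValue(17.0), ToyValue(25.0)})` both yield a `ToyValue` holding the `int` 42. The design point is that `add` itself never mentions the toy value type.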
  19. example

  20. None
  21. weighted_mean_1 <- function(x, w) {
        total <- 0
        total_w <- 0
        for (i in seq_along(x)) {
          total <- total + x[i] * w[i]
          total_w <- total_w + w[i]
        }
        total / total_w
      }
  22. weighted_mean_2 <- function(x, w) {
        sum(x*w) / sum(w)
      }

  23. weighted.mean

  24. #include <Rcpp.h>
      using namespace Rcpp ;

      // [[Rcpp::export]]
      double weighted_mean_cpp( NumericVector x, NumericVector w){
        int n = x.size() ;
        double total = 0.0 ;
        double total_w = 0.0 ;
        for( int i=0; i<n; i++){
          total += x[i] * w[i] ;
          total_w += w[i] ;
        }
        return total / total_w ;
      }
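For readers who want to try the weighted-mean loop without R installed, here is a minimal standalone sketch. It swaps `NumericVector` for `std::vector<double>`, which is an assumption of this example and not part of the deck. It also adds a length check that the slide's version does not have, and a second variant built on standard algorithms that mirrors `sum(x*w) / sum(w)`. The function names are hypothetical.

```cpp
#include <cassert>   // used by the usage checks shown after the block
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Shared precondition: both vectors must have the same length.
void check_same_length(const std::vector<double>& x, const std::vector<double>& w) {
    if (x.size() != w.size())
        throw std::invalid_argument("x and w must have the same length");
}

// Explicit loop, mirroring the structure of the slide's weighted_mean_cpp.
double weighted_mean_loop(const std::vector<double>& x, const std::vector<double>& w) {
    check_same_length(x, w);
    double total = 0.0;
    double total_w = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        total += x[i] * w[i];
        total_w += w[i];
    }
    return total / total_w;
}

// Same computation with standard algorithms, closer to sum(x*w) / sum(w).
double weighted_mean_algo(const std::vector<double>& x, const std::vector<double>& w) {
    check_same_length(x, w);
    const double total = std::inner_product(x.begin(), x.end(), w.begin(), 0.0);
    const double total_w = std::accumulate(w.begin(), w.end(), 0.0);
    return total / total_w;
}
```

For `x = {1, 2, 3}` and `w = {1, 1, 2}` both functions return 2.25, since the weighted sum is 9 and the weights sum to 4. Neither variant handles missing values or zero total weight.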
  25. None
  26. None
  27. future ?

  28. Rcpp is like usb …

  29. … but we want usb-c

  30. Rcpp is too huge and not modular

  31. [Diagram: Rcpp releases n, n + 1 and n + 2, each containing core, sugar and modules, shown next to mypkg]
  32. proposal

  33. [Diagram: Rcpp split into separate components (core, NumericVector, IntegerVector, CharacterVector, Function, sugar, modules, …); pkg1, pkg2 and pkg3 are each shown with core and a selection of these components]
  34. ' ✔ $ ) ✔ ✔

  35. Pros: Smaller • Faster • Robust • Controlled updates
      Cons: More copies of code base • Maybe more testing
  36. None
  37. http://bit.ly/milanorcpp