
Biclustering

MunichDataGeeks
November 02, 2016

As big data has become standard in many application areas, new challenges have arisen in methodology and software development, including how to discover meaningful patterns in vast amounts of data. Addressing these problems, we show how to apply biclustering methods to find local patterns in a big data matrix.

The talk presents an overview of data analysis using biclustering methods from a practical point of view. Real case studies illustrate the use of several biclustering methods. The mathematical theory of one or two methods is presented in detail, and references to further technical details of the methods are provided.

Transcript

  1. Biclustering: Methods, Software and Application
    Dr. Sebastian Kaiser, Institut für Statistik, Ludwig-Maximilians-Universität München
    Munich Datageeks Meetup, 02 November 2016
  2. Outline
    I. Biclustering
    II. R Package biclust
    III. Application
    IV. Summary and Outlook
  3. Biclustering: Initial Situation
    An n × m matrix A with objects X (rows), variables Y (columns) and entries $a_{ij}$:

    $$
    A = \begin{array}{c|ccccc}
           & y_1    & \cdots & y_j    & \cdots & y_m    \\ \hline
    x_1    & a_{11} & \cdots & a_{1j} & \cdots & a_{1m} \\
    \vdots & \vdots &        & \vdots &        & \vdots \\
    x_i    & a_{i1} & \cdots & a_{ij} & \cdots & a_{im} \\
    \vdots & \vdots &        & \vdots &        & \vdots \\
    x_n    & a_{n1} & \cdots & a_{nj} & \cdots & a_{nm}
    \end{array}
    $$
  4. Biclustering: Goal
    The bicluster entries (A), scattered among the other entries (∗), form a contiguous block once rows and columns are reordered:

      A ∗ ∗ A ∗ A ∗        A A A ∗ ∗ ∗ ∗
      ∗ ∗ ∗ ∗ ∗ ∗ ∗        A A A ∗ ∗ ∗ ∗
      ∗ ∗ ∗ ∗ ∗ ∗ ∗        A A A ∗ ∗ ∗ ∗
      A ∗ ∗ A ∗ A ∗   ⇒    ∗ ∗ ∗ ∗ ∗ ∗ ∗
      ∗ ∗ ∗ ∗ ∗ ∗ ∗        ∗ ∗ ∗ ∗ ∗ ∗ ∗
      A ∗ ∗ A ∗ A ∗        ∗ ∗ ∗ ∗ ∗ ∗ ∗
      ∗ ∗ ∗ ∗ ∗ ∗ ∗        ∗ ∗ ∗ ∗ ∗ ∗ ∗

    Find subgroups $A_{IJ}$ of objects $I = \{i_1, \dots, i_k\}$, $k \le n$, $I \subset X$, which are as similar as possible to each other on a subset of variables $J = \{j_1, \dots, j_l\}$, $l \le m$, $J \subset Y$, and as different as possible from the remaining objects and variables.
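
The reordering in the Goal illustration takes only a few lines of code. The following Python sketch illustrates the idea. The talk's own software is the R package biclust, and the function name `reorder_to_front` is hypothetical. The sketch assumes the bicluster's row and column indices are already known and moves them to the front, keeping the remaining rows and columns in their original order.

```python
def reorder_to_front(matrix, rows, cols):
    """Permute rows and columns so that the bicluster (rows x cols) becomes the upper-left block.

    `rows` and `cols` are 0-based indices; all other rows and columns keep their relative order.
    """
    n, m = len(matrix), len(matrix[0])
    row_order = list(rows) + [i for i in range(n) if i not in rows]
    col_order = list(cols) + [j for j in range(m) if j not in cols]
    return [[matrix[i][j] for j in col_order] for i in row_order]


if __name__ == "__main__":
    # 7 x 7 example from the slide: the bicluster sits in rows and columns 1, 4, 6 (0-based 0, 3, 5).
    grid = [["*"] * 7 for _ in range(7)]
    for i in (0, 3, 5):
        for j in (0, 3, 5):
            grid[i][j] = "A"
    for line in reorder_to_front(grid, [0, 3, 5], [0, 3, 5]):
        print(" ".join(line))
```

Finding the indices is the hard part that the biclustering algorithms solve. The permutation itself only changes the presentation, not the data.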
  5. Biclustering: Why Biclustering?
    • Clustering on the whole data set is impossible or leads to diffuse results (e.g. too many uncorrelated variables).
    • Subgroups are assumed or expected in the data (e.g. some objects have 'similar' patterns for a given set of variables).
  6. Biclustering: Typical Application Fields
    • Microarray gene expression data: objects = genes, variables = conditions or experiments
    • Marketing data: objects = customers or consumers, variables = product features
    • Text mining: objects = documents, variables = words
  7. Biclustering: Bicluster Types (Madeira and Oliveira, 2004)
    1. Bicluster with constant values: $a_{ij} = \mu$
    2. Bicluster with constant values on rows or columns: rows $a_{ij} = \mu + \alpha_i$ or $a_{ij} = \mu \cdot \alpha_i$; columns $a_{ij} = \mu + \beta_j$ or $a_{ij} = \mu \cdot \beta_j$
    3. Bicluster with coherent values: $a_{ij} = \mu + \alpha_i + \beta_j$ or $a_{ij} = \mu \cdot \alpha_i \cdot \beta_j$
    4. Bicluster with coherent evolutions: $a_{ih} \le a_{ir} \le a_{it} \le a_{id}$ (the same ordering of columns in every row) or $a_{hj} \le a_{rj} \le a_{tj} \le a_{dj}$ (the same ordering of rows in every column)
  8. Biclustering

    constant values − overall     constant values − rows        constant values − columns
    1.0 1.0 1.0 1.0               1.0 1.0 1.0 1.0               1.0 2.0 3.0 4.0
    1.0 1.0 1.0 1.0               2.0 2.0 2.0 2.0               1.0 2.0 3.0 4.0
    1.0 1.0 1.0 1.0               3.0 3.0 3.0 3.0               1.0 2.0 3.0 4.0
    1.0 1.0 1.0 1.0               4.0 4.0 4.0 4.0               1.0 2.0 3.0 4.0

    coherent values − additive    coherent values − multiplicative
    1.0 2.0 5.0 0.0               1.0 2.0 0.5 1.5
    2.0 3.0 6.0 1.0               2.0 4.0 1.0 3.0
    4.0 5.0 8.0 3.0               4.0 8.0 2.0 6.0
    5.0 6.0 9.0 4.0               3.0 6.0 1.5 4.5

    coherent evolution − overall  coherent evolution − rows     coherent evolution − columns
    S1 S1 S1 S1                   S1 S1 S1 S1                   S1 S2 S3 S4
    S1 S1 S1 S1                   S2 S2 S2 S2                   S1 S2 S3 S4
    S1 S1 S1 S1                   S3 S3 S3 S3                   S1 S2 S3 S4
    S1 S1 S1 S1                   S4 S4 S4 S4                   S1 S2 S3 S4
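
The additive and multiplicative examples can be generated from their row and column effects. Additivity can be checked with a simple interaction test against the first row and column. The following Python sketch uses hypothetical helper names and assumes a complete, noise-free bicluster. Real data would instead call for a residual-based score, such as the mean squared residue used by CC.

```python
import math


def additive(mu, alpha, beta):
    """Coherent values, additive: a_ij = mu + alpha_i + beta_j."""
    return [[mu + a + b for b in beta] for a in alpha]


def multiplicative(mu, alpha, beta):
    """Coherent values, multiplicative: a_ij = mu * alpha_i * beta_j."""
    return [[mu * a * b for b in beta] for a in alpha]


def is_additive(m, tol=1e-9):
    """True if a_ij - a_i1 - a_1j + a_11 = 0 for all i, j (first row/column as reference),
    which holds exactly when a_ij = mu + alpha_i + beta_j for some mu, alpha, beta."""
    return all(abs(m[i][j] - m[i][0] - m[0][j] + m[0][0]) < tol
               for i in range(len(m)) for j in range(len(m[0])))


# The additive and multiplicative examples from the slide (effects read off the tables).
add = additive(0.0, [0.0, 1.0, 3.0, 4.0], [1.0, 2.0, 5.0, 0.0])
mult = multiplicative(1.0, [1.0, 2.0, 4.0, 3.0], [1.0, 2.0, 0.5, 1.5])
# Taking logarithms turns a positive multiplicative bicluster into an additive one.
log_mult = [[math.log(x) for x in row] for row in mult]
```

Constant rows and constant columns are special cases of the additive model, with all `beta` or all `alpha` equal to zero. This is why the first three types on the slide pass the same check.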
  9. Biclustering: Relative Bicluster Structures (Madeira and Oliveira, 2004)
    1. Single bicluster
    2. Exclusive row and column biclusters
    3. Exclusive-rows or exclusive-columns biclusters
    4. Non-overlapping non-exclusive biclusters
    5. Arbitrarily positioned overlapping biclusters
  10. Biclustering: Finding More Than One Bicluster
    Most bicluster algorithms are iterative. To find the next bicluster after n − 1 biclusters have been found, you have to either
    • ignore (or lock) the n − 1 biclusters already found,
    • delete the rows and/or columns of the found biclusters, or
    • mask the found biclusters with random values.
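
Of the three strategies, masking is the easiest to sketch. The following Python function is hypothetical (it is not part of biclust) and assumes a dense row-major matrix. It draws replacement values uniformly from [low, high], here the range of the data, so the found bicluster no longer stands out in the next iteration.

```python
import random


def mask_bicluster(matrix, rows, cols, low, high, rng=None):
    """Return a copy of `matrix` whose cells in rows x cols are replaced by
    uniform random values from [low, high]; the input is left unchanged."""
    rng = rng if rng is not None else random.Random()
    masked = [list(row) for row in matrix]
    for i in rows:
        for j in cols:
            masked[i][j] = rng.uniform(low, high)
    return masked


data = [[float(10 * i + j) for j in range(5)] for i in range(4)]
lo, hi = min(map(min, data)), max(map(max, data))
# Mask a (hypothetical) found bicluster in rows 0, 2 and columns 1, 3 before the next search.
next_input = mask_bicluster(data, rows=[0, 2], cols=[1, 3], low=lo, high=hi, rng=random.Random(42))
```

Drawing from the data range keeps the masked cells on the same scale as the rest of the matrix, so they do not create a new artificial pattern.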
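  11. Biclustering: Jaccard Index (Jaccard, 1901)
    Two single biclusters:
    $$\operatorname{jac}(BC_i, BC_j) = \operatorname{jac}_{ij} = \frac{|BC_i \cap BC_j|}{|BC_i \cup BC_j|}$$
    Non-overlapping bicluster sets:
    $$\operatorname{jac}(Bicres_1, Bicres_2) = \frac{1}{g} \sum_{i=1}^{g} \sum_{j=1}^{t} \frac{|BC_i(Bicres_1) \cap BC_j(Bicres_2)|}{|BC_i(Bicres_1) \cup BC_j(Bicres_2)|}$$
    Overlapping bicluster sets:
    $$\operatorname{jac}_c(Bicres_1, Bicres_2) = \frac{\operatorname{jac}(Bicres_1, Bicres_2)}{\max\bigl(\operatorname{jac}(Bicres_1, Bicres_1), \operatorname{jac}(Bicres_2, Bicres_2)\bigr)}$$
    where g and t are the numbers of biclusters in Bicres_1 and Bicres_2, and |·| counts the cells of a bicluster.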
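
The three Jaccard formulas translate directly into code if a bicluster is stored as a pair (rows, cols) and |BC| is taken as its number of cells. The following Python sketch works under that assumption. The function names are hypothetical, and the code is not taken from the biclust package.

```python
def cells(bicluster):
    """Set of (row, column) cells covered by a bicluster given as (rows, cols)."""
    rows, cols = bicluster
    return {(i, j) for i in rows for j in cols}


def jac(bc1, bc2):
    """Jaccard index of two biclusters: |BC1 ∩ BC2| / |BC1 ∪ BC2| on their cells."""
    c1, c2 = cells(bc1), cells(bc2)
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def jac_sets(res1, res2):
    """Result-set version: (1/g) * sum_i sum_j jac(BC_i(res1), BC_j(res2)), g = len(res1)."""
    return sum(jac(a, b) for a in res1 for b in res2) / len(res1)


def jac_corrected(res1, res2):
    """Version for overlapping result sets: divide by the larger self-similarity."""
    return jac_sets(res1, res2) / max(jac_sets(res1, res1), jac_sets(res2, res2))


a = ([0, 1], [0, 1])
b = ([1, 2], [0, 1])
print(jac(a, b))  # 2 shared cells out of 6, i.e. 1/3
```

For a non-overlapping result set compared with itself, all cross terms are zero, so `jac_sets` returns 1. With overlapping biclusters the self-similarity exceeds 1, which is what the corrected version normalizes away.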
  12. Biclustering: Bicluster Algorithms
    A sample of algorithms chosen to cover most bicluster outcomes:
    • Bimax (Barkow et al., 2006): groups of ones in a binary matrix
    • CC (Cheng and Church, 2000): constant values
    • Plaid (Turner et al., 2005): constant values over rows or columns
    • S4VD (Sill et al., 2011): coherent values over rows and columns
    • Xmotif (Murali and Kasif, 2003): coherent correlation over rows and columns
    Other algorithms: Spectral, ISA, FABIA, Quest, ...
  13. Bicluster Algorithms: Bimax (Barkow et al., 2006)
    Finds subgroups in a binary matrix where all entries are one:

      1 1 0 1 0 1 0        1 1 1 1 0 0 0
      0 0 0 0 0 0 0        1 1 1 0 0 0 0
      0 1 0 0 0 0 0        1 1 1 0 0 0 0
      1 0 0 1 0 1 0   ⇒    0 0 0 0 0 0 0
      0 0 0 0 0 0 0        0 1 0 0 0 0 0
      1 0 0 1 0 1 0        0 0 0 0 0 0 0
      0 0 0 0 0 0 1        0 0 0 0 0 0 1
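
Bimax itself uses a divide-and-conquer strategy. The following brute-force Python sketch is for intuition only. It is not Bimax, and its cost is exponential in the number of columns. It lists every all-ones submatrix that cannot be extended by another row or column and applies this to the 7 × 7 example on this slide. The function name is hypothetical.

```python
from itertools import combinations


def all_ones_biclusters(binary):
    """Enumerate all-ones submatrices that cannot be extended by a row or column.

    Exhaustive over column subsets (2^m work), so only suitable for toy examples.
    Returns a sorted list of (rows, cols) tuples of 0-based indices.
    """
    n, m = len(binary), len(binary[0])
    found = set()
    for size in range(1, m + 1):
        for cols in combinations(range(m), size):
            rows = tuple(i for i in range(n) if all(binary[i][j] for j in cols))
            if not rows:
                continue
            # Close the column set: every column that is 1 in all selected rows.
            closed = tuple(j for j in range(m) if all(binary[i][j] for i in rows))
            found.add((rows, closed))
    return sorted(found)


example = [
    [1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
largest = max(all_ones_biclusters(example), key=lambda bc: len(bc[0]) * len(bc[1]))
print(largest)  # ((0, 3, 5), (0, 3, 5))
```

On this matrix, the largest all-ones bicluster consists of rows 1, 4 and 6 and columns 1, 4 and 6 (0-based 0, 3, 5), which is the block moved to the upper left on the slide.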
  14. Bicluster Algorithms: Plaid (Turner et al., 2005)
    Finds the subgroup with the highest layer sum of squares (LSS) by fitting layers k to the model

    $$Y_{ij} = (\mu_0 + \alpha_{i0} + \beta_{j0}) + \sum_{k=1}^{K} (\mu_k + \alpha_{ik} + \beta_{jk})\,\rho_{ik}\,\kappa_{jk} + \varepsilon_{ij} \tag{1}$$

    µ, α and β represent the mean, row and column effects, and ρ and κ indicate whether a row or a column, respectively, is a member of the layer. Finds constant values over rows or columns.
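
The following Python sketch makes the roles of the terms in model (1) concrete. It evaluates the expected value of the plaid model, with the error term dropped, for given layer parameters. It only evaluates the model: it does not fit layers or compute the LSS. The dictionary layout for a layer and the function name are hypothetical choices.

```python
def plaid_mean(mu0, alpha0, beta0, layers):
    """Expected value of the plaid model without the error term:

    E[Y_ij] = (mu0 + alpha0_i + beta0_j) + sum_k (mu_k + alpha_ik + beta_jk) * rho_ik * kappa_jk

    `layers` is a list of dicts with keys mu, alpha, beta, rho, kappa, where rho and
    kappa are 0/1 membership vectors for rows and columns.
    """
    n, m = len(alpha0), len(beta0)
    # Background layer k = 0 covers every cell.
    y = [[mu0 + alpha0[i] + beta0[j] for j in range(m)] for i in range(n)]
    for layer in layers:
        for i in range(n):
            for j in range(m):
                effect = layer["mu"] + layer["alpha"][i] + layer["beta"][j]
                # rho_ik * kappa_jk is 1 only for cells inside the layer.
                y[i][j] += effect * layer["rho"][i] * layer["kappa"][j]
    return y


layer1 = {"mu": 2, "alpha": [0, 1, 0], "beta": [0, 0, 0], "rho": [1, 1, 0], "kappa": [0, 1, 1]}
fitted = plaid_mean(1, [0, 0, 0], [0, 0, 0], [layer1])
```

Cells outside every layer keep the background value, while overlapping layers add up. This is how the model accommodates overlapping biclusters.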
  15. Bicluster Algorithms: S4VD (Sill, Kaiser, Benner and Kopp-Schneider, 2011)
    • Improvement of SSVD biclustering by Lee et al. (2010).
    • Biclusters are found using a sparse singular value decomposition.
    • Stability selection is used to obtain a more stable result.
    • Finds possibly overlapping structures.
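
A rank-one power iteration illustrates the core idea of sparse-SVD biclustering. In it, both singular vectors are soft-thresholded, so only a few rows and columns keep non-zero loadings, and those rows and columns form the bicluster. The following Python sketch uses fixed penalties `lam_u` and `lam_v`, which are assumptions. It omits the adaptive penalties of Lee et al. and the stability selection described on this slide, so it is a simplified stand-in rather than the published method.

```python
import math


def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values with |x| <= lam become exactly 0."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_rank_one(a, lam_u, lam_v, iters=100):
    """Alternating power iteration with soft-thresholding (simplified sparse SVD).

    Returns (rows, cols): indices with non-zero left/right singular-vector entries.
    """
    n, m = len(a), len(a[0])
    v = [1.0 / math.sqrt(m)] * m
    u = [0.0] * n
    for _ in range(iters):
        # Left vector: u = S(A v), then normalise.
        u = [soft_threshold(sum(a[i][j] * v[j] for j in range(m)), lam_u) for i in range(n)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0:
            return [], []
        u = [x / norm_u for x in u]
        # Right vector: v = S(A^T u), then normalise.
        v = [soft_threshold(sum(a[i][j] * u[i] for i in range(n)), lam_v) for j in range(m)]
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0:
            return [], []
        v = [x / norm_v for x in v]
    return [i for i, x in enumerate(u) if x != 0], [j for j, x in enumerate(v) if x != 0]


# A 6 x 6 toy matrix with a strong 3 x 3 block on a weak background.
toy = [[5.0 if i < 3 and j < 3 else 0.1 for j in range(6)] for i in range(6)]
rows, cols = sparse_rank_one(toy, lam_u=0.5, lam_v=0.5)
```

The penalties control the bicluster size. Larger values give fewer rows and columns, and very large values remove everything. Choosing them robustly is exactly the problem that stability selection addresses.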
  16. Bicluster Algorithms: The Ensemble Method for Biclustering
    • Improves the stability and reliability of bicluster results
    • Three steps are necessary:
      1. Initialization step: choose starting parameters
      2. Combination step: combine bicluster results
      3. Result step: present the result set
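
The combination step can be sketched in three stages. First, pool the biclusters from all runs. Next, group those whose Jaccard index to a seed bicluster reaches a similarity threshold. Finally, keep the groups that were found often enough. The following Python function is a simplified greedy stand-in, not the combination routine used by biclust. The parameter names `simthr` and `support` only echo the ensemble() signature, and the majority vote for the representative bicluster is an assumption.

```python
def jaccard(bc1, bc2):
    """Jaccard index of two (rows, cols) biclusters, computed on their cells."""
    c1 = {(i, j) for i in bc1[0] for j in bc1[1]}
    c2 = {(i, j) for i in bc2[0] for j in bc2[1]}
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def combine_runs(runs, simthr=0.7, support=0.5):
    """Greedy combination of biclusters from repeated runs (simplified sketch).

    runs: list of result sets, each a list of (rows, cols) biclusters.
    Returns (rows, cols, count) for every group with at least support * len(runs) members.
    """
    pool = [bc for run in runs for bc in run]
    unassigned = list(range(len(pool)))
    combined = []
    while unassigned:
        seed = pool[unassigned[0]]
        # The seed always joins its own group, so the loop makes progress.
        group = [unassigned[0]] + [k for k in unassigned[1:]
                                   if jaccard(seed, pool[k]) >= simthr]
        unassigned = [k for k in unassigned if k not in group]
        if len(group) >= support * len(runs):
            members = [pool[k] for k in group]
            half = len(members) / 2
            # Majority vote: keep rows/columns present in at least half of the members.
            rows = sorted({i for mem in members for i in mem[0]
                           if sum(i in other[0] for other in members) >= half})
            cols = sorted({j for mem in members for j in mem[1]
                           if sum(j in other[1] for other in members) >= half})
            combined.append((rows, cols, len(group)))
    return combined


runs = [
    [([0, 1, 2], [0, 1]), ([5, 6], [3, 4])],
    [([0, 1, 2], [0, 1])],
    [([0, 1], [0, 1])],
]
stable = combine_runs(runs, simthr=0.6, support=0.5)
```

A bicluster that appears in only one of several runs is discarded as unstable. This is the intuition behind repeating the algorithm with different starting parameters or subsamples.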
  17. R Package biclust
    R package biclust (Kaiser et al. 2011):
    • The whole bicluster process is implemented.
    • GUIs are available (biclustGui, ExpressionViewer, rattle).
    • Interfaces to other bicluster packages (isa2, eisa, fabia, s4vd).
  18. R Package biclust: Function biclust
    The main function of the package is biclust(data, method = BCxxx(), number, ...) with:
    • data: the preprocessed data matrix
    • method: the algorithm used (e.g. BCCC() for CC)
    • number: the maximum number of biclusters to search for
    • ...: additional parameters of the algorithms
    It returns an object of class Biclust for uniform treatment.
  19. R Package biclust: Class Biclust
    The bicluster results are represented as an S4 class Biclust with 5 slots:
    • Parameters: saved call
    • RowxNumber: RowxNumber[i,j] is TRUE if row i is in bicluster j
    • NumberxCol: NumberxCol[i,j] is TRUE if bicluster i contains column j
    • Number: number of biclusters found
    • info: additional information
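
The two logical slots RowxNumber and NumberxCol are complementary views of the same memberships. The following Python sketch builds both matrices from a list of (rows, cols) biclusters and recovers a bicluster's submatrix from them. It uses hypothetical function names, plain nested lists instead of R matrices, and 0-based indices.

```python
def to_membership(biclusters, n_rows, n_cols):
    """Encode biclusters as the two logical matrices described for class Biclust.

    row_x_number[i][k] is True if row i is in bicluster k (n_rows x N);
    number_x_col[k][j] is True if bicluster k contains column j (N x n_cols).
    """
    row_x_number = [[i in rows for rows, _ in biclusters] for i in range(n_rows)]
    number_x_col = [[j in cols for j in range(n_cols)] for _, cols in biclusters]
    return row_x_number, number_x_col


def extract(matrix, row_x_number, number_x_col, k):
    """Return the submatrix of bicluster k (0-based) using the two membership matrices."""
    rows = [i for i, flags in enumerate(row_x_number) if flags[k]]
    cols = [j for j, flag in enumerate(number_x_col[k]) if flag]
    return [[matrix[i][j] for j in cols] for i in rows]


biclusters = [([0, 2], [1]), ([1], [0, 2])]
rxn, nxc = to_membership(biclusters, n_rows=3, n_cols=3)
values = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
first = extract(values, rxn, nxc, 0)
```

Storing rows and columns separately keeps the representation compact (n × N plus N × m entries instead of a full n × m mask per bicluster) and still allows overlapping biclusters.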
  20. R Package biclust: Ensemble Method
    ensemble(data, confs = plaid.grid(), rep = 20, maxNum = 5, similar = jaccard2, thr = 0.8, simthr = 0.7, subs = c(1, 1), bootstrap = FALSE, support = 0, combine = qt, ...)
    with:
    • confs: configuration function
    • similar: similarity measure
    • bootstrap: whether bootstrap samples should be used
    • combine: combine function
    • ...: additional parameters of the algorithms
  21. R Package biclust: Heatmap Plots
    drawHeatmap(x, Bicres, ...)
    [Figure: heatmap of Bicluster 1 (size 23 × 12); the columns are 12 conditions (diurnal_04h.CEL to diurnal_20h.CEL and cell_cycle_aph_6h to cell_cycle_aph_19h), and the rows are 23 probe sets (e.g. 247474_at, 263549_at, 262883_at).]
  22. R Package biclust: Parallel Coordinates
    parallelCoordinates(x, Bicres, ...)
    [Figure: four parallel-coordinates panels (x-axis: Rows, y-axis: Value) for Bicluster 1 (rows = 23; columns = 12) and Bicluster 1 (rows = 30; columns = 21).]
  23. R Package biclust: Membership Plot
    biclustmember(bicResult, x, ...)
    [Figure: membership plots of three (bi)clusters over 30 variables (Var. 1 to Var. 30) in four panels: Bicluster Unsorted, Cluster Unsorted, Bicluster Sorted, Cluster Sorted.]
  24. R Package biclust: Additional Methods
    • Preprocessing: normalize.loess(), prequest(), ...
    • Visualization: bubbleplot(), barchart(), ...
    • Validation: jaccardind(), constantVariance(), randind(), ...
    • Little helpers: bicluster(), writeclust(), ...
  25. Application in Marketing: Australian Tourism Survey (Dolnicar et al., 2011)
    • Survey of the Faculty of Commerce, University of Wollongong
    • Questions on activities during the holidays
    • 1003 tourists
    • 45 activity questions (binary)
  26. Application in Marketing: Results
    • Introduced biclustering to marketing.
    • Adapted the Bimax algorithm for segmentation.
    • 11 biclusters found (sizes vary from 50 to 80 tourists).
    • Segments differ significantly in a number of sociodemographic and behavioral variables.
    • For example, segments differ in the number of domestic holidays they take per year.
  27. Application in Marketing
    [Figure: segment plot of the 11 segments (1 to 11) over the 45 activity variables: Cultural, Theatre, Casino, Festivals, GuidedTours, CharterBoat, Whale, childrenAtt, ThemePark, wildlife, Industrial, Farm, Bushwalk, Museum, Monuments, Gardens, Movies, BBQ, Friends, EatingHigh, Pubs, Swimming, Beach, FourWhieel, Camping, Hiking, Fishing, Golf, Tennis, SportEvent, Spa, Exercising, Riding, Skiing, Adventure, Surfing, Cycling, ScubaDiving, WaterSport, Eating, Shopping, Relaxing, Sightseeing, ScenicWalks, Markets.]
  28. Application in Sports: Major League Baseball
    • Source: http://www.mlb.com, season 2009
    • Data: 631 players and 28 performance statistics (hitting)
    • Recoding: 1 (lowest 10%) to 10 (highest 10%)
    • Results: 5 biclusters identified, e.g. Cluster C: power hitters; Cluster E: workhorses (no catchers)
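
The decile recoding can be done by ranking each statistic separately. The following Python sketch is minimal, and its function name is hypothetical. The slide does not say how ties were handled, so breaking ties by original order is an assumption.

```python
def recode_deciles(values):
    """Recode values to 1 (lowest 10%) ... 10 (highest 10%) by rank.

    Ties are broken by original position (sorted() is stable); this is one simple
    convention among several.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    codes = [0] * n
    for rank, k in enumerate(order):
        # rank 0 .. n-1 is mapped to 1 .. 10 in equal-sized bands.
        codes[k] = rank * 10 // n + 1
    return codes


print(recode_deciles([3, 1, 2]))  # [7, 1, 4]
```

After the recoding, every statistic is on the same 1 to 10 scale. This makes the 28 hitting statistics, which are measured in different units, comparable within one matrix.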
  29. Application in Sports
    [Figure: profiles of biclusters A to E over the 28 recoded hitting statistics (OPS, GO.AO, SB., AVG, SLG, OBP, AO, GO, XBH, NP, TPA, GDP, IBB, HBP, SH, SF, CS, SB, SO, BB, TB, RBI, HR, X3B, X2B, H, R, AB; scale 2 to 10), comparing segment-wise means in and outside each bicluster with the population mean.]
  30. Summary and Outlook: Future Research
    Methods:
    • Correction for missing data
    • Correction for measurement error
    • Validation methods
    Software:
    • Keep up with new developments
    • Speed up algorithms (parallel computing)
    • Python!
    Applications:
    • Next-generation sequencing
    • Soccer player scouting
    • NBA performance
  31. New Bicluster R Book
    Applied Biclustering Methods for Big and High-Dimensional Data Using R
    Adetayo Kasim, Ziv Shkedy, Sebastian Kaiser, Sepp Hochreiter, Willem Talloen
    Chapman and Hall/CRC, ISBN 9781482208238
    Series: Chapman & Hall/CRC Biostatistics
  32. Acknowledgments
    The package biclust is joint work with the Microarray Analysis and Visualization Effort, University of Salamanca, Spain, especially Rodrigo Santamaria. The marketing examples are from joint work with Katie Lazarevski and Prof. Sara Dolnicar from the School of Management and Marketing of the University of Wollongong, Australia. s4vd is joint work with the DKFZ in Heidelberg, especially Martin Sill. biclustGui and the goodness-of-fit statistics are joint work with the bicluster group in Hasselt, Belgium.
  33. References
    Ensemble methods:
    Pfundstein (2010). Ensemble Methods for Plaid Bicluster Algorithm. Bachelor Thesis, 2010.
    Ordinal values:
    Kaiser, Träger, and Leisch (2011). Generating Correlated Ordinal Random Values. Technical Report, 2011.
    Software:
    Kaiser and Leisch (2008). A Toolbox for Bicluster Analysis in R. Compstat 2008: Proceedings in Computational Statistics, 55(3), pages 201–208, 2008.
    Sill, Kaiser, Benner, and Kopp-Schneider (2011). Robust biclustering by sparse singular value decomposition incorporating stability selection. Accepted for Bioinformatics, 2011.
    Khamiakova, Kaiser, and Shkedy (2011). Goodness-of-Fit and Diagnostic Tools Within the Differential Co-expression and Biclusters Setting. Unpublished, 2011.
    Application:
    Dolnicar, Kaiser, Lazarevski, and Leisch (2010). Biclustering: Overcoming data dimensionality problems in market segmentation. Journal of Travel Research, 2011.
    R packages:
    biclust: BiCluster Algorithms. R package version 1.0.1, http://cran.r-project.org/package=biclust.
  34. References
    BARKOW, S., BLEULER, S., PRELIC, A., ZIMMERMANN, P., and ZITZLER, E. (2006): BicAT: a biclustering analysis toolbox. Bioinformatics, 22, 1282–1283.
    CHENG, Y. and CHURCH, G. M. (2000): Biclustering of expression data. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, 1, 93–103.
    HARTIGAN, J. A. (1972): Direct clustering of a data matrix. Journal of the American Statistical Association, 67, 123–129.
    HEYER, L. J., KRUGLYAK, S., and YOOSEPH, S. (1999): Exploring expression data: identification and analysis of coexpressed genes. Genome Research, 9(11), 1106–1115.
    JACCARD, P. (1901): Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 241–272.
    KAISER, S. and LEISCH, F. (2008): A Toolbox for Bicluster Analysis in R. In: Compstat 2008: Proceedings in Computational Statistics, Paula Brito (Ed.), Physica Verlag, Heidelberg, Germany.
  35. References
    KLUGER, Y., BASRI, R., CHANG, J. T., and GERSTEIN, M. (2003): Spectral biclustering of microarray data: coclustering genes and conditions. Genome Research, 13, 703–716.
    HUBERT, L. and ARABIE, P. (1985): Comparing partitions. Journal of Classification, 2(1), 193–218.
    LAZZERONI, L. and OWEN, A. (2002): Plaid models for gene expression data. Statistica Sinica, 12, 61–86.
    LEE, M., SHEN, H., HUANG, J. Z., and MARRON, J. S. (2010): Biclustering via sparse singular value decomposition. Biometrics.
    MADEIRA, S. C. and OLIVEIRA, A. L. (2004): Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 24–45.
    VAN MECHELEN, I. and SCHEPERS, J. (2006): A unifying model for biclustering. In: Compstat 2006: Proceedings in Computational Statistics, 81–88.
    MURALI, T. and KASIF, S. (2003): Extracting conserved gene expression motifs from gene expression data. In: Pacific Symposium on Biocomputing, 8, 77–88.
  36. References
    PRELIC, A., BLEULER, S., ZIMMERMANN, P., WILLE, A., BÜHLMANN, P., GRUISSEM, W., HENNIG, L., THIELE, L., and ZITZLER, E. (2006): A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics, 22(9), 1122–1129.
    R Development Core Team (2011): R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
    SANTAMARIA, R., THERON, R., and QUINTALES, L. (2007): A framework to analyze biclustering results on microarray experiments. In: 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'07), Springer, Berlin, 770–779.
    SCHARL, T. and LEISCH, F. (2006): The stochastic QT-clust algorithm: evaluation of stability and variance on time-course microarray data. In: A. Rizzi and M. Vichi (Eds.), Compstat 2006: Proceedings in Computational Statistics, 1015–1022. Physica Verlag, Heidelberg, Germany.
    TURNER, H., BAILEY, T., and KRZANOWSKI, W. (2005): Improved biclustering of microarray data demonstrated through systematic performance tests. Computational Statistics and Data Analysis, 48, 235–254.
  37. Thank you for your attention.
    See http://cran.r-project.org/package=biclust/ for the official release (biclust 1.2.0), http://r-forge.r-project.org/projects/biclust/ for the newest developments (biclust 1.2.1) and http://www.statistik.lmu.de/~kaiser/bicluster.html for papers and links.
  38. Nomenclature
    • (X, Y) is an n × m matrix A
    • $A_{IJ}$ is a subset with $I = \{i_1, \dots, i_k\}$ ($k \le n$, $I \subset X$) and $J = \{j_1, \dots, j_l\}$ ($l \le m$, $J \subset Y$)
    • Bicluster $BC_z = (I_z, J_z) = A_{I_z J_z}$
    • Bicluster result set $Bicres_w = \{BC_1, \dots, BC_N\}$
  39. Calculation of a bicluster result:
    > library(biclust)
    > data(BicatYeast)
    > set.seed(1234)
    > rescc <- biclust(BicatYeast, method = BCCC(), alpha = 1.5, delta = 0.3)
    > class(rescc)
    [1] "Biclust"
    attr(,"package")
    [1] "biclust"
    > str(rescc)
    Formal class 'Biclust' [package "biclust"] with 5 slots
      ..@ Parameters:List of 2
      .. ..$ Call  : language biclust(x = BicatYeast, method = BCCC(), alpha = 1.5, delta = 0.3, number = 50)
      .. ..$ Method:Formal class 'BCCC' [package "biclust"] with 1 slots
      .. .. .. ..@ biclustFunction:function (x, delta = 1, alpha = 1.5, number = 100)
      ..@ RowxNumber: logi [1:100, 1:10] FALSE FALSE FALSE FALSE FALSE TRUE ...
      ..@ NumberxCol: logi [1:10, 1:50] FALSE FALSE FALSE TRUE TRUE FALSE ...
      ..@ Number    : num 10
      ..@ info      :List of 1
  40. Print a bicluster result:
    > rescc

    An object of class Biclust

    call:
        biclust(x = BicatYeast, method = BCCC(), alpha = 1.5, delta = 0.3, number = 50)

    Number of Clusters found: 10

    First 5 Cluster sizes:
                       BC 1 BC 2 BC 3 BC 4 BC 5
    Number of Rows:      18   14   12   12   10
    Number of Columns:   14   15   15   12   12

    Use summary(rescc) to see all cluster sizes.
  41. Jobs
    Data:LAB @ Volkswagen AG: we are hiring!
    Contact: Veronique Ruhrmann, Sebastian Kaiser