Class Comparison between Groups of Arrays

BRB-ArrayTools Development Team

2019-10-13

This package compares two or more pre-defined classes. The class labels are supplied as a vector, which may hold numeric or character data. If the entry for a particular sample is left blank in the vector, that sample is omitted from the class comparison analysis.
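The handling of blank entries can be pictured with a few lines of base R. This is only a sketch with made-up sample names and labels: the package performs the omission itself, and treating NA the same as a blank string is an assumption made here for illustration.

```r
# Hypothetical class vector for six samples; "" marks a blank entry.
classLabels <- c("BRCA1", "", "BRCA2", "BRCA1", NA, "BRCA2")
names(classLabels) <- paste0("s", 1:6)

# Keep only samples with a non-blank label.
# Treating NA like a blank is an assumption of this sketch.
keep <- !is.na(classLabels) & nzchar(trimws(classLabels))
usedLabels <- classLabels[keep]

print(names(usedLabels))   # samples that would enter the comparison
print(table(usedLabels))   # class sizes after dropping blanks
```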

The Class Comparison between Groups of Arrays package computes a t-test or F-test separately for each gene, using the normalized log-ratios for cDNA arrays and the normalized log-intensities for one-color oligonucleotide arrays. The F-test is a generalization of the two-sample t-test for comparing values among more than two groups. The package can optionally use the random variance version of the t-test or F-test, which shares information among genes about the within-class variance in log-ratios or log signals. The class comparison function computes the number of genes that are differentially expressed among the classes at the selected statistical significance level and creates a gene list containing information about the significant genes.
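To make the per-gene testing concrete, here is a minimal base R sketch on simulated data. It uses the ordinary one-way ANOVA F-test, not the random variance version, and it is not the package's implementation. The names logRatios, classes, geneFTest and the level alpha = 0.001 are all hypothetical choices for this illustration.

```r
set.seed(1)
# Simulated log-ratio matrix: 5 genes (rows) by 9 samples (columns).
logRatios <- matrix(rnorm(45), nrow = 5)
classes <- factor(rep(c("A", "B", "C"), each = 3))

# Ordinary one-way ANOVA F-test for one gene's expression values.
geneFTest <- function(y, g) {
  fit <- anova(lm(y ~ g))
  c(F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

# Apply the test separately to every gene (row).
fResults <- t(apply(logRatios, 1, geneFTest, g = classes))

# Count the genes significant at a chosen (hypothetical) level.
alpha <- 0.001
nSignificant <- sum(fResults[, "p"] < alpha)

# With only two classes, the F statistic equals the square of the
# pooled-variance two-sample t statistic.
two <- classes %in% c("A", "B")
y <- logRatios[1, two]
g <- droplevels(classes[two])
tStat <- unname(t.test(y ~ g, var.equal = TRUE)$statistic)
fStat <- unname(geneFTest(y, g)["F"])
print(all.equal(tStat^2, fStat))
```

The last comparison illustrates why the F-test is described as a generalization of the two-sample t-test.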

The package implements the Class Comparison between Groups of Arrays tool in BRB-ArrayTools.

Quick Start

This package provides test.classComparison as a quick start for class comparison analysis on one of the built-in sample datasets (i.e., “Brca”, “Perou”, and “Pomeroy”).

library(classComparison)
res <- test.classComparison("Brca", outputName = "classComparison", generateHTML = TRUE)
names(res)

It writes an HTML file (C:\Users\YourUserName\Documents\Brca\Output\classComparison\classComparison.html) with the class comparison results and returns a list, res, containing the following objects:

## [1] "classifierTable" "workPath"        "outputPath"    

The first rows of the table of significant genes are shown below, with its columns wrapped across several blocks:

##    Parametric p-value     FDR Geom mean of  ratios  in BRCA1
## 1             4.6e-06 0.00683                           0.26
## 2             6.8e-06 0.00683                           1.38
## 3            4.88e-05  0.0155                           1.35
## 4            5.12e-05  0.0155                           1.17
## 5            5.52e-05  0.0155                           0.64
## 6            5.66e-05  0.0155                            1.3
## 7            6.96e-05  0.0155                           1.26
## 8            7.18e-05  0.0155                           0.47
## 9            7.86e-05  0.0155                           1.34
## 10           8.36e-05  0.0155                           2.33
## 11           8.62e-05  0.0155                           0.54
## ......
##    Geom mean of  ratios  in BRCA2 Geom mean of  ratios  in Sporadic UniqueID CloneID
## 1                             1.1                              0.71   HV17G6  897781
## 2                            3.47                              1.28    HV8D9   50413
## 3                            4.25                              1.35    LO5C8  345645
## 4                            3.57                              1.99    LO1A7   82991
## 5                            2.19                              1.07   HV25G2  838568
## 6                            3.07                              1.06    HV7B9  666377
## 7                            0.57                              1.11   UG6G12   51209
## 8                            1.52                              1.24   HV5H10  823940
## 9                            0.56                              0.99  HV19C12  784830
## 10                           5.57                               2.1   HV12G4  752631
## 11                           1.24                              0.58    HV2H5  244307
## ......
##      Symbol
## 1      KRT8
## 2      COMT
## 3          
## 4     ENPP1
## 5     COX6C
## 6     VEZF1
## 7    PPP1CB
## 8          
## 9    CDC123
## 10         
## 11 SERPINE1
## ......
##                                                                                                               Name
## 1                                                                                                        Keratin 8
## 2                                                                                     Catechol-O-methyltransferase
## 3  Transcribed locus, weakly similar to XP_002453697.1 hypothetical protein SORBIDRAFT_04g010730 [Sorghum bicolor]
## 4                                                               Ectonucleotide pyrophosphatase/phosphodiesterase 1
## 5                                                                                 Cytochrome c oxidase subunit VIc
## 6                                                                               Vascular endothelial zinc finger 1
## 7                                                           Protein phosphatase 1, catalytic subunit, beta isozyme
## 8                                                                          Homo sapiens, clone IMAGE:4133978, mRNA
## 9                                                                  Cell division cycle 123 homolog (S. cerevisiae)
## 10                                                                                               Transcribed locus
## 11                   Serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1
## ......
##    EntrezID Pairwise significant
## 1      3856       (1, 2), (1, 3)
## 2      1312       (1, 2), (3, 2)
## 3                 (1, 2), (3, 2)
## 4      5167               (1, 2)
## 5      1345       (1, 2), (3, 2)
## 6      7716       (1, 2), (3, 2)
## 7      5500       (2, 1), (2, 3)
## 8                 (1, 2), (1, 3)
## 9      8872       (2, 1), (2, 3)
## 10                (1, 2), (3, 2)
## 11     5054       (1, 2), (3, 2)
## ......

Data Input

classComparison is the main R function for class comparison analysis. This section describes how to prepare its inputs, again using the “Brca” sample data as an example. The package contains the following “Brca” sample files:

* Brca_LOGRAT.txt: a table of expression data with rows representing genes and columns representing samples;

* Brca_FILTER.TXT: a list of filtering information, where 1 means the corresponding gene passes the filters while 0 means it is excluded from analysis;

* Brca_GENEID.txt: a table of gene information corresponding to row information of Brca_LOGRAT.txt and Brca_FILTER.TXT;

* Brca_EXPDESIGN.txt: a table with class information and/or separate test set information.
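The relationship between the expression table, the gene table and the filter can be illustrated on toy data. This is a sketch only: classComparison accepts the filter through its geneFilter argument, so the subsetting below is not something the user needs to do, and all object names and values here are made up.

```r
# Toy stand-ins for the expression, gene ID and filter files (values are made up).
exprToy <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("a1", "a2", "a3")))
geneIdsToy <- data.frame(UniqueID = c("g1", "g2", "g3", "g4"),
                         stringsAsFactors = FALSE)
filterToy <- c(1, 0, 1, 1)   # 1 = passes the filters, 0 = excluded

# The three inputs must describe the same genes in the same row order.
stopifnot(nrow(exprToy) == nrow(geneIdsToy),
          length(filterToy) == nrow(exprToy))

# Rows that would remain in the analysis.
passed <- filterToy == 1
exprPassed <- exprToy[passed, , drop = FALSE]
idsPassed <- geneIdsToy$UniqueID[passed]

print(idsPassed)
print(dim(exprPassed))
```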

There are a total of 22 samples in 3 classes for the class comparison calculations. The following code builds the inputs to classComparison, such as exprData.

dataset <- "Brca"
# Gene IDs, read as character so identifiers are not converted to numbers
geneIds <- read.delim(system.file("extdata", paste0(dataset, "_GENEID.txt"),
                                  package = "classComparison"),
                      as.is = TRUE, colClasses = "character")
# Expression data (log ratios), read without a header row
x <- read.delim(system.file("extdata", paste0(dataset, "_LOGRAT.TXT"),
                            package = "classComparison"), header = FALSE)
# Gene filter information: 1 = passes the filter, 0 = filtered out
geneFilter <- scan(system.file("extdata", paste0(dataset, "_FILTER.TXT"),
                               package = "classComparison"), quiet = TRUE)
# Class information
expDesign <- read.delim(system.file("extdata", paste0(dataset, "_EXPDESIGN.txt"),
                                    package = "classComparison"), as.is = TRUE)
# Use the first column as the array IDs and as the column names of exprData
arrayIds <- expDesign[, 1]
exprData <- x
colnames(exprData) <- expDesign[, 1]
# Use the 3rd column as the class variable
nColumn <- 3
# read.delim turns spaces in column names into dots; turn them back into spaces
ClassVariableName <- gsub("[.]", " ", colnames(expDesign)[nColumn])
ClassVariableValues <- expDesign[, nColumn]

exprData is a 3226 x 22 table of log ratios, with rows representing 3226 genes and columns representing 22 samples.

##         s1321      s1996       s1822      s1714       s1224      s1252      s1510
## 1 -1.39854932 -3.0817938 -2.73039293 -1.8744690 -2.28824496 -0.3453870 -1.4232113
## 2  0.39940688  0.2781018 -0.20113993 -0.5334322 -0.57929373 -0.2874397 -0.8826430
## 3 -0.02509096  0.4375801  0.10479617  0.9533499 -0.22050031  0.3532323 -0.6731896
## 4 -0.13006058 -0.8389376 -0.23562828  0.6195197  0.81221521 -0.4181434 -0.5250910
## 5 -0.10309340 -0.4340958  0.06756324  0.7655347 -0.09386685 -0.4181434  0.3841435
## ......

ClassVariableValues is a character vector of length 22 holding one class label per sample.

##  [1] "Sporadic" "BRCA1"    "BRCA1"    "BRCA1"    "BRCA1"    "BRCA1"    "BRCA1"    "BRCA2"   
##  [9] "BRCA2"    "BRCA2"    "BRCA2"    "Sporadic" "Sporadic" "Sporadic" "Sporadic" "Sporadic"
## [17] "Sporadic" "BRCA1"    "BRCA2"    "BRCA2"    "BRCA2"    "BRCA2"   
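Before running the analysis it can help to check how many samples fall into each class. The sketch below rebuilds the vector printed above by hand, so that it runs without the package, and tabulates it with base R.

```r
# The 22 class labels shown above, reconstructed in the same order.
ClassVariableValues <- c("Sporadic", rep("BRCA1", 6), rep("BRCA2", 4),
                         rep("Sporadic", 6), "BRCA1", rep("BRCA2", 4))

# Number of samples per class.
classSizes <- table(ClassVariableValues)
print(classSizes)

# Basic sanity checks: 22 samples in 3 classes.
stopifnot(length(ClassVariableValues) == 22, length(classSizes) == 3)
```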

Then run the Class Comparison calculations:

projectPath <- tempdir()
outputName <- "ClassComparisonBrca"
# TRUE only for the "Pomeroy" sample data
singleChannel <- dataset == "Pomeroy"
generateHTML <- TRUE
resList <- classComparison(exprData = exprData,
                           geneIds = geneIds,
                           ClassVariableName = ClassVariableName,
                           ClassVariableValues = ClassVariableValues,
                           geneFilter = geneFilter,
                           IsSingleChannel = singleChannel,
                           projectPath = projectPath,
                           outputName = outputName,
                           generateHTML = generateHTML)

It returns the same list as shown in the Quick Start Section. For more details about classComparison, please type help("classComparison") in the R console.