This package compares two or more pre-defined classes. The class labels are supplied as a vector with one entry per sample; the entries may be numeric or character data. If the entry for a particular sample is left blank, that sample is omitted from the class comparison analysis.
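The effect of blank entries in the class vector can be pictured with a minimal base R sketch. The vector `classLabels` and the filtering logic below are hypothetical illustrations, not the package's internal code; they only show which samples would remain if blank or missing labels are dropped.

```r
# Hypothetical class vector: one label per sample.
# An empty string or NA stands for a sample whose class was left blank.
classLabels <- c("BRCA1", "BRCA2", "", "Sporadic", NA, "BRCA1")

# Keep only the samples that carry a non-blank label.
keep <- !is.na(classLabels) & nzchar(trimws(classLabels))
usedLabels <- classLabels[keep]

# Samples 3 and 5 are dropped; the remaining labels define the classes.
which(keep)
table(usedLabels)
```

The same logical index `keep` could be used to select the matching columns of an expression table, so that labels and samples stay aligned.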
The Class Comparison between Groups of Arrays package computes a t-test or F-test separately for each gene, using the normalized log-ratios for cDNA arrays and the normalized log-intensities for one-color oligonucleotide arrays. The F-test generalizes the two-sample t-test to comparisons among more than two groups. The package can optionally use the random variance version of the t-test or F-test, which shares information among genes about the within-class variance in log-ratios or log signals. The class comparison function computes the number of genes that are differentially expressed among the classes at the selected statistical significance level and creates a gene list containing information about the significant genes.
It implements the Class Comparison between Groups of Arrays tool in BRB-ArrayTools.
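To make the per-gene testing idea concrete, here is a minimal sketch in base R on simulated data. It uses the classical equal-variance F-test from `stats::oneway.test`, not the random variance model and not the package's own implementation; the data, the class labels, and the threshold `alpha` are all assumptions made up for illustration.

```r
set.seed(1)

# Simulated log-ratios: 50 genes (rows) by 15 samples (columns), 3 classes.
nGenes  <- 50
classes <- factor(rep(c("A", "B", "C"), times = c(5, 5, 5)))
expr    <- matrix(rnorm(nGenes * length(classes)), nrow = nGenes)

# Shift the first 5 genes in class "C" so they truly differ between classes.
expr[1:5, classes == "C"] <- expr[1:5, classes == "C"] + 3

# One classical F-test per gene (var.equal = TRUE gives the usual ANOVA F-test).
pValues <- apply(expr, 1, function(y) {
  oneway.test(y ~ classes, var.equal = TRUE)$p.value
})

# Count the genes below a hypothetical significance threshold.
alpha <- 0.001
significant <- which(pValues < alpha)
length(significant)
```

With only two classes the same call reduces to the equal-variance two-sample t-test, which is the sense in which the F-test generalizes it.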
This package provides test.classComparison for a quick start: it runs a class comparison analysis on one of the built-in sample datasets (i.e., “Brca”, “Perou”, and “Pomeroy”).
library(classComparison)
res <- test.classComparison("Brca", outputName = "classComparison", generateHTML = TRUE)
names(res)
It writes an HTML file (C:\Users\YourUserName\Documents\Brca\Output\classComparison\classComparison.html) with the class comparison results and returns a list, res, containing the following objects:
## [1] "classifierTable" "workPath" "outputPath"
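Here is a brief explanation of each object in res:
res$classifierTable is a data frame listing the significant genes, with their parametric p-values, FDR, geometric mean of ratios in each class, gene annotations, and the pairwise significant classes:
## Parametric p-value FDR Geom mean of ratios in BRCA1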
## 1 4.6e-06 0.00683 0.26
## 2 6.8e-06 0.00683 1.38
## 3 4.88e-05 0.0155 1.35
## 4 5.12e-05 0.0155 1.17
## 5 5.52e-05 0.0155 0.64
## 6 5.66e-05 0.0155 1.3
## 7 6.96e-05 0.0155 1.26
## 8 7.18e-05 0.0155 0.47
## 9 7.86e-05 0.0155 1.34
## 10 8.36e-05 0.0155 2.33
## 11 8.62e-05 0.0155 0.54
## ......
## Geom mean of ratios in BRCA2 Geom mean of ratios in Sporadic UniqueID CloneID
## 1 1.1 0.71 HV17G6 897781
## 2 3.47 1.28 HV8D9 50413
## 3 4.25 1.35 LO5C8 345645
## 4 3.57 1.99 LO1A7 82991
## 5 2.19 1.07 HV25G2 838568
## 6 3.07 1.06 HV7B9 666377
## 7 0.57 1.11 UG6G12 51209
## 8 1.52 1.24 HV5H10 823940
## 9 0.56 0.99 HV19C12 784830
## 10 5.57 2.1 HV12G4 752631
## 11 1.24 0.58 HV2H5 244307
## ......
## Symbol
## 1 KRT8
## 2 COMT
## 3
## 4 ENPP1
## 5 COX6C
## 6 VEZF1
## 7 PPP1CB
## 8
## 9 CDC123
## 10
## 11 SERPINE1
## ......
## Name
## 1 Keratin 8
## 2 Catechol-O-methyltransferase
## 3 Transcribed locus, weakly similar to XP_002453697.1 hypothetical protein SORBIDRAFT_04g010730 [Sorghum bicolor]
## 4 Ectonucleotide pyrophosphatase/phosphodiesterase 1
## 5 Cytochrome c oxidase subunit VIc
## 6 Vascular endothelial zinc finger 1
## 7 Protein phosphatase 1, catalytic subunit, beta isozyme
## 8 Homo sapiens, clone IMAGE:4133978, mRNA
## 9 Cell division cycle 123 homolog (S. cerevisiae)
## 10 Transcribed locus
## 11 Serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1
## ......
## EntrezID Pairwise significant
## 1 3856 (1, 2), (1, 3)
## 2 1312 (1, 2), (3, 2)
## 3 (1, 2), (3, 2)
## 4 5167 (1, 2)
## 5 1345 (1, 2), (3, 2)
## 6 7716 (1, 2), (3, 2)
## 7 5500 (2, 1), (2, 3)
## 8 (1, 2), (1, 3)
## 9 8872 (2, 1), (2, 3)
## 10 (1, 2), (3, 2)
## 11 5054 (1, 2), (3, 2)
## ......
res$workPath is the path for Fortran and other intermediate output files.
res$outputPath is the path for final result output files.
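A gene table of this kind can be filtered further with ordinary data frame operations. The sketch below rebuilds the first five rows of the output shown above as a small stand-in data frame; the column names `pValue`, `FDR`, and `Symbol` and the cutoff `fdrCutoff` are hypothetical, and the names in the actual res$classifierTable may differ.

```r
# Stand-in for the first five rows of the gene table printed above.
# Column names are simplified assumptions, not the package's actual names.
geneTable <- data.frame(
  pValue = c(4.6e-06, 6.8e-06, 4.88e-05, 5.12e-05, 5.52e-05),
  FDR    = c(0.00683, 0.00683, 0.0155, 0.0155, 0.0155),
  Symbol = c("KRT8", "COMT", "", "ENPP1", "COX6C"),
  stringsAsFactors = FALSE
)

# Keep only genes below a hypothetical FDR cutoff.
fdrCutoff <- 0.01
topGenes  <- geneTable[geneTable$FDR < fdrCutoff, ]
topGenes$Symbol
```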
classComparison is the main R function for class comparison analysis. This section describes in detail how to prepare its inputs, again using the “Brca” sample data as an example. The package contains the following “Brca” sample files:
* Brca_LOGRAT.txt: a table of expression data with rows representing genes and columns representing samples;
* Brca_FILTER.TXT: a list of filtering flags, one per gene, where 1 means the corresponding gene passes the filters and 0 means it is excluded from analysis;
* Brca_GENEID.txt: a table of gene information whose rows correspond to the rows of Brca_LOGRAT.txt and Brca_FILTER.TXT;
* Brca_EXPDESIGN.txt: a table with class information and/or separate test set information.
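The row correspondence between the expression table, the filter flags, and the gene information can be sketched with toy data. Everything below (`exprToy`, `filterToy`, `geneIdsToy`) is hypothetical and only illustrates what the 0/1 flags mean; classComparison takes the filter as its geneFilter argument, so this subsetting does not need to be done by hand.

```r
# Toy expression table: 4 genes (rows) by 2 samples (columns).
exprToy <- matrix(c(0.1, -0.4,
                    1.2,  0.0,
                    0.7, -1.1,
                    0.3,  0.9),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("gene", 1:4), c("s1", "s2")))

# One filter flag and one gene-information row per expression row.
filterToy  <- c(1, 0, 1, 1)
geneIdsToy <- data.frame(UniqueID = paste0("id", 1:4))

# The three inputs must describe the same genes in the same order.
stopifnot(nrow(exprToy) == length(filterToy),
          nrow(exprToy) == nrow(geneIdsToy))

# Rows flagged 1 pass the filter; rows flagged 0 are excluded.
exprKept <- exprToy[filterToy == 1, , drop = FALSE]
rownames(exprKept)
```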
There are a total of 22 samples in 3 classes for class comparison calculations. We run the following code to obtain objects like exprData as inputs to classComparison.
dataset<-"Brca"
# Gene IDs
geneIds <- read.delim(system.file("extdata", paste0(dataset, "_GENEID.txt"), package = "classComparison"), as.is = TRUE, colClasses = "character")
# Expression data (log ratios for this dataset).
x <- read.delim(system.file("extdata", paste0(dataset, "_LOGRAT.TXT")
, package = "classComparison"), header = FALSE)
# Gene filter information, 1 - pass the filter, 0 - filtered
geneFilter <- scan(system.file("extdata", paste0(dataset, "_FILTER.TXT")
, package = "classComparison"), quiet = TRUE)
# Class information
expDesign <- read.delim(system.file("extdata", paste0(dataset, "_EXPDESIGN.txt")
, package = "classComparison"), as.is = TRUE)
# Pick the first column as the array IDs.
arrayIds <- expDesign[, 1]
exprData <- x
colnames(exprData) <- expDesign[, 1]
# Pick the 3rd column as the class variable.
nColumn = 3
ClassVariableName = gsub("[.]"," ",colnames(expDesign)[nColumn])
ClassVariableValues <- expDesign[, nColumn]
exprData holds a 3226 x 22 table of log ratios, with rows representing the 3226 genes and columns representing the 22 samples.
## s1321 s1996 s1822 s1714 s1224 s1252 s1510
## 1 -1.39854932 -3.0817938 -2.73039293 -1.8744690 -2.28824496 -0.3453870 -1.4232113
## 2 0.39940688 0.2781018 -0.20113993 -0.5334322 -0.57929373 -0.2874397 -0.8826430
## 3 -0.02509096 0.4375801 0.10479617 0.9533499 -0.22050031 0.3532323 -0.6731896
## 4 -0.13006058 -0.8389376 -0.23562828 0.6195197 0.81221521 -0.4181434 -0.5250910
## 5 -0.10309340 -0.4340958 0.06756324 0.7655347 -0.09386685 -0.4181434 0.3841435
## ......
ClassVariableValues is a character vector of length 22 that gives the class label of each sample.
## [1] "Sporadic" "BRCA1" "BRCA1" "BRCA1" "BRCA1" "BRCA1" "BRCA1" "BRCA2"
## [9] "BRCA2" "BRCA2" "BRCA2" "Sporadic" "Sporadic" "Sporadic" "Sporadic" "Sporadic"
## [17] "Sporadic" "BRCA1" "BRCA2" "BRCA2" "BRCA2" "BRCA2"
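Before running the analysis it can help to check how many samples fall into each class. The sketch below retypes the 22 labels printed above into a standalone vector named `classValues` (a hypothetical name, used so the example runs without the package) and tabulates them with base R.

```r
# The 22 class labels shown above, in the same order.
classValues <- c("Sporadic",
                 rep("BRCA1", 6),
                 rep("BRCA2", 4),
                 rep("Sporadic", 6),
                 "BRCA1",
                 rep("BRCA2", 4))

# Number of samples per class: BRCA1 = 7, BRCA2 = 8, Sporadic = 7.
classCounts <- table(classValues)
classCounts
```

With the objects from the code above, the same check would be table(ClassVariableValues).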
Then run the Class Comparison calculations:
projectPath <- tempdir()
outputName = "ClassComparisonBrca"
singleChannel <- dataset == "Pomeroy"
generateHTML = TRUE
resList <- classComparison(exprData=exprData,
geneIds=geneIds,
ClassVariableName=ClassVariableName,
ClassVariableValues=ClassVariableValues,
geneFilter=geneFilter,
IsSingleChannel=singleChannel,
projectPath=projectPath,
outputName=outputName,
generateHTML=generateHTML)
It returns a list with the same structure as the one shown in the Quick Start Section. For more details about classComparison, type help("classComparison") in the R console.