This package compares two or more pre-defined classes. The class labels are supplied as a vector, which may contain numeric or character values. If the entry for a particular sample is left blank in the vector, that sample is omitted from the class comparison analysis.
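As a small illustration of the blank-entry rule, the sketch below drops samples whose class label is empty or missing. The vector `classLabels` and the sample names are made up for the example; the package performs this omission itself, and its internal implementation may differ.

```r
# Hypothetical class vector: sample s3 has a blank label, s5 is missing.
classLabels <- c(s1 = "BRCA1", s2 = "BRCA2", s3 = "", s4 = "Sporadic", s5 = NA)

# Keep only samples with a non-blank, non-missing class label.
keep <- !is.na(classLabels) & nzchar(trimws(classLabels))
usedSamples <- names(classLabels)[keep]
usedSamples
```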
The Class Comparison between Groups of Arrays package computes a t-test or F-test separately for each gene, using the normalized log-ratios for cDNA arrays and the normalized log-intensities for one-color oligonucleotide arrays. The F-test is a generalization of the two-sample t-test for comparing values among more than two groups. The package can optionally use the random variance versions of the t-test and F-test, which share information among genes about the within-class variance in log-ratios or log signals. The class comparison function computes the number of genes that are differentially expressed among the classes at the selected statistical significance level and creates a gene list containing information about the significant genes.
It implements the Class Comparison between Groups of Arrays tool in BRB-ArrayTools.
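To make the per-gene testing concrete, here is a minimal sketch in base R on simulated data. It is not the package's implementation and does not include the random variance model; the matrix `logExpr`, the labels `groups`, and the threshold `alpha` are assumptions chosen for illustration.

```r
set.seed(1)

# Simulated log-expression matrix: 50 genes (rows) x 9 samples (columns).
groups  <- factor(rep(c("A", "B", "C"), each = 3))
logExpr <- matrix(rnorm(50 * 9), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("s", 1:9)))

# One-way ANOVA F-test for each gene across the three classes.
fPvalues <- apply(logExpr, 1, function(y) {
  anova(lm(y ~ groups))[["Pr(>F)"]][1]
})

# With only two classes, the F-test reduces to the pooled-variance t-test.
twoClass <- groups %in% c("A", "B")
g2 <- droplevels(groups[twoClass])
tPvalues <- apply(logExpr[, twoClass], 1, function(y) {
  t.test(y ~ g2, var.equal = TRUE)$p.value
})

# Genes called significant at the chosen level.
alpha <- 0.05
significantGenes <- names(fPvalues)[fPvalues < alpha]
length(significantGenes)
```

Each gene is tested independently here, so the number of significant genes should be read together with a multiple-testing measure such as the FDR column reported by the package.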
This package provides test.classComparison for a quick start of class comparison analysis on one of the built-in sample datasets (i.e., “Brca”, “Perou”, and “Pomeroy”).
library(classComparison)
res <- test.classComparison("Brca",outputName = "classComparison", generateHTML = TRUE)
names(res)
It outputs an HTML file (C:\Users\YourUserName\Documents\Brca\Output\classComparison\classComparison.html) with the class comparison results, as well as a list res that includes the following objects:
## [1] "classifierTable" "workPath" "outputPath"
Here is a brief explanation of each object in res:
res$classifierTable is a data frame listing the significant genes, with their p-values, FDR, per-class geometric means of ratios, and gene annotations:
##    Parametric p-value     FDR Geom mean of ratios in BRCA1
## 1             4.6e-06 0.00683                         0.26
## 2             6.8e-06 0.00683                         1.38
## 3            4.88e-05  0.0155                         1.35
## 4            5.12e-05  0.0155                         1.17
## 5            5.52e-05  0.0155                         0.64
## 6            5.66e-05  0.0155                          1.3
## 7            6.96e-05  0.0155                         1.26
## 8            7.18e-05  0.0155                         0.47
## 9            7.86e-05  0.0155                         1.34
## 10           8.36e-05  0.0155                         2.33
## 11           8.62e-05  0.0155                         0.54
## ......
##    Geom mean of ratios in BRCA2 Geom mean of ratios in Sporadic UniqueID CloneID
## 1                           1.1                            0.71   HV17G6  897781
## 2                          3.47                            1.28    HV8D9   50413
## 3                          4.25                            1.35    LO5C8  345645
## 4                          3.57                            1.99    LO1A7   82991
## 5                          2.19                            1.07   HV25G2  838568
## 6                          3.07                            1.06    HV7B9  666377
## 7                          0.57                            1.11   UG6G12   51209
## 8                          1.52                            1.24   HV5H10  823940
## 9                          0.56                            0.99  HV19C12  784830
## 10                         5.57                             2.1   HV12G4  752631
## 11                         1.24                            0.58    HV2H5  244307
## ......
## Symbol
## 1 KRT8
## 2 COMT
## 3
## 4 ENPP1
## 5 COX6C
## 6 VEZF1
## 7 PPP1CB
## 8
## 9 CDC123
## 10
## 11 SERPINE1
## ......
## Name
## 1 Keratin 8
## 2 Catechol-O-methyltransferase
## 3 Transcribed locus, weakly similar to XP_002453697.1 hypothetical protein SORBIDRAFT_04g010730 [Sorghum bicolor]
## 4 Ectonucleotide pyrophosphatase/phosphodiesterase 1
## 5 Cytochrome c oxidase subunit VIc
## 6 Vascular endothelial zinc finger 1
## 7 Protein phosphatase 1, catalytic subunit, beta isozyme
## 8 Homo sapiens, clone IMAGE:4133978, mRNA
## 9 Cell division cycle 123 homolog (S. cerevisiae)
## 10 Transcribed locus
## 11 Serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1
## ......
##    EntrezID Pairwise significant
## 1      3856       (1, 2), (1, 3)
## 2      1312       (1, 2), (3, 2)
## 3                 (1, 2), (3, 2)
## 4      5167               (1, 2)
## 5      1345       (1, 2), (3, 2)
## 6      7716       (1, 2), (3, 2)
## 7      5500       (2, 1), (2, 3)
## 8                 (1, 2), (1, 3)
## 9      8872       (2, 1), (2, 3)
## 10                (1, 2), (3, 2)
## 11     5054       (1, 2), (3, 2)
## ......
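Once the table is available, a common next step is to subset it. The sketch below rebuilds the first five rows shown above as a small data frame and keeps the genes with FDR below 0.01. Storing the columns as character and keeping the spaces in the column names are assumptions about how res$classifierTable is structured; running `str(res$classifierTable)` on a real result will show the actual layout.

```r
# Toy copy of the first five rows of the table shown above.
# Assumption: columns are stored as character and names contain spaces.
tbl <- data.frame(
  "Parametric p-value" = c("4.6e-06", "6.8e-06", "4.88e-05", "5.12e-05", "5.52e-05"),
  "FDR"                = c("0.00683", "0.00683", "0.0155", "0.0155", "0.0155"),
  "Symbol"             = c("KRT8", "COMT", "", "ENPP1", "COX6C"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Convert to numeric before comparing, then keep genes with FDR < 0.01.
fdr <- as.numeric(tbl[["FDR"]])
topGenes <- tbl[fdr < 0.01, ]
topGenes[["Symbol"]]
```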
res$workPath is the path for Fortran and other intermediate output files.
res$outputPath is the path for final result output files.
classComparison is the main R function for class comparison analysis. This section describes how to prepare its inputs. Once again, we use the “Brca” sample data as an example. The package contains the following “Brca” sample files:
* Brca_LOGRAT.txt: a table of expression data with rows representing genes and columns representing samples;
* Brca_FILTER.TXT: a list of filtering information, where 1 means the corresponding gene passes the filters and 0 means it is excluded from the analysis;
* Brca_GENEID.txt: a table of gene information corresponding to the rows of Brca_LOGRAT.txt and Brca_FILTER.TXT;
* Brca_EXPDESIGN.txt: a table with class information and/or separate test set information.
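To illustrate how the filter file relates to the expression table, the sketch below applies a 0/1 filter vector to the rows of a toy matrix. The objects `toyExpr` and `toyFilter` are invented for the example; classComparison accepts the filter vector directly, so this subsetting only shows what the flags mean.

```r
# Toy expression matrix: 4 genes (rows) x 3 samples (columns).
toyExpr <- matrix(c( 0.1, -0.2,  0.3,
                     1.0,  1.1,  0.9,
                    -0.5, -0.4, -0.6,
                     0.0,  0.2, -0.1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))

# 1 = gene passes the filters, 0 = gene is excluded from the analysis.
toyFilter <- c(1, 0, 1, 1)
stopifnot(length(toyFilter) == nrow(toyExpr))

# Genes that would enter the analysis.
analysed <- toyExpr[toyFilter == 1, , drop = FALSE]
rownames(analysed)
```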
There are a total of 22 samples in 3 classes for the class comparison calculations. We run the following code to obtain objects such as exprData as inputs to classComparison.
dataset <- "Brca"
# Gene IDs
geneIds <- read.delim(system.file("extdata", paste0(dataset, "_GENEID.txt"), package = "classComparison"), as.is = TRUE, colClasses = "character")
# Expression data (log ratios for this dataset)
x <- read.delim(system.file("extdata", paste0(dataset, "_LOGRAT.TXT"), package = "classComparison"), header = FALSE)
# Gene filter information: 1 = passes the filter, 0 = filtered out
geneFilter <- scan(system.file("extdata", paste0(dataset, "_FILTER.TXT"), package = "classComparison"), quiet = TRUE)
# Class information
expDesign <- read.delim(system.file("extdata", paste0(dataset, "_EXPDESIGN.txt"), package = "classComparison"), as.is = TRUE)
# Use the first column as the array IDs.
arrayIds <- expDesign[, 1]
exprData <- x
colnames(exprData) <- arrayIds
# Use the 3rd column as the class variable.
nColumn <- 3
ClassVariableName <- gsub("[.]", " ", colnames(expDesign)[nColumn])
ClassVariableValues <- expDesign[, nColumn]
exprData is a 3226 x 22 table of log ratios, with rows representing 3226 genes and columns representing 22 samples.
## s1321 s1996 s1822 s1714 s1224 s1252 s1510
## 1 -1.39854932 -3.0817938 -2.73039293 -1.8744690 -2.28824496 -0.3453870 -1.4232113
## 2 0.39940688 0.2781018 -0.20113993 -0.5334322 -0.57929373 -0.2874397 -0.8826430
## 3 -0.02509096 0.4375801 0.10479617 0.9533499 -0.22050031 0.3532323 -0.6731896
## 4 -0.13006058 -0.8389376 -0.23562828 0.6195197 0.81221521 -0.4181434 -0.5250910
## 5 -0.10309340 -0.4340958 0.06756324 0.7655347 -0.09386685 -0.4181434 0.3841435
## ......
ClassVariableValues is a character vector of length 22, with one class label per sample.
## [1] "Sporadic" "BRCA1" "BRCA1" "BRCA1" "BRCA1" "BRCA1" "BRCA1" "BRCA2"
## [9] "BRCA2" "BRCA2" "BRCA2" "Sporadic" "Sporadic" "Sporadic" "Sporadic" "Sporadic"
## [17] "Sporadic" "BRCA1" "BRCA2" "BRCA2" "BRCA2" "BRCA2"
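Before running the analysis it can help to confirm the class sizes. The sketch below re-enters the 22 labels printed above and tabulates them; `classLabels` is just a local name used for the example.

```r
# The 22 class labels, in the order printed above.
classLabels <- c("Sporadic", rep("BRCA1", 6), rep("BRCA2", 4),
                 rep("Sporadic", 6), "BRCA1", rep("BRCA2", 4))
stopifnot(length(classLabels) == 22)

# Number of samples in each class.
classCounts <- table(classLabels)
classCounts
```

In a real session the same check can be run on ClassVariableValues directly, together with `stopifnot(ncol(exprData) == length(ClassVariableValues))` to confirm that there is one label per sample column.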
Then run the Class Comparison calculations:
projectPath <- tempdir()
outputName <- "ClassComparisonBrca"
singleChannel <- dataset == "Pomeroy"
generateHTML <- TRUE
resList <- classComparison(exprData = exprData,
                           geneIds = geneIds,
                           ClassVariableName = ClassVariableName,
                           ClassVariableValues = ClassVariableValues,
                           geneFilter = geneFilter,
                           IsSingleChannel = singleChannel,
                           projectPath = projectPath,
                           outputName = outputName,
                           generateHTML = generateHTML)
It returns the same list as shown in the Quick Start section. For more details about classComparison, type help("classComparison") in the R console.