This package builds multivariate predictors that determine which of multiple classes a given sample belongs to. Several multivariate classification methods are available, including the Compound Covariate Predictor, Diagonal Linear Discriminant Analysis, Nearest Neighbor Predictor, Nearest Centroid Predictor, and Support Vector Machine Predictor. For every class prediction method requested, the package estimates how accurately the classes can be predicted. The whole procedure is evaluated by cross-validation: leave-one-out cross-validation, k-fold cross-validation, or 0.632+ bootstrap validation. The cross-validated estimate of the misclassification rate is computed, and the performance of each classifier is reported. New samples can then be classified with the specified classifiers, using the multivariate predictors built from the full dataset.
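
To make the idea of a cross-validated misclassification rate concrete, here is a minimal base R sketch of leave-one-out cross-validation with a simple nearest-centroid rule. It is an illustration only, not the package's implementation: the toy data and the helper predictCentroid are made up, and the gene selection that the package repeats inside every fold is omitted.

```r
# Toy data: 6 samples (rows) x 2 features, two classes. Values are made up.
x <- rbind(c(1.0, 1.2), c(0.8, 1.0), c(1.1, 0.9),
           c(3.0, 3.1), c(2.9, 3.3), c(3.2, 2.8))
cls <- c("A", "A", "A", "B", "B", "B")

# Hypothetical helper: assign a new sample to the class with the nearest centroid
predictCentroid <- function(xTrain, clsTrain, xNew) {
  # centroids is a features x classes matrix of per-class feature means
  centroids <- sapply(split(as.data.frame(xTrain), clsTrain), colMeans)
  # squared Euclidean distance from the new sample to each class centroid
  d <- colSums((centroids - xNew)^2)
  names(d)[which.min(d)]
}

# Leave-one-out: hold out each sample in turn, train on the rest, predict it
pred <- vapply(seq_len(nrow(x)), function(i) {
  predictCentroid(x[-i, , drop = FALSE], cls[-i], x[i, ])
}, character(1))

# Cross-validated estimate of the misclassification rate
misclassRate <- mean(pred != cls)
print(misclassRate)
```

Because every sample is predicted by a classifier that never saw it, the resulting rate is a less optimistic estimate than the error measured on the training data itself.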

Quick Start

This package provides test.classPredict as a quick start for class prediction analysis on one of the built-in sample datasets (i.e., “Brca”, “Perou”, or “Pomeroy”).

library(classpredict)
res <- test.classPredict("Brca",outputName = "ClassPrediction", generateHTML = TRUE)
names(res)

It writes an HTML report (C:\Users\YourUserName\Documents\Brca\Output\ClassPrediction\ClassPrediction.html) containing the class prediction results and returns a list, res, with the following objects:

##  [1] "performClass"        "percentCorrectClass" "predNewSamples"      "classifierTable"
##  [5] "probInClass"         "CCPSenSpec"          "LDASenSpec"          "K1NNSenSpec"
##  [9] "K3NNSenSpec"         "CentroidSenSpec"     "SVMSenSpec"          "BCPPSenSpec"
## [13] "probNew"             "weightLinearPred"    "thresholdLinearPred" "GRPCentroid"
## [17] "pmethod"             "workPath"

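Here is a brief explanation of some of the objects in res. The output below shows, in order: performClass, the cross-validation result of each classifier for every training sample; percentCorrectClass, the percentage of training samples that each classifier predicts correctly; predNewSamples, the predicted classes of the new samples; and probNew, the class probabilities of the new samples.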

##    Array id Class label Mean Number of genes in classifier CCP Correct? DLDA Correct? 1NN Correct?
## 1     s1996       BRCA1                                 16          YES           YES          YES
## 2     s1822       BRCA1                                 20          YES           YES          YES
## 3     s1714       BRCA1                                 28          YES           YES          YES
## 4     s1224       BRCA1                                 15          YES           YES          YES
## 5     s1252       BRCA1                                 28          YES            NO          YES
## 6     s1510       BRCA1                                 20          YES           YES          YES
## 7     s1905       BRCA1                                 20          YES           YES          YES
## 8     s1900       BRCA2                                 13          YES           YES          YES
## 9     s1787       BRCA2                                 17          YES           YES          YES
## 10    s1721       BRCA2                                 10          YES           YES          YES
## 11    s1486       BRCA2                                 17           NO            NO          YES
##    3NN Correct? Nearest Centroid Correct? SVM Correct? BCCP Correct?
## 1           YES                       YES          YES           YES
## 2           YES                       YES          YES           YES
## 3           YES                       YES          YES           YES
## 4           YES                       YES          YES           YES
## 5            NO                       YES          YES           YES
## 6           YES                       YES          YES           YES
## 7           YES                       YES          YES           YES
## 8            NO                        NO          YES           YES
## 9           YES                       YES          YES           YES
## 10          YES                       YES          YES           YES
## 11           NO                        NO           NO            NO
##   CCP Correct? DLDA Correct? 1NN Correct? 3NN Correct? Nearest Centroid Correct? SVM Correct? BCCP Correct?
## 1           91            82          100           73                        82           91            91
##   ExpID TrueClass   CCP   LDA    K1    K3 Centroid   SVM  BCCP
## 1 s1816   predict BRCA2 BRCA2 BRCA2 BRCA2    BRCA2 BRCA2 BRCA2
## 2 s1616   predict BRCA2 BRCA1 BRCA2 BRCA1    BRCA2 BRCA2    NC
## 3 s1063   predict BRCA1 BRCA1 BRCA1 BRCA1    BRCA1 BRCA1 BRCA1
## 4 s1936   predict BRCA2 BRCA2 BRCA2 BRCA2    BRCA2 BRCA2 BRCA2
##   Array id Class Probability
## 1    s1816 BRCA1  p < 1.0e-3
## 2    s1616 BRCA1       0.344
## 3    s1063 BRCA1           1
## 4    s1936 BRCA1  p < 1.0e-3
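
The one-row percentage table above is consistent with counting the YES entries per classifier in the per-sample table. The sketch below reproduces three of the printed values from the YES/NO columns copied from that output. The data frame and its column names are made up for illustration, and the package's internal computation may differ.

```r
# YES/NO entries copied from the CCP, DLDA and 3NN columns of the table above
perform <- data.frame(
  CCP  = c("YES", "YES", "YES", "YES", "YES", "YES", "YES", "YES", "YES", "YES", "NO"),
  DLDA = c("YES", "YES", "YES", "YES", "NO",  "YES", "YES", "YES", "YES", "YES", "NO"),
  K3NN = c("YES", "YES", "YES", "YES", "NO",  "YES", "YES", "NO",  "YES", "YES", "NO")
)

# Percentage of the 11 training samples predicted correctly, rounded
percent <- round(100 * colMeans(perform == "YES"))
print(percent)   # CCP 91, DLDA 82, K3NN 73, as in the printed table
```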

Cross-validation ROC curves are provided for the Compound Covariate Predictor, Diagonal Linear Discriminant Analysis, and Bayesian Compound Covariate classifiers.

library(classpredict)
res <- test.classPredict("Brca",outputName = "ClassPrediction")
## Getting analysis results ...
plotROCCurve(res,"ccp")
plotROCCurve(res,"dlda")
plotROCCurve(res,"bcc")
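
For readers unfamiliar with ROC curves, the following base R sketch shows how the points of such a curve and the area under it can be computed from classifier scores. The scores and labels are hypothetical, and this is not how plotROCCurve is implemented; it only illustrates what the curves represent.

```r
# Hypothetical classifier scores (higher = more likely positive) and true labels
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3)
truth <- c(1, 1, 0, 1, 0, 0)   # 1 = positive class, 0 = negative class

# Sweep the decision threshold over the observed scores, highest first
thr <- sort(unique(score), decreasing = TRUE)
tpr <- sapply(thr, function(th) mean(score[truth == 1] >= th))  # sensitivity
fpr <- sapply(thr, function(th) mean(score[truth == 0] >= th))  # 1 - specificity

# Prepend the (0, 0) corner, then integrate with the trapezoidal rule
fpr <- c(0, fpr)
tpr <- c(0, tpr)
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)   # 8/9: eight of the nine positive/negative pairs are ranked correctly
```

Plotting tpr against fpr gives the curve itself; an area of 1 means perfect separation and 0.5 corresponds to random guessing.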

Data Input

classPredict is the main R function for class prediction analysis. This section describes how to prepare its inputs, again using the “Brca” sample data as an example. The package ships the following “Brca” files:

* Brca_LOGRAT.TXT: a table of expression data with rows representing genes and columns representing samples;

* Brca_FILTER.TXT: a list of filtering information, where 1 means the corresponding gene passes the filters and 0 means it is excluded from the analysis;

* Brca_GENEID.txt: a table of gene information whose rows correspond to the rows of Brca_LOGRAT.TXT and Brca_FILTER.TXT;

* Brca_EXPDESIGN.txt: a table with class information and/or separate test set information.
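
The four inputs must line up with one another: one filter entry and one gene annotation row per expression row, and one design row per expression column. The sketch below checks this on small made-up stand-ins and shows what the 0/1 filter coding means. All values and column names are hypothetical. classPredict takes the filter vector as an argument, so the manual subsetting here is purely illustrative.

```r
# Toy stand-ins for the four input tables (values and column names are made up)
x <- matrix(c(0.5, -1.2, 0.3, 2.0,
              0.1, -0.4, 1.1, 0.9), nrow = 4)      # 4 genes x 2 samples
filter <- c(1, 0, 1, 1)                            # 1 = keep gene, 0 = exclude
geneId <- data.frame(UniqueID = c("g1", "g2", "g3", "g4"))
expdesign <- data.frame(ExpID = c("s1", "s2"), Class = c("BRCA1", "BRCA2"))

# Dimension checks: rows are genes, columns are samples
stopifnot(nrow(x) == length(filter),
          nrow(x) == nrow(geneId),
          ncol(x) == nrow(expdesign))

# Genes that pass the filter
kept <- x[filter == 1, , drop = FALSE]
print(dim(kept))   # 3 genes remain, 2 samples
```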

There are 15 samples in total: 11 are used as training data and the remaining 4 are new samples for class prediction. We run the following code to create the objects exprTrain and exprTest, which serve as inputs to classPredict.

dataset<-"Brca"
# gene IDs
geneId <- read.delim(system.file("extdata", paste0(dataset, "_GENEID.txt"), package = "classpredict"), as.is = TRUE, colClasses = "character") 
# expression data
x <- read.delim(system.file("extdata", paste0(dataset, "_LOGRAT.TXT"), package = "classpredict"), header = FALSE)
# filter information, 1 - pass the filter, 0 - filtered
filter <- scan(system.file("extdata", paste0(dataset, "_FILTER.TXT"), package = "classpredict"), quiet = TRUE)
# class information
expdesign <- read.delim(system.file("extdata", paste0(dataset, "_EXPDESIGN.txt"), package = "classpredict"), as.is = TRUE)
# training/test information
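testSet <- expdesign[, 10]
trainingInd <- which(testSet == "training")
predictInd <- which(testSet == "predict")
# which() on the training subset returns positions relative to trainingInd,
# so map them back to column positions in the full table
ind1 <- trainingInd[which(expdesign[trainingInd, 4] == "BRCA1")]
ind2 <- trainingInd[which(expdesign[trainingInd, 4] == "BRCA2")]
# training samples ordered by class: BRCA1 first, then BRCA2
ind <- c(ind1, ind2)
exprTrain <- x[, ind]
colnames(exprTrain) <- expdesign[ind, 1]
# new samples to be classified
exprTest <- x[, predictInd]
colnames(exprTest) <- expdesign[predictInd, 1]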
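
The same kind of split can be illustrated on a toy design table in which training and prediction samples are interleaved. The table, its column names, and the expression values are made up. Note that which() applied to a subset returns positions relative to that subset, so the sketch maps them back through trainingInd before indexing the full table; when the training samples occupy the first columns, the two sets of positions coincide.

```r
# Toy design table: column names and values are made up for illustration
design <- data.frame(
  id    = c("s1", "s2", "s3", "s4", "s5"),
  class = c("BRCA2", NA, "BRCA1", "BRCA2", NA),
  set   = c("training", "predict", "training", "training", "predict"),
  stringsAsFactors = FALSE
)
expr <- matrix(seq_len(15), nrow = 3)              # 3 genes x 5 samples

trainingInd <- which(design$set == "training")     # columns 1, 3, 4
predictInd  <- which(design$set == "predict")      # columns 2, 5

# Positions within the training subset ...
rel1 <- which(design$class[trainingInd] == "BRCA1")
rel2 <- which(design$class[trainingInd] == "BRCA2")
# ... mapped back to column positions in the full table, BRCA1 first
ord <- trainingInd[c(rel1, rel2)]

train <- expr[, ord, drop = FALSE]
colnames(train) <- design$id[ord]
test <- expr[, predictInd, drop = FALSE]
colnames(test) <- design$id[predictInd]

print(colnames(train))   # "s3" "s1" "s4"
print(colnames(test))    # "s2" "s5"
```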

exprTrain is a 3226 × 11 data frame with rows representing genes and columns representing the 11 training samples.

##            s1996        s1822       s1714       s1224        s1252       s1510       s1905       s1900       s1787        s1721        s1486
## 1    -3.08179379 -2.730392933 -1.87446904 -2.28824496 -0.345387012 -1.42321134 -1.68280995 -1.77760768 -0.24100803 -0.29195589  0.241469175
## 2     0.27810180 -0.201139927 -0.53343225 -0.57929373 -0.287439704 -0.88264304 -1.00000000 -0.41503757 -1.02236784 -0.74802077 -1.166995645
## 3     0.43758011  0.104796171  0.95334989 -0.22050031  0.353232265 -0.67318958  0.99407518  0.51096189 -0.16438681  0.02185956  0.241469175
## 4    -0.83893764 -0.235628277  0.61951971  0.81221521 -0.418143421 -0.52509099  0.76970232  0.26303440  0.64296818  1.45843005 -0.041464776
## 5    -0.43409583  0.067563236  0.76553470 -0.09386685 -0.418143421  0.38414347 -0.27252591 -0.19264519 -0.51457310 -0.62403196 -0.017618060
## ......

exprTest is a 3226 × 4 data frame with the expression values of the four new samples.

##        s1816      s1616      s1063       s1936
## 1 -0.8214026 -0.5618789 -0.4611339 -0.93288577
## 2 -0.8614801 -1.6322682 -0.7737241 -0.33342373
## 3  0.4066253  0.4381211  0.4116309  1.25153875
## 4  1.3286228  1.3737305  0.5574818  1.02272010
## 5  1.3330686  1.2422009  0.0402640  0.03394729

The following procedure builds seven classifiers from all training samples and uses them to predict the classes of the new samples. Genes used by the classifiers are selected at the 0.001 significance level, with the random variance model used for the univariate tests. Leave-one-out cross-validation is used to evaluate class prediction accuracy: in each fold, predictors are selected and classifiers are trained on the cross-validated training set, and the cross-validated estimate of the misclassification error is calculated on the cross-validated test set. Equal prior probabilities are assumed for the Bayesian Compound Covariate Predictor.

projectPath <- tempdir()
outputName <- "classPredictionBrca"
generateHTML <- TRUE
prevalence <- c(length(ind1)/(length(ind1)+length(ind2)),length(ind2)/(length(ind1)+length(ind2)))
names(prevalence) <- c("BRCA1", "BRCA2")
resList <- classPredict(exprTrain = exprTrain, exprTest = exprTest, isPaired = FALSE, 
                        pairVar.train = NULL, pairVar.test = NULL, geneId,
                        cls = c(rep("BRCA1", length(ind1)), rep("BRCA2", length(ind2))),
                        pmethod = c("ccp", "bcc", "dlda", "knn", "nc", "svm"), 
                        geneSelect = "igenes.univAlpha",
                        univAlpha = 0.001, univMcr = 0, foldDiff = 0, rvm = TRUE, filter = filter, 
                        ngenePairs = 25, nfrvm = 10, cvMethod = 1, kfoldValue = 10, bccPrior = 1, 
                        bccThresh = 0.8, nperm = 0, svmCost = 1, svmWeight = 1, fixseed = 1, 
                        prevalence = prevalence, projectPath = projectPath, 
                        outputName = outputName, generateHTML = generateHTML)
if (generateHTML)
  browseURL(file.path(projectPath, "Output", outputName,
            paste0(outputName, ".html")))

classPredict returns the same list as shown in the Quick Start section. For more details about classPredict, type help("classPredict") in the R console.
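
As a rough illustration of what selecting genes at the 0.001 significance level means, the sketch below runs an ordinary two-sample t-test per gene on simulated data and keeps the genes whose p-value falls below the threshold. It is not the package's implementation: with rvm = TRUE, classPredict uses the random variance model for the univariate tests, whereas this sketch uses base R's t.test. All data and variable names are made up.

```r
set.seed(1)
nGenes <- 200
cls <- rep(c("BRCA1", "BRCA2"), times = c(7, 4))   # 11 simulated training samples

# Simulated expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(nGenes * length(cls)), nrow = nGenes)
# Make the first 5 genes strongly differential between the two classes
expr[1:5, cls == "BRCA2"] <- expr[1:5, cls == "BRCA2"] + 10

# Ordinary pooled-variance two-sample t-test for every gene
pvals <- apply(expr, 1, function(g) {
  t.test(g[cls == "BRCA1"], g[cls == "BRCA2"], var.equal = TRUE)$p.value
})

# Genes passing the univariate significance threshold
univAlpha <- 0.001
selected <- which(pvals < univAlpha)
print(selected)
```

In a cross-validated analysis this selection is repeated inside every fold on the training portion only, as described above, so that the held-out sample never influences which genes are chosen.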