PennCNV: Quick start-up guide with examples

Starting from the November 2008 version of PennCNV, several example data sets and scripts are included to test that the programs are installed correctly and to give quick demonstrations of how to use the various programs in PennCNV.

The penncnv/example/ directory contains several files. Among them, father.txt, mother.txt and offspring.txt are three signal intensity files with signal values (to keep the file size small, only a few chromosomes are included in these files). In addition, there are several list files that contain the names of files to be processed by PennCNV.

[kaiwang@cc ~/project/penncnv/example]$ ./runex.pl
Optional arguments:
Function: test-drive PennCNV and related scripts in your system
Example: runex.pl 1 (run PennCNV to call CNV on three signal files)

Users can run these examples one by one to get an idea of what PennCNV can do and how to use the command line options. For example, let's try the first example:

[kaiwang@cc ~/project/penncnv/example]$ runex.pl 1
******************************************************************************
******************************************************************************

Note: when running the examples, the program assumes that the PennCNV executables are already in your system path. If they are not, you can use "runex.pl 1 -path_detect_cnv ../detect_cnv.pl" instead to specify the path to the executable (the executable is located in the parent directory).

The command above runs the first example, which uses the individual-based CNV calling algorithm on three input files (father.txt, mother.txt and offspring.txt) and writes the output to the ex1.rawcnv file. The actual command line arguments are printed after "Running command". Some LOG information is printed to the screen between the two ******* lines, and it is also written to the ex1.log file. We can check the content of the ex1.rawcnv file:

[kaiwang@cc ~/project/penncnv/example]$ cat ex1.rawcnv

These fields are the chromosome coordinates, the number of markers (SNP markers and sometimes CN markers as well) in the region, the CNV length, the copy number estimate, the signal file name, the start and end SNP, and the confidence score. See more in-depth description of the default CNV calling algorithm here.

Example 2 illustrates the trio-based calling algorithm, which requires the output file from example 1 (ex1.rawcnv) as one of the input files. Example 6 illustrates a joint-calling algorithm for trios that uses one step only and does not require ex1.rawcnv as an input file. See more in-depth description of the trio-based algorithm here.

Example 3 illustrates the use of GC-model adjustment of signal intensity values for CNV calling.
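
Returning to the ex1.rawcnv fields listed above, the following is a minimal Python sketch of how one line of such a file might be split into named fields for downstream analysis. It is not part of PennCNV. The exact line layout (key=value tokens such as numsnp=, length=, state...,cn=, startsnp=, endsnp=, conf=) is an assumption that should be checked against your own ex1.rawcnv, and the sample line, its coordinates and its SNP names are hypothetical values invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class CnvCall(NamedTuple):
    chrom: str
    start: int
    end: int
    numsnp: int
    length: int
    copy_number: int
    signal_file: str
    startsnp: str
    endsnp: str
    conf: Optional[float]


# ASSUMED layout of one .rawcnv line; verify against your own output file.
RAWCNV_RE = re.compile(
    r"^(?:chr)?(?P<chrom>\w+):(?P<start>\d+)-(?P<end>\d+)\s+"
    r"numsnp=(?P<numsnp>\d+)\s+"
    r"length=(?P<length>[\d,]+)\s+"
    r"state\d+,cn=(?P<cn>\d+)\s+"
    r"(?P<file>\S+)\s+"
    r"startsnp=(?P<startsnp>\S+)\s+"
    r"endsnp=(?P<endsnp>\S+)"
    r"(?:\s+conf=(?P<conf>\S+))?"
)


def parse_rawcnv_line(line: str) -> Optional[CnvCall]:
    """Parse one line; return None if it does not match the assumed layout."""
    match = RAWCNV_RE.match(line.strip())
    if match is None:
        return None
    try:
        conf = float(match["conf"]) if match["conf"] is not None else None
    except ValueError:
        conf = None  # non-numeric confidence value
    return CnvCall(
        chrom=match["chrom"],
        start=int(match["start"]),
        end=int(match["end"]),
        numsnp=int(match["numsnp"]),
        length=int(match["length"].replace(",", "")),  # drop thousands separators
        copy_number=int(match["cn"]),
        signal_file=match["file"],
        startsnp=match["startsnp"],
        endsnp=match["endsnp"],
        conf=conf,
    )


# HYPOTHETICAL sample line (not real PennCNV output).
SAMPLE_LINE = (
    "chr1:1000000-1004039  numsnp=6  length=4,040  state2,cn=1 "
    "father.txt startsnp=rs_example_start endsnp=rs_example_end conf=15.133"
)

if __name__ == "__main__":
    print(parse_rawcnv_line(SAMPLE_LINE))
```

Returning None for non-matching lines, instead of raising an error, lets a caller skip header or log lines that may be mixed into a file.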

The GC-model adjustment algorithm was previously published and described here. The signal values for the three files are not really affected by genomic waves, so the adjustment has little effect on the CNV calls.

Examples 4 and 5 illustrate the validation-calling algorithm. This is not based on an HMM. Instead, it uses a validation subroutine that takes a prior probability, calculates the likelihood of the region having each possible copy number, and then selects the most likely copy number. See more in-depth description of the validation-calling algorithm here.

Example 6 illustrates the joint-calling algorithm for trios. This is an HMM-based algorithm that simultaneously models the signal measures for a trio (the detailed algorithm is described in this paper), so it is computationally expensive and may take quite some time to run.

Examples 7, 8 and 9 illustrate the use of the convert_cnv.pl program to convert CNV call formats.

Example 10 illustrates the use of the filter_cnv.pl program to select a subset of CNV calls.

Examples 11 and 12 illustrate the use of the compare_cnv.pl program to compare CNV calls on the same file given by different algorithms, or on duplicated samples given by the same algorithm.

Example 13 illustrates the use of the infer_snp_allele.pl program to infer CNV-based genotype calls in CNV regions for three subjects. See more in-depth description of the program here.

Example 14 illustrates the use of the infer_snp_allele.pl program to validate putative de novo CNV calls and assign P-values. See more in-depth description of the program here.

Example 15 illustrates the use of the convert_cnv.pl program to convert CNV calls from other algorithms to the PennCNV output format. The resulting calls are useful for comparative analysis of calls between algorithms, or can be used in the visualize_cnv.pl program to plot the actual signal intensity values.
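
The idea behind the validation-calling algorithm of examples 4 and 5 (combine a prior probability with the likelihood of each candidate copy number, then pick the most likely one) can be illustrated with a small Python sketch. This is only a generic maximum a posteriori selection written under assumptions, not PennCNV's actual subroutine. The function name, the log-likelihood values and the prior values below are all hypothetical.

```python
import math
from typing import Dict, Tuple


def most_likely_copy_number(
    log_likelihood: Dict[int, float], prior: Dict[int, float]
) -> Tuple[int, float]:
    """Return (copy_number, posterior_probability) maximizing prior * likelihood.

    log_likelihood: copy number -> log P(signal data in region | copy number)
    prior:          copy number -> prior probability of that copy number
    """
    # Work in log space to avoid underflow when a region spans many markers.
    log_post = {
        cn: ll + math.log(prior[cn])
        for cn, ll in log_likelihood.items()
        if prior.get(cn, 0.0) > 0.0
    }
    if not log_post:
        raise ValueError("no copy number has a positive prior")
    best = max(log_post, key=log_post.get)
    # Normalize with the log-sum-exp trick to obtain a posterior probability.
    peak = log_post[best]
    total = sum(math.exp(value - peak) for value in log_post.values())
    return best, 1.0 / total


# HYPOTHETICAL numbers: the data favor one copy, the prior favors two copies.
example_log_likelihood = {0: -60.0, 1: -12.0, 2: -20.0, 3: -45.0, 4: -70.0}
example_prior = {0: 0.01, 1: 0.01, 2: 0.96, 3: 0.01, 4: 0.01}

if __name__ == "__main__":
    copy_number, posterior = most_likely_copy_number(
        example_log_likelihood, example_prior
    )
    print(copy_number, round(posterior, 4))
```

In this made-up case the data support a single-copy deletion strongly enough to outweigh the prior that favors the normal two-copy state, which is the kind of trade-off a prior-based validation step has to resolve.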

Example 16 illustrates the use of the visualize_cnv.pl program to plot signal intensity values (LRR and BAF) for all CNV calls for a given individual, so that users can visually examine the calls and decide whether they are reliable (without relying on manual examination in GenomeStudio). The program requires calling R subroutines to work.

Note that the output_expected/ directory under the example/ directory contains the expected output files. If you see a difference between your output and the expected output, there may be a problem with the PennCNV installation, and a re-compilation may be necessary.
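
The comparison against output_expected/ described above can be automated. The sketch below is a hypothetical Python helper, not a PennCNV script. It assumes that each expected file has the same name as the file produced in the example/ directory, which should be verified for your installation. Files such as logs may legitimately differ (for example, if they record paths or run-specific details), so a reported difference is a prompt to inspect the file rather than proof of a broken installation.

```python
import filecmp
from pathlib import Path
from typing import Dict


def compare_with_expected(output_dir: str, expected_dir: str) -> Dict[str, str]:
    """Compare each file in expected_dir with the same-named file in output_dir.

    Returns a mapping: file name -> "same", "different" or "missing".
    ASSUMPTION: produced and expected files share the same file names.
    """
    results = {}
    for expected in sorted(Path(expected_dir).iterdir()):
        if not expected.is_file():
            continue  # ignore subdirectories
        produced = Path(output_dir) / expected.name
        if not produced.is_file():
            results[expected.name] = "missing"
        elif filecmp.cmp(produced, expected, shallow=False):
            # shallow=False forces a byte-by-byte content comparison
            results[expected.name] = "same"
        else:
            results[expected.name] = "different"
    return results
```

From within the example/ directory, one might call compare_with_expected(".", "output_expected") after running the examples and then look only at the entries that are not "same".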