|
PennCNV trio-based CNV calling

Suppose we have already generated the individual-based CNV calls for sample1.txt, sample2.txt and sample3.txt, as described in the previous tutorial section on CNV calling. Below is the procedure for trio-based (and quartet-based) CNV calling for each of the three individuals using PennCNV.

[kai@adenine penncnv]$ detect_cnv.pl -trio -hmm lib/hh550.hmm -pfb lib/hh550.hg18.pfb -cnv sampleall.rawcnv sample1.txt sample2.txt sample3.txt -out sampleall.triocnv

In the above command, the --trio argument (written -trio above) specifies that we want to use the family-based CNV detection algorithm to jointly update the CNV status for a father-mother-offspring trio. The --cnvfile argument (written -cnv above) specifies the prior CNV calls generated in the individual-based calling step. The three files on the command line contain the signal data for the father, mother and offspring, in that order. The output will be written to sampleall.triocnv. Note that we can also generate a list file, which contains three file names per line, to process multiple trios at once.

The PennCNV trio-based calling algorithm reads the fifth column (the file name column) of each line in the sampleall.rawcnv file, collects all individual-based CNV calls generated on any member of the trio (sample1.txt, sample2.txt and sample3.txt), and then tries to re-call these regions on the trio and fine-map their boundaries. Therefore, it is important that the file names listed in the sampleall.rawcnv file are identical to the names on the command line; otherwise the program won’t work, since it cannot figure out which individual-based calls to use.

The first few lines of the output file are listed below:

[kai@adenine penncnv]$ cat sampleall.triocnv

After using family information, we now generate a total of 62 CNVs for the three members of this family. Among the 21 CNVs in the offspring (sample3.txt), 20 are inherited CNVs and 1 is a de novo CNV.
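The file-name requirement described above can be checked before running the trio step. The following is a minimal sketch, not part of PennCNV: the helper name check_names is hypothetical, and the call file content is made-up illustration data that only follows the column layout described in this section. It reports any command-line file name that never appears as an exact match in the fifth column.

```shell
#!/bin/sh
# Sketch: confirm that every signal file named on the command line appears
# in the fifth column of the individual-based call file.
# The rawcnv content below is made-up illustration data, not real output.

rawcnv=$(mktemp)
cat > "$rawcnv" <<'EOF'
chr1:1000-2000 numsnp=5 length=1,001 state2 sample1.txt startsnp=rsA endsnp=rsB
chr2:3000-4000 numsnp=7 length=1,001 state5 sample2.txt startsnp=rsC endsnp=rsD
chr3:5000-6000 numsnp=9 length=1,001 state2 ./sample3.txt startsnp=rsE endsnp=rsF
EOF

# check_names FILE NAME...: print each NAME that never occurs as an exact
# match in column 5 of FILE; return non-zero if any name is missing.
# (check_names is a hypothetical helper, not a PennCNV command.)
check_names() {
    file=$1
    shift
    status=0
    for name in "$@"; do
        if ! awk -v n="$name" '$5 == n { found = 1 } END { exit !found }' "$file"; then
            echo "missing: $name"
            status=1
        fi
    done
    return $status
}

result=$(check_names "$rawcnv" sample1.txt sample2.txt sample3.txt)
echo "$result"
rm -f "$rawcnv"
```

In this made-up input the third call was recorded as ./sample3.txt, so the plain name sample3.txt is reported as missing: the comparison is an exact string match, which is the kind of mismatch the paragraph above warns about.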
All inherited CNVs have the same boundaries as the corresponding CNVs from the father or mother. The de novo CNV is this one:

chr3:3974670-4071644 numsnp=50 length=96,975 state2 sample3.txt startsnp=rs11716390 endsnp=rs17039742 offspring triostate=332

As we can see, the new CNV file contains two extra fields: the eighth field indicates that sample3.txt is the offspring in the trio-based CNV calling, while the ninth field tells us that the HMM states for the father, mother and offspring at this genomic region are 3 (normal), 3 (normal) and 2 (one-copy deletion), respectively.

If the family has two children, the “-quartet” argument can be used for CNV calling. In that case, four file names should be supplied on the command line, or given on each line of the list file, representing the father, mother, child 1 and child 2, respectively. PennCNV cannot generate calls on a pair of parents and three or more children; instead, the user needs to split the family into trios and quartets for CNV calling, and then combine the CNV calls into consensus calls. |
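The two extra fields described above can be decoded with a few lines of awk. This is a rough sketch rather than a PennCNV feature: the first input line is the example call quoted in this section, the second line is made up, and treating "both parents in state 3, offspring in another state" as a candidate de novo call is an assumption of this sketch. It handles only the three-digit trio form of the triostate field.

```shell
#!/bin/sh
# Sketch: split the triostate field into per-member HMM states and list
# offspring calls where both parents are in state 3 (normal).
# Line 1 is the example call from the text; line 2 is made-up data.

triocnv=$(mktemp)
cat > "$triocnv" <<'EOF'
chr3:3974670-4071644 numsnp=50 length=96,975 state2 sample3.txt startsnp=rs11716390 endsnp=rs17039742 offspring triostate=332
chr5:100-200 numsnp=4 length=101 state2 sample3.txt startsnp=rsX endsnp=rsY offspring triostate=232
EOF

denovo=$(awk '
    # Field 8 is the family role, field 9 is triostate=<father><mother><offspring>.
    $8 == "offspring" && $9 ~ /^triostate=[0-9][0-9][0-9]$/ {
        s = substr($9, 11)            # the three state digits
        father = substr(s, 1, 1)
        mother = substr(s, 2, 1)
        child  = substr(s, 3, 1)
        # Assumption: parents normal and offspring not normal means candidate de novo.
        if (father == "3" && mother == "3" && child != "3")
            print $1, "father=" father, "mother=" mother, "offspring=" child
    }
' "$triocnv")
echo "$denovo"
rm -f "$triocnv"
```

Only the chr3 call is printed, since the made-up chr5 call has the father in state 2. Lines from a quartet run would carry a different triostate layout and are skipped by the pattern.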