Information about assembly Zm-PH207-REFERENCE_NS-UIUC_UMN-1.0

Candice N. … Bowman, Ilya Soifer, Omer Barad, Doron Shem-Tov, Kobi Baruch, Fei Lu, Alvaro G. … Mikel. De novo assembly of elite inbred line PH207 provides insights into genomic and transcriptomic diversity in maize (Zea mays L.).

This work was supported by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science DE-FC02-07ER64494), by Dow AgroSciences, and by the National Science Foundation (Grant IOS-1126998 to K.L.C.).

Sequencing technologies: Whole-genome shotgun sequencing using paired-end (PE), mate-pair (MP), and TruSeq synthetic long-reads

Sequencing method: Illumina HiSeq chemistry

Assembly methods:

Read pre-processing and error correction. For all PH207 genomic libraries except the TruSeq synthetic long-reads, PCR duplicates were removed using FastUniq (Xu et al., 2012). The Illumina HiSeq2000 adaptor AGATCGGAAGAGC was removed, and reads were error-corrected using the Corrector_HA module of SOAPdenovo with a k-mer size of 23 and a cutoff of 6 (Luo et al., 2012). For the PCR-free library (MiSeq stitched reads), overlapping paired reads were merged after adaptor truncation using FLASH (Magoc and Salzberg, 2011) with a minimum required overlap of 10 bp to create the stitched reads.

Processing of MP reads consisted of filtering out putative false mate-pairs by searching for the Nextera linker sequence (10 nucleotides of CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG) on either end of the MP. Mate-pairs in which the linker was not found were sorted into a separate file for restricted use in scaffolding: they were used only to support links found with the filtered MPs and never to create links on their own.

The TruSeq synthetic long-read assembly pipeline was run by Illumina (Illumina IGN FastTrack Long Reads version 41; Illumina, San Diego, CA) to create synthetic long-read builds that represent long contiguous template fragments.

In the first step of assembly, SOAPdenovo v1.05 (Luo et al., 2012) was used to construct a De Bruijn graph of contigs from the single-end reads of the PE library with very conservative settings: no bubble merging and no repeat masking, but with removal of low-coverage k-mers and edges. A k-mer size of 63 bp was optimal for graph construction. To scaffold the contigs of the De Bruijn graph, non-repetitive contigs were identified and assembled into scaffolds based on mapping information from the single-end reads; the synthetic long-reads were then applied in the same way to resolve long repeats. Ungapped mapping of the single-end reads and of the long-read builds facilitated scaffolding by linking contigs that map to the same read. Scaffolding was performed on a directed graph whose nodes were scaffolds longer than 200 bp and whose edges were derived from the PE and MP links.
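The adaptor-removal step described above can be illustrated with a minimal Python sketch. It assumes plain sequence and quality strings and cuts each read at the first exact match of the AGATCGGAAGAGC adaptor, or at a partial adaptor prefix running off the 3' end. The `truncate_adapter` name and the `min_partial=5` threshold are hypothetical, and real trimmers also tolerate sequencing errors inside the adaptor.

```python
# Hypothetical sketch of adaptor truncation; not the tool used in the study.
ADAPTER = "AGATCGGAAGAGC"  # Illumina HiSeq2000 adaptor quoted in the methods


def truncate_adapter(seq, qual, adapter=ADAPTER, min_partial=5):
    """Cut a read (and its quality string) just before the adaptor.

    A full-length exact match anywhere in the read is checked first; if none
    is found, the read is trimmed when it ends with a prefix of the adaptor
    at least `min_partial` bases long (an assumed threshold). Mismatches are
    not tolerated.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], qual[:pos]
    # Try the longest adaptor prefix first so the largest read-through is removed.
    for k in range(min(len(adapter) - 1, len(seq)), min_partial - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


print(truncate_adapter("ACGTACGTAGATCGGAAGAGCTTT", "I" * 24))  # ('ACGTACGT', 'IIIIIIII')
```

Trimming the quality string in lockstep keeps each base paired with its score, which later steps such as error correction rely on.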
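Read stitching for the PCR-free library can be sketched as a simple overlap search. This is not FLASH's implementation. The hypothetical `merge_pair` function below handles only fragments longer than a single read and keeps read 1's base throughout the overlap instead of choosing by quality. Among overlaps of at least 10 bp (the minimum from the methods), it accepts the one with the lowest mismatch fraction below an assumed 0.25 ceiling.

```python
# Hypothetical overlap-merging sketch in the spirit of FLASH (not its code).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.25):
    """Merge read 1 with the reverse complement of read 2.

    Tries every overlap length from the longest possible down to
    `min_overlap`, keeps the one with the lowest mismatch fraction (ties go
    to the longer overlap), and returns None if no overlap passes.
    """
    r2rc = revcomp(r2)
    best_len, best_frac = None, None
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        tail, head = r1[-ov:], r2rc[:ov]
        frac = sum(a != b for a, b in zip(tail, head)) / ov
        if frac <= max_mismatch_frac and (best_frac is None or frac < best_frac):
            best_len, best_frac = ov, frac
    if best_len is None:
        return None
    # Read 1 supplies the overlapping bases; read 2 supplies the rest.
    return r1 + r2rc[best_len:]


fragment = "GATTACAGGCTTCAGTCCATGAGCAATCGT"  # 30 bp toy fragment, 10 bp read overlap
print(merge_pair(fragment[:20], revcomp(fragment[10:])))  # GATTACAGGCTTCAGTCCATGAGCAATCGT
```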
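The mate-pair filter can be sketched as follows. The methods state only that 10 nucleotides of the Nextera linker were searched for on either end of the MP. This sketch therefore assumes that any 10-mer taken from the linker counts as a hit, that an exact match is required, and that each read is truncated at its first hit. `classify_pair` and its "filtered"/"unfiltered" labels are hypothetical names for the filtered MPs and for the separate file of linker-negative pairs.

```python
# Hypothetical mate-pair linker filter; seed choice and trimming are assumptions.
LINKER = "CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG"  # Nextera linker from the methods
SEED = 10  # number of linker nucleotides searched for

# Every 10-mer of the linker; a read containing any of them is linker-positive.
LINKER_KMERS = {LINKER[i:i + SEED] for i in range(len(LINKER) - SEED + 1)}


def find_linker(seq):
    """Return the start of the first linker 10-mer in seq, or -1 if none."""
    for i in range(len(seq) - SEED + 1):
        if seq[i:i + SEED] in LINKER_KMERS:
            return i
    return -1


def classify_pair(r1, r2):
    """Label a mate pair 'filtered' (linker seen in either read, reads cut at
    the linker) or 'unfiltered' (no linker; kept only as supporting evidence)."""
    p1, p2 = find_linker(r1), find_linker(r2)
    if p1 == -1 and p2 == -1:
        return "unfiltered", r1, r2
    if p1 != -1:
        r1 = r1[:p1]
    if p2 != -1:
        r2 = r2[:p2]
    return "filtered", r1, r2


print(classify_pair("ACGTACGTAC" + LINKER[:15], "TTTTGGGGCCCCAAAA"))
```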
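The De Bruijn graph step, including the removal of low-coverage k-mers, can be illustrated with a textbook toy version. This is a simplification, not SOAPdenovo. Nodes are (k-1)-mers, and each k-mer seen at least `min_cov` times becomes an edge. Reverse complements are ignored, and contigs are read off as maximal non-branching paths, with isolated cycles skipped. The function names and the `min_cov` value are hypothetical, and the study's k of 63 is replaced by k = 4 to keep the example readable.

```python
# Toy De Bruijn graph assembler: single strand, no bubble merging or repeat masking.
from collections import Counter, defaultdict


def build_graph(reads, k, min_cov=2):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it, using only
    k-mers seen at least min_cov times (low-coverage k-mers are dropped)."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_cov:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def contigs(graph):
    """Spell out maximal non-branching paths as contig sequences."""
    indeg = Counter(v for succs in graph.values() for v in succs)
    nodes = set(graph) | set(indeg)

    def simple(node):  # exactly one edge in and one edge out
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    result = []
    for start in nodes:
        if simple(start):
            continue
        for nxt in graph.get(start, ()):
            seq, cur = start + nxt[-1], nxt
            while simple(cur):
                cur = next(iter(graph[cur]))
                seq += cur[-1]
            result.append(seq)
    return sorted(result)


reads = ["ACGTTGCA", "ACGTTGCA", "ACGTAAAA"]  # the last read is low coverage
print(contigs(build_graph(reads, k=4)))  # ['ACGTTGCA']
```

Dropping k-mers seen only once removes the path contributed by the single low-coverage read. This is the same intent as the "removing low coverage kmers and edges" setting, though SOAPdenovo's own cutoff logic differs.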
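The scaffold graph described above can be sketched as a directed graph with scaffolds longer than 200 bp as nodes and read-pair links as edges. The sketch assumes that links arrive as ordered (source, target) scaffold pairs. It ignores orientation and gap sizes and uses a hypothetical `min_support` of 2. To mirror the restricted role of linker-negative MPs, it lets them add support only to edges already created by PE or filtered MP links.

```python
# Hypothetical scaffold-graph construction; orientation and gap estimates omitted.
from collections import Counter


def scaffold_edges(lengths, strong_links, weak_links, min_len=200, min_support=2):
    """Return {(source, target): support} for the directed scaffold graph.

    lengths: scaffold id -> length in bp; only scaffolds > min_len are nodes
    strong_links: (source, target) pairs from PE and linker-filtered MP reads
    weak_links: (source, target) pairs from MPs without a linker
    """
    nodes = {s for s, n in lengths.items() if n > min_len}

    def tally(links):
        return Counter((a, b) for a, b in links if a != b and a in nodes and b in nodes)

    strong, weak = tally(strong_links), tally(weak_links)
    edges = {}
    for edge, n in strong.items():  # weak links never create an edge on their own
        support = n + weak[edge]
        if support >= min_support:
            edges[edge] = support
    return edges


lengths = {"s1": 500, "s2": 300, "s3": 150}
strong = [("s1", "s2")]
weak = [("s1", "s2"), ("s2", "s1"), ("s1", "s3")]
print(scaffold_edges(lengths, strong, weak))  # {('s1', 's2'): 2}
```

Here the lone strong s1 to s2 link reaches the threshold only with weak support. The weak-only s2 to s1 link creates no edge, and s3 is excluded as shorter than 200 bp.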