English content reposted from:
https://bioinformaticsworkbook.org/introduction/dataTerminology.html
There are four common bases in a DNA sequence: adenine (A), guanine (G), cytosine (C), and thymine (T). Uracil (U) is found in RNA in place of thymine.
Image taken from Wikipedia, where more information about nucleotides can also be found.
A read is a string of bases represented by their one-letter codes. Here is an example of a read that is 70 bases long:
TTAACCTTGGTTTTGAACTTGAACACTTAGGGGATTGAAGATTCAACAACCCTAAAGCTTGGGGTAAAAC
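Since a read is just a string of one-letter base codes, it can be inspected with ordinary string operations. The following is a minimal Python sketch using the example read from the text; the variable names are illustrative only.

```python
from collections import Counter

# The example read from the text, stored as a plain string of one-letter base codes.
read = "TTAACCTTGGTTTTGAACTTGAACACTTAGGGGATTGAAGATTCAACAACCCTAAAGCTTGGGGTAAAAC"

read_length = len(read)      # number of bases in the read
base_counts = Counter(read)  # how many times each base occurs

# Fraction of G and C bases, a common summary statistic for a sequence.
gc_fraction = (base_counts["G"] + base_counts["C"]) / read_length

print(read_length)
print(dict(base_counts))
print(round(gc_fraction, 3))
```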
A contig is the consensus sequence generated by aligning overlapping reads to one another.
The last line is the consensus of the aligned reads. We call this consensus sequence a contig.
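The idea of a consensus can be sketched in Python as a per-column majority vote. This is only a toy illustration: it assumes the reads have already been aligned and padded with `-` to equal length, whereas a real assembler must first find the overlaps itself. The function name `consensus` and the three short reads are made up for the example.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-vote consensus of reads already padded to equal length with '-'."""
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    result = []
    for column in zip(*aligned_reads):
        # Ignore padding; only real bases vote in each column.
        bases = [b for b in column if b != "-"]
        if not bases:
            result.append("N")  # no read covers this column
        else:
            result.append(Counter(bases).most_common(1)[0][0])
    return "".join(result)

# Three hypothetical overlapping reads, shown in their aligned positions.
aligned = [
    "TTAACCTTGG------",
    "---ACCTTGGTTTT--",
    "------TTGGTTTTGA",
]
contig = consensus(aligned)
print(contig)  # TTAACCTTGGTTTTGA
```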
A scaffold is a set of contigs that have been ordered and oriented based on mate-pair or other long-distance information.
contig NNNNNNNNNNNN gitnoc NNNNNNNN contig NNNNNNNN contig NNNN gitnoc
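In the line above, contig stands for a contig in the forward orientation, gitnoc stands for a contig in the reverse orientation, and each run of N marks a gap of unknown sequence between neighboring contigs.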
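A scaffold like the one pictured can be sketched in Python by joining contigs with runs of N. This is a minimal illustration under stated assumptions: the contig sequences, orientations, and gap lengths are invented, and `build_scaffold` is a hypothetical helper, not a function from any assembler. Note that on real sequence, placing a contig in the reverse orientation means taking its reverse complement, not simply reversing the letters.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def build_scaffold(parts):
    """Join contigs into one scaffold string.

    parts: list of (contig_sequence, orientation, gap_after), where
    orientation is '+' or '-' and gap_after is the number of N to append.
    """
    pieces = []
    for sequence, orientation, gap_after in parts:
        if orientation == "-":
            sequence = reverse_complement(sequence)
        elif orientation != "+":
            raise ValueError(f"unknown orientation: {orientation!r}")
        pieces.append(sequence)
        pieces.append("N" * gap_after)  # gap of unknown sequence
    return "".join(pieces)

# Hypothetical contigs: forward, reverse, forward.
parts = [("AACG", "+", 3), ("TTGC", "-", 2), ("GGA", "+", 0)]
scaffold = build_scaffold(parts)
print(scaffold)  # AACGNNNGCAANNGGA
```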
Some additions gathered from other articles (figures would make this even better):
Chromosomes are the largest DNA molecules in a cell.
Scaffolds can be ordered and oriented using a genetic map or Hi-C data into linkage groups or chromosomes.
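As a rough illustration of ordering and orienting scaffolds with a genetic map, the Python sketch below groups scaffolds by linkage group, orders them by mean map position, and orients each one by whether map position rises or falls along the scaffold. Everything here is an assumption made for the example: the marker table, the scaffold names, and the simple first-versus-last-marker rule. Real pipelines handle conflicting markers and Hi-C contact data in far more careful ways.

```python
from collections import defaultdict

# Hypothetical markers: (scaffold, position on scaffold in bp, linkage group, map position in cM)
markers = [
    ("scaf2", 1000, "LG1", 12.0),
    ("scaf2", 9000, "LG1", 10.0),
    ("scaf1", 500, "LG1", 1.0),
    ("scaf1", 7000, "LG1", 4.0),
    ("scaf3", 2000, "LG2", 3.0),
]

def order_and_orient(markers):
    """Return {linkage_group: [(scaffold, orientation), ...]} in map order."""
    by_scaffold = defaultdict(list)
    for scaffold, bp, group, cm in markers:
        by_scaffold[scaffold].append((bp, group, cm))

    placements = defaultdict(list)
    for scaffold, hits in by_scaffold.items():
        hits.sort()  # sort markers by physical position on the scaffold
        group = hits[0][1]  # assumes all markers on a scaffold share one linkage group
        mean_cm = sum(cm for _, _, cm in hits) / len(hits)
        first_cm, last_cm = hits[0][2], hits[-1][2]
        if last_cm > first_cm:
            orientation = "+"  # map position increases along the scaffold
        elif last_cm < first_cm:
            orientation = "-"  # scaffold runs against the map
        else:
            orientation = "?"  # a single marker cannot determine orientation
        placements[group].append((mean_cm, scaffold, orientation))

    return {
        group: [(scaffold, orientation) for _, scaffold, orientation in sorted(entries)]
        for group, entries in placements.items()
    }

layout = order_and_orient(markers)
print(layout)
```

With the toy markers above, scaf1 is placed before scaf2 on LG1, scaf2 is flipped, and scaf3 is left unoriented because it carries only one marker.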
The ultimate goal of a genome assembly project is to assemble reads into phased chromosomes that represent an actual individual.
Most chromosomal assemblies produced today are not phased or may represent multiple individuals.