Algorithms for Detection of Centromeric Satellite DNAs and Circular DNAs
by Aaron Li
Category: Biology
Abstract – Human centromeres are composed of long arrays of tandem repeats called satellite DNAs. Existing algorithms for identifying satellite repeats either require a complete assembly of the satellite DNA sequences or work only for simple repeat structures. This paper describes an algorithm that can extract and annotate centromeric satellite sequences even when those sequences are not fully assembled. When tested on CHM13 (T2T), a manually curated database, the algorithm produced annotations that closely match those produced by the T2T team. The implementation also runs faster than popular Smith-Waterman-based implementations.