Finding Group-Similarity Segments in Biological Sequences

Wang, Tsai Cheng

A Group-Similarity Segment (GSS) is a subsequence on which a group of local alignments is concentrated. Given a set of local alignments against a sequence, each element of the sequence accumulates a score, so the sequence yields a string of numerical values. The Maximum-Sum method [1] is applied to find a maximum-sum region in this numerical string in linear time. Masking the segment just found makes the next maximum-sum segment an immediate candidate. The process terminates once the found GSS reaches the threshold.
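The find-and-mask loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation from [1]: the linear scan is the standard Kadane-style maximum-sum scan, the scores are centered by subtracting their overall mean (an assumption, made so that the maximum-sum region is not trivially the whole sequence), and the stopping rule "segment sum no longer exceeds `threshold`" is one possible reading of the termination condition. The names `max_sum_segment` and `find_segments` are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, float]  # (start, end, total), 0-based inclusive indices


def max_sum_segment(values: List[Optional[float]]) -> Optional[Segment]:
    """Kadane-style linear scan for a maximum-sum contiguous region.

    Entries equal to None are treated as masked: no segment may cross them.
    Returns None when every entry is masked.
    """
    best: Optional[Segment] = None
    cur_sum = 0.0
    cur_start = 0
    for i, v in enumerate(values):
        if v is None:
            # A masked position ends the running segment.
            cur_sum, cur_start = 0.0, i + 1
            continue
        if cur_sum <= 0.0:
            # A non-positive prefix cannot help; restart at position i.
            cur_sum, cur_start = v, i
        else:
            cur_sum += v
        if best is None or cur_sum > best[2]:
            best = (cur_start, i, cur_sum)
    return best


def find_segments(scores: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Repeatedly take the maximum-sum segment, then mask it.

    Assumption: scores are centered on their overall mean, and the loop
    stops when the best remaining segment sum is not above `threshold`.
    """
    baseline = sum(scores) / len(scores)
    work: List[Optional[float]] = [s - baseline for s in scores]
    found: List[Tuple[int, int]] = []
    while True:
        best = max_sum_segment(work)
        if best is None or best[2] <= threshold:
            break
        start, end, _ = best
        found.append((start, end))
        for i in range(start, end + 1):
            work[i] = None  # mask the segment just found
    return found
```

For example, `find_segments([1, 1, 9, 9, 1, 1, 5, 5, 1, 1], 0.0)` first reports the segment covering the two 9s and then, after masking it, the segment covering the two 5s. Each pass is linear in the sequence length, so the total cost grows with the number of segments reported.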

Figure 1 illustrates an example of the GSS analysis. The lower chart shows the local alignments found by my protein program, plotted against the positions of the query sequence. The upper chart shows the score distribution over all elements of the query sequence. The GSS analysis finds four segments

  1. [554, 699], Local Mean = 52.135518;

  2. [179, 261], Local Mean = 52.832498;

  3. [408, 445], Local Mean = 44.925713;

  4. [31, 61], Local Mean = 33.497869;

all of which lie above the mean (27.820466).
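As a small illustration of the comparison above, the local mean of a segment can be checked against the mean of the whole score string. This sketch assumes that "Local Mean" is the arithmetic mean of the per-element scores inside the segment and that the segment bounds are 1-based and inclusive; neither is stated explicitly here. The function name `local_mean` and the sample scores are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def local_mean(scores: Sequence[float], start: int, end: int) -> float:
    """Mean score over positions [start, end] (assumed 1-based, inclusive)."""
    return fmean(scores[start - 1:end])


# Made-up scores for illustration only; not data from Figure 1.
scores = [10.0, 20.0, 60.0, 70.0, 15.0]
overall = fmean(scores)
segment_mean = local_mean(scores, 3, 4)
is_above_mean = segment_mean > overall
```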



Figure 1. GSS analysis for gi|118907|sp|P22346|PEPX_LACLC Xaa-Pro dipeptidyl-peptidase (X-Pro dipeptidyl-peptidase) (X-prolyl-dipeptidyl aminopeptidase) (X-PDAP) vs. the Homo sapiens protein database from NCBI


View two other results from aligning against the Homo sapiens protein database from NCBI.

  1. View gi|1345731|sp|P49454|CENF_HUMAN CENP-F kinetochore protein (Centromere protein F) (Mitosin) (AH antigen)

  2. View gi|401055|sp|P30957|RYR2_RABIT Ryanodine receptor 2 (Cardiac muscle-type ryanodine receptor) (RyR2) (RYR-2) (Cardiac muscle ryanodine receptor-calcium release channel)


Reference:

  1. T.-H. Fan, S. Lee, T.-S. Tsou, T.-C. Wang, and A. Yao, “An Optimal Algorithm for Maximum-Sum Segment and Its Application in Bioinformatics”, Proceedings of the 8th International Conference on Implementation and Application of Automata (CIAA 2003), Santa Barbara, CA, USA, July 16-18, 2003, LNCS 2759, pp. 251-257.