Protein Sequence Alignment and Analysis

Amino acid sequence alignment and analysis is central to most biochemical and molecular biology applications. Although it should be possible to retrieve all the information we need about a protein directly from its sequence, looking at a sequence without prior knowledge and experience is like reading a text in a foreign language: we may recognize the letters, but we do not understand the meaning and are unable to extract the information. Still, where proteins are concerned, we have learned to extract a substantial part of this information through detailed sequence analysis, for example by multiple sequence alignment. In a multiple sequence alignment, a given sequence is compared with a group of evolutionarily related sequences from other organisms. Fortunately, we can usually find a related protein in some other organism. By "related" we mean that the proteins belong to the same family, whose members usually perform a similar function in different organisms. In such cases, the main characteristic features of the protein sequence and of the protein's tertiary structure are conserved. Since conservation of function implies that a certain number of amino acid residues within a protein family are conserved, we need tools to assess the degree of conservation at each position of the sequence. To assist in this process, alignment techniques and scoring schemes for sequence alignment have been developed. Here I will discuss the basic concepts behind these techniques and provide two examples to guide you in making a sequence alignment using resources available on the Internet. Since we focus here on structural bioinformatics, the alignments we make will be interpreted in terms of the three-dimensional structure. We will also discuss what structural information can be identified in a sequence alignment, how to relate sequence and structural information, and how to use available structural data to make better sequence alignments.
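To make the idea of per-position conservation concrete, here is a minimal Python sketch that scores each column of a multiple alignment by its Shannon entropy. Entropy is only one of several possible conservation measures and is not necessarily the one used by any particular server; the function names and the four short sequences are hypothetical. A column in which every sequence carries the same residue scores 0.0, and the score rises as the column becomes more variable. Gaps are treated here as an extra symbol, which is a design choice for this sketch rather than a standard.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the symbols in one alignment column.

    Gaps ('-') are counted as an extra symbol, so gap-rich columns score as
    less conserved. 0.0 means every sequence has the same residue.
    """
    counts = Counter(column)
    n = len(column)
    # Sum of p * log2(1/p), written with counts so a fully conserved column gives 0.0
    return sum((k / n) * math.log2(n / k) for k in counts.values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-column entropy for a multiple alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) yields the columns of the alignment as tuples of characters
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Hypothetical four-sequence alignment fragment (not real protein data)
alignment = [
    "MKVLA",
    "MKILS",
    "MRVLG",
    "MKV-T",
]
print([round(h, 2) for h in conservation_profile(alignment)])
# [0.0, 0.81, 0.81, 0.81, 2.0]
```

The first column (all M) is fully conserved, the middle columns each contain one deviating symbol, and the last column, with four different residues, is the most variable.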

When making a sequence alignment, we need to take several factors into account. For example, we need to understand the effect of replacing one amino acid by another (amino acid substitutions) in different sequences. Some substitutions are conservative, i.e., they do not introduce any substantial disturbance in the protein structure, while others may have a dramatic effect on the structure and function of the protein in question; such substitutions are normally rather rare. To account for the different types of substitutions, specially designed so-called substitution matrices are used, both to build a correct alignment and to calculate its score. Structural information can also help us make a correct alignment, for example by clarifying the effect of particular amino acid substitutions.
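The two roles of a substitution matrix, guiding the alignment and scoring it, can be sketched with the classic Needleman-Wunsch global alignment algorithm. This is a minimal sketch, not a production tool: instead of a real matrix such as BLOSUM62 it uses a hypothetical toy matrix (identity +4, a substitution within the same loosely defined physicochemical group +1, anything else -1) and a linear gap penalty of -4 per gap position, both chosen only for illustration. The names `GROUPS`, `substitution_score` and `needleman_wunsch` are likewise made up for this example.

```python
# Hypothetical residue groups for a toy matrix; real work would use BLOSUM or PAM.
GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST"]
GAP = -4  # linear penalty per gap position (illustrative value)


def substitution_score(a: str, b: str) -> int:
    """Toy substitution score: identity +4, same group (conservative) +1, else -1."""
    if a == b:
        return 4
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -1


def needleman_wunsch(s: str, t: str, gap: int = GAP) -> tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # score[i][j] is the best score for aligning the prefixes s[:i] and t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution_score(s[i - 1], t[j - 1]),  # pair residues
                score[i - 1][j] + gap,  # gap in t
                score[i][j - 1] + gap,  # gap in s
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution_score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))


print(needleman_wunsch("MKVLA", "MKLA"))   # (12, 'MKVLA', 'MK-LA')
print(needleman_wunsch("MKVLA", "MRILA"))  # (14, 'MKVLA', 'MRILA')
```

In the second example the conservative K/R and V/I pairs still earn positive scores, so no gaps are needed. In the first, MKLA is aligned with a single gap opposite V, which keeps the four identical residues paired. Practical tools typically use empirically derived matrices such as BLOSUM or PAM together with affine gap penalties (separate costs for opening and extending a gap).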

There will also be two guided examples that use the sequence alignment and analysis resources available at the Expasy server. In some cases the alignment is easy to make, while in others, for example when we align multidomain proteins or when there is a large number of insertions and deletions, extra attention is required. Structural information, particularly protein secondary structure, may provide valuable insights into the effects of various replacements, insertions and deletions. The results from these tutorials will be used later for homology modeling, in the chapter dedicated to modeling.
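One way secondary structure can inform an alignment is through position-dependent gap penalties: insertions and deletions are usually tolerated better in loops than inside helices or strands. The sketch below is a hypothetical illustration of that idea, assuming the query's three-state secondary structure (H for helix, E for strand, C for coil) is known, for example from a solved structure. The penalty values, the simple match/mismatch score and the function names are all assumptions made for this example, not a published scheme.

```python
# Illustrative gap penalties per 3-state secondary structure class of the query:
# H = helix, E = strand, C = coil/loop. Values are assumptions, not a published scheme.
GAP_PENALTY = {"H": -8, "E": -8, "C": -2}


def pair_score(a: str, b: str) -> int:
    """Toy match/mismatch score used only for this sketch."""
    return 4 if a == b else -1


def structure_aware_score(query: str, target: str, query_ss: str) -> int:
    """Score a pairwise alignment, charging gaps by the query's secondary structure.

    query_ss holds one state (H, E or C) per ungapped query residue. A gap opposite
    a query residue uses that residue's state; a gap in the query (an insertion in
    the target) uses the state of the preceding query residue, or of the first one.
    """
    if len(query) != len(target):
        raise ValueError("aligned sequences must have equal length")
    if not query_ss or len(query_ss) != len(query.replace("-", "")):
        raise ValueError("query_ss needs one state per query residue")
    total = 0
    k = -1  # index into query_ss of the most recent query residue
    for q, t in zip(query, target):
        if q != "-":
            k += 1
        if q == "-" and t == "-":
            continue  # gap-gap columns carry no information
        if q == "-" or t == "-":
            total += GAP_PENALTY[query_ss[max(k, 0)]]
        else:
            total += pair_score(q, t)
    return total


# The same one-residue deletion placed opposite the last helix residue
# or opposite the first loop residue (hypothetical data)
ss = "HHHCCC"
print(structure_aware_score("MKAAGS", "MK-AGS", ss))  # 12
print(structure_aware_score("MKAAGS", "MKA-GS", ss))  # 18
```

Both alignments contain five identical pairs and one gap; the structure-aware score simply prefers the one that places the gap in the loop.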
