What can LAMA do for me?
LAMA can identify protein families similar to your protein(s) of interest
and protein motifs similar to conserved regions in your protein(s). The
information known about these similar families and motifs can help you
identify the function and structure of your protein and locate critical
conserved regions in your protein(s). This can direct you in
designing experiments to test your hypotheses.
LAMA compares multiple sequence alignments of proteins. If you have only a single protein sequence you first need to find other members of its family. The protein sequences also need to be multiply aligned. The Content of input section explains how to find related sequences and align them.
How does LAMA align blocks?
The multiple alignments are first transformed into position specific
scoring matrices (PSSMs). Each column in
the PSSM corresponds to a position in the
alignment and has the amino acid distribution of that position. The
transformation into the PSSM is done with position-based sequence weights
(Henikoff & Henikoff, 1994a)
and odds ratios between the amino acid frequencies
observed in the multiple alignments and the frequencies expected
from protein databases
(Henikoff & Henikoff, 1995).
The transformation corrects possible overrepresentation of some
sequences by sequence weighting and considers the background
frequencies of the amino acids.
The method was tested and calibrated with ungapped local multiple alignments
(blocks) from the Blocks Database.
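The transformation described above can be illustrated with a minimal Python sketch. It assumes the position-based weighting scheme in which each column contributes 1/(k*n) to a sequence's weight (k is the number of different residues in the column, n is the number of sequences sharing that sequence's residue). The toy block, the uniform background table and all function names are hypothetical, and the sketch ignores details such as residues that never occur in a column, so it should not be read as LAMA's actual code.

```python
from collections import Counter

def position_based_weights(alignment):
    """Position-based sequence weights, normalized to sum to 1."""
    weights = [0.0] * len(alignment)
    for c in range(len(alignment[0])):
        column = [seq[c] for seq in alignment]
        counts = Counter(column)
        k = len(counts)  # number of different residues in this column
        for s, residue in enumerate(column):
            # rare residues in diverse columns contribute more weight
            weights[s] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]

def weighted_frequencies(alignment, weights, c):
    """Weighted amino acid frequencies of column c."""
    freqs = {}
    for seq, w in zip(alignment, weights):
        freqs[seq[c]] = freqs.get(seq[c], 0.0) + w
    return freqs

def odds_ratios(freqs, background):
    """Observed frequency divided by the expected (background) frequency."""
    return {aa: f / background[aa] for aa, f in freqs.items()}

# Hypothetical ungapped block of four sequences; two of them are identical.
toy_block = ["ACD", "ACE", "ACE", "GCE"]
weights = position_based_weights(toy_block)

# Hypothetical uniform background; real values would come from a protein database.
UNIFORM_BACKGROUND = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}
column0 = odds_ratios(weighted_frequencies(toy_block, weights, 0), UNIFORM_BACKGROUND)
```

In this toy block the two identical sequences each receive a smaller weight (7/36) than the two distinct ones (11/36), which is the correction for overrepresentation mentioned above.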
The matrices are treated as sequences of columns, enabling
their alignment with one another. To use algorithms developed for
aligning single sequences we need a measure for comparing pairs of
matrix columns. This corresponds to the substitution matrices
(PAM, BLOSUM etc.) used in single-sequence alignments. The
measure used in our method to score the similarity between pairs of
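matrix columns is the Pearson correlation coefficient (r):

$$ r = \frac{\sum_i \bigl(A(i) - \bar{A}\bigr)\bigl(B(i) - \bar{B}\bigr)}{\sqrt{\sum_i \bigl(A(i) - \bar{A}\bigr)^2 \, \sum_i \bigl(B(i) - \bar{B}\bigr)^2}} $$

where A(i) and B(i) are the values of amino acid i in columns A and B,
respectively, and $\bar{A}$ and $\bar{B}$
are the means of the values in columns A and B.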
The correlation score ranges from 1 for columns with identical
amino acid distributions to -1 for columns with opposite
distributions (in each column only 10 amino acids occur and
those 10 amino acids are different in the two compared
columns).
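As a minimal sketch, the column comparison can be written in a few lines of Python. The function name and the two toy columns are hypothetical. The toy columns reproduce the "opposite distributions" case described above (ten amino acids in one column, the other ten in the second column). Returning 0.0 for a column without any variation is an assumption of this sketch, since the correlation is undefined in that case.

```python
import math

def column_correlation(col_a, col_b):
    """Pearson correlation coefficient between two PSSM columns.

    col_a and col_b hold one value per amino acid, in the same order.
    """
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: flat columns are scored as uncorrelated
    return cov / math.sqrt(var_a * var_b)

# Twenty amino acids; only the first ten occur in one column,
# only the last ten in the other (hypothetical values).
first_half = [10.0] * 10 + [0.0] * 10
second_half = [0.0] * 10 + [10.0] * 10

same = column_correlation(first_half, first_half)       # identical columns
opposite = column_correlation(first_half, second_half)  # opposite columns
```

Here `same` evaluates to 1 and `opposite` to -1, the two ends of the range given above.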
The score of a block-to-block alignment is the sum of the scores from
comparing the corresponding columns in the two block matrices:
Local alignment of blocks. Positions 2 to 7 from block A aligned with positions 4 to 9 from block B. A column comparison score, s(Xn*Ym), is calculated for each pair of positions (A2*B4 to A7*B9). The score of the alignment of the two segments, S, is the sum of the column comparison scores.

The alignment is done using the Smith-Waterman algorithm for optimal local alignments. No gaps are allowed since the aligned objects are short conserved sequence regions. All alignments above the cutoff score are reported for each pair of compared blocks. There may be cases where parts of one long block are similar to several blocks:
AAAAAAAAAAAAAAAAAAA BBB CCCCCC
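Without gaps, an optimal local alignment is simply the best-scoring contiguous run along one diagonal of the column-score matrix. The following Python sketch shows that idea under stated assumptions: the function name is hypothetical, the input is a precomputed matrix of column comparison scores, indices are 0-based, and only the single best segment is returned (the program itself reports every alignment above the cutoff).

```python
def best_ungapped_local_alignment(scores):
    """Best-scoring ungapped local alignment of two blocks.

    scores[i][j] is the column comparison score of column i of block A
    against column j of block B. Returns (sum, start_a, start_b, length)
    with 0-based start positions.
    """
    n, m = len(scores), len(scores[0])
    best = (0.0, 0, 0, 0)
    for offset in range(-(n - 1), m):  # one diagonal per offset j - i
        i = max(0, -offset)
        j = i + offset
        run_sum, run_start = 0.0, i
        while i < n and j < m:
            if run_sum <= 0.0:
                # a non-positive prefix never helps, so start a new segment
                run_sum, run_start = 0.0, i
            run_sum += scores[i][j]
            if run_sum > best[0]:
                best = (run_sum, run_start, run_start + offset, i - run_start + 1)
            i += 1
            j += 1
    return best

# Hypothetical score matrix for an 8-column block A and a 10-column block B:
# weak negative scores everywhere except columns 2-7 of A against 4-9 of B.
matrix = [[-0.2] * 10 for _ in range(8)]
for i in range(1, 7):
    matrix[i][i + 2] = 0.8

total, start_a, start_b, length = best_ungapped_local_alignment(matrix)
```

With this toy matrix the best segment starts at 0-based column 1 of A and column 3 of B and is 6 columns long, which corresponds to positions 2 to 7 and 4 to 9 in the 1-based numbering of the figure caption.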
If you only have a single protein sequence or want to find more protein sequences related to yours you can search the sequence databases. One way to do this on the WWW is using the BLAST program to search the NCBI sequence databases. Links to other search methods can be found at the Baylor College of Medicine Human Genome Center Search Launcher site.
The BlockMaker WWW site can be used for finding blocks in your group of related protein sequences. There are various other methods for making protein multiple sequence alignments. Among these are the MEME system, Gibbs sampling programs, the MACAW interactive program, and the CLUSTAL-W progressive multiple alignment program. Several of these methods are available through the multiple sequence alignment page at the Baylor College of Medicine Human Genome Center.
Multiple alignments submitted to the program should be of conserved, relatively ungapped, protein sequence regions. A few gaps in the alignment are acceptable. The more sequences in the alignment, the better. In general, avoid alignments with fewer than 4 sequences.
Format of input
LAMA only accepts input in the
Block format. Other multiple
alignments can be
reformatted to the Block format. If you are not sure of your
multiple alignment or just have a group of related
sequences you can use the
BlockMaker program for
finding blocks in the sequences. Note that, to avoid biassed sequence
representation, blocks include sequence weights.
The program version and execution parameters head the search output. Only alignments longer than the minimal length will be reported. The significance of very short alignments (fewer than 4 positions) cannot be reliably estimated. Alignments with scores equal to or above the score cutoff will be reported. The score cutoff is specified as a Z score, which is the number of standard deviations between the score and the mean score. The mean score and the standard deviation were calculated from the random scores obtained by aligning a large number of shuffled unbiassed blocks (7 million block pairs; see first supplement). The Z score is related to the percentile of the score among the shuffled-block scores. This dependence is not linear but sigmoidal (see second supplement).
LAMA version 1.00 October 96.
Minimal length of reported alignments 4
Score cutoff is 5.6 Z score units (in the top 7.7e-05 percentile of chance scores)

                                         alignment           Z-score  expected number for
block 1   from:to       block 2   from:to  length                     searching 5000 blocks
BL01063B 20 : 46 and BL00042B 3 : 29 (27) score 39 ( 7.2 1.3e-02)
BL01063B 5 : 39 and BL00324C 3 : 37 (35) score 27 ( 6.1 1.5e-01)
BL01063B 12 : 47 and BL00622 8 : 43 (36) score 33 ( 8.2 0.0e+00)
BL01063B 10 : 46 and BL00894A 1 : 37 (37) score 26 ( 5.7 3.2e-01)
BL01063B 4 : 42 and BL01043A 2 : 40 (39) score 29 ( 8.1 0.0e+00)
When both query and target blocks are provided by the user the output can also contain the column scores of each reported alignment and the PSSMs of every compared block.
Pay attention to any error or warning messages. Most will probably have to do with the format of the input.
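For post-processing, the hit lines of the output can be read with a small parser. The sketch below is written in Python against the layout of the sample output shown on this page only. The `Hit` type, the field names and the regular expression are assumptions of this sketch and not part of LAMA, and other program versions may print a different layout.

```python
import re
from typing import NamedTuple, Optional

class Hit(NamedTuple):
    block1: str
    from1: int
    to1: int
    block2: str
    from2: int
    to2: int
    length: int
    score: int
    z_score: float
    expected: float

# Matches lines such as:
# BL01063B 20 : 46 and BL00042B 3 : 29 (27) score 39 ( 7.2 1.3e-02)
HIT_LINE = re.compile(
    r"(\S+)\s+(\d+)\s*:\s*(\d+)\s+and\s+"
    r"(\S+)\s+(\d+)\s*:\s*(\d+)\s+"
    r"\((\d+)\)\s+score\s+(-?\d+)\s+"
    r"\(\s*(-?\d+(?:\.\d+)?)\s+(\S+?)\)"
)

def parse_hit(line: str) -> Optional[Hit]:
    """Return the parsed hit, or None for header and other lines."""
    m = HIT_LINE.search(line)
    if m is None:
        return None
    b1, f1, t1, b2, f2, t2, length, score, z, expected = m.groups()
    return Hit(b1, int(f1), int(t1), b2, int(f2), int(t2),
               int(length), int(score), float(z), float(expected))

hit = parse_hit("BL01063B 20 : 46 and BL00042B 3 : 29 (27) score 39 ( 7.2 1.3e-02)")
```

A simple sanity check on a parsed hit is that both segments span the reported alignment length, for example `hit.to1 - hit.from1 + 1 == hit.length`.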
Evaluating LAMA alignment scores
The alignment score is the average of the
column scores in the alignment multiplied by 100.
Since the column scores have a range of -1 to 1 the alignment score
will range from -100 to 100. An alignment score of 46 means
that on average the aligned positions had a correlation coefficient
of 0.46. The significance of the alignment score depends on the
length of the compared blocks. Alignments between longer blocks
will tend to be longer and have higher scores.
The Z score and
expected number let us estimate the
significance of the scores
and compare alignments of different lengths.
The higher the Z score, the less likely the alignment is due
to chance. How unlikely depends on the number of blocks searched:
the more blocks searched, the greater the probability of finding
high scores by chance. For example, the output of the calibration with the
shuffled blocks contained 7 million
scores but no alignments with Z scores greater than 8.3.
Hence an alignment with a Z score equal to or higher than 8.3
is unlikely to arise by chance in a comparable or smaller number
of alignments. The expected number shows this directly.
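The definition of the Z score can be made concrete with a short Python sketch. The list of chance scores below is hypothetical (the real calibration used 7 million scores from shuffled blocks), the function name is an assumption, and using the population standard deviation is a choice of this sketch. Any dependence of the calibration on alignment length is ignored here.

```python
import statistics

def z_score(score, chance_scores):
    """Number of standard deviations between a score and the mean chance score."""
    mean = statistics.mean(chance_scores)
    sd = statistics.pstdev(chance_scores)  # population standard deviation (assumption)
    return (score - mean) / sd

# Hypothetical alignment scores from comparisons of shuffled blocks.
shuffled_scores = [10, 12, 14, 16, 18]

z_at_mean = z_score(14, shuffled_scores)  # a score equal to the mean
z_high = z_score(30, shuffled_scores)     # a score well above the chance scores
```

A score equal to the chance mean has a Z score of 0, and scores further above the mean give proportionally larger Z scores.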
The expected number is shown for searching 5000 blocks since version 9.1 of the
Blocks Database
contains 3300 blocks. For example, searching this release of
the Blocks Database and finding an alignment expected to appear
1.8e-01 times (0.18) suggests that this alignment is not due to chance.
Alignments with expected occurrences of 7.5e-03 or even 0 are almost
certainly genuine (or due to biassed blocks,
see below).
A relation between two families by a single pair of blocks with a
high Z score is termed a single hit.
However, protein families often have a number of blocks.
A multiple hit is when two or more block pairs
from the same two families are similar:
multiple hit
Family 1, blocks 1A, 1B, 1C, 1D. 1A=2B + 1D=2C
Family 2, blocks 2A, 2B, 2C.
We expect the order of the blocks in the hit to be the same in both
families (in this example 1A -> 1D and 2B -> 2C).
Individual block pairs whose Z scores would be likely by chance
on their own can still indicate a genuine relation if they
are part of a multiple hit. While the shuffled blocks scores contained
no single hit with a Z score above 8.3, there were no multiple hits
with Z scores less than 5.6. Hence genuine relationships can also
be indicated by several alignments whose Z scores are
individually expected to occur by chance.
When comparing blocks against a database, the Z score cutoff is set to 5.6, corresponding to an expected occurrence of 0.385 per search of 5000 blocks. When both query and target blocks are provided, other cutoffs can be chosen.
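The two figures quoted for the 5.6 cutoff are arithmetically consistent with a simple rule: the expected number equals the chance fraction of scores at or above the cutoff multiplied by the number of blocks searched (7.7e-05 x 5000 = 0.385). The Python sketch below assumes that this is how the values relate, and it treats the value printed in the output as "top 7.7e-05 percentile" as a plain fraction. The function names are hypothetical, and the Poisson step is an additional assumption that this page does not state.

```python
import math

def expected_number(chance_fraction, blocks_searched):
    """Expected count of chance alignments at or above a cutoff."""
    return chance_fraction * blocks_searched

def chance_of_at_least_one(expected):
    """Probability of one or more chance hits, assuming independent
    comparisons (Poisson approximation; an assumption of this sketch)."""
    return 1.0 - math.exp(-expected)

per_5000 = expected_number(7.7e-05, 5000)  # the search size used in the output
per_3300 = expected_number(7.7e-05, 3300)  # the size quoted for Blocks Database 9.1
p_any = chance_of_at_least_one(per_5000)
```

Under these assumptions the same cutoff gives a smaller expected number for a smaller database, which is why the expected number has to be read together with the number of blocks searched.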
False positive (high score but no relation) and false negative (low score but genuine relation) hits are still possible, so biological knowledge and common sense should be used. Compositionally biassed blocks (consisting of sequence segments rich in a few amino acids or short repeats) are a common cause of false positive hits. You can check if a block is biassed here. False negative hits can be caused by misalignment in the blocks.
Each entry in the Blocks Database version 8.6 (3174 blocks from 858 protein families) was searched against the other entries in the database. All block pairs with Z scores larger than 5.6 were saved. Protein families related by more than one saved score were considered as multiple hits and alignments with Z scores above 8.3 as single hits. This resulted in 141 pairs of families. Eighty percent of these were identified as genuine relationships (true positives) according to the family descriptions, by sharing common sequences, or by detailed examination. Compositional bias was responsible for another eight percent of the high scores. The remaining twelve percent of the high scores could not be classified as either genuine or false based on available evidence.
Relation type | Genuine (1) | Biassed composition | Unknown | Total |
---|---|---|---|---|
Multiple block hits, independent (2) | 24 | - | 1 | 25 |
Multiple block hits, repeats (3) | 11 | 6 | 9 | 26 |
Multiple block hits, inner repeats (4) | 15 | 4 | 2 | 21 |
Single block hits | 63 | 1 | 5 | 69 |
Total | 113 | 11 | 17 | 141 |
Fraction | 80% | 8% | 12% | |
(1) Genuine relations were identified by the families' Prosite descriptions, by detailed analysis of the literature, or by sharing common sequences (22 of the single and independent-multiple hits).
(2) An independent multiple hit is two different protein families related by two or more different block pairs.
(3) A repeat multiple hit is two different protein families where a block from one family is similar to two or more blocks from the other family.
(4) An inner-repeat multiple hit is a case where the similarities are between blocks from the same family.
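The classification used for this survey can be sketched in Python as follows. The cutoffs (5.6 and 8.3) and the definitions of the hit types are taken from the text and footnotes above. The function names, the rule that a family name is the block name without its trailing letter, and the treatment of every within-family similarity as an inner repeat are assumptions of this sketch. The Z scores given to the 1A/2B and 1D/2C pairs are hypothetical.

```python
from collections import Counter, defaultdict

SAVE_CUTOFF = 5.6    # block pairs above this Z score are saved
SINGLE_CUTOFF = 8.3  # a lone saved pair must exceed this to count as a single hit

def family_of(block_id):
    """Assumed naming rule: 'BL00504A' -> 'BL00504', 'BL00622' -> 'BL00622'."""
    return block_id[:-1] if block_id[-1].isalpha() else block_id

def classify_hits(hits):
    """Map each family pair to 'single', 'independent', 'repeat' or 'inner-repeat'.

    hits is an iterable of (block1, block2, z_score) tuples.
    """
    groups = defaultdict(list)
    for b1, b2, z in hits:
        if z > SAVE_CUTOFF:
            key = tuple(sorted((family_of(b1), family_of(b2))))
            groups[key].append((b1, b2, z))
    result = {}
    for key, pairs in groups.items():
        if key[0] == key[1]:
            result[key] = "inner-repeat"
        elif len(pairs) >= 2:
            # a block used in two or more pairs marks a repeat multiple hit
            usage = Counter(b for b1, b2, _ in pairs for b in (b1, b2))
            result[key] = "repeat" if max(usage.values()) >= 2 else "independent"
        elif pairs[0][2] > SINGLE_CUTOFF:
            result[key] = "single"
    return result

hits = [
    ("BL00504A", "BL00677A", 10.0),  # reported on this page
    ("BL00504D", "BL00677D", 5.5),   # reported on this page, below the 5.6 cutoff
    ("1A", "2B", 6.0),               # block pairs from the multiple-hit diagram,
    ("1D", "2C", 6.4),               # with hypothetical Z scores
]
classes = classify_hits(hits)
```

With the default cutoffs, the BL00504/BL00677 relation is classified as a single hit, because its second block pair falls just below 5.6, while families 1 and 2 form an independent multiple hit.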
A comparison of all the Blocks Database v8.6 entries with each other found the following hit between FAD flavoprotein subunits from two oxidoreductase enzyme complexes: BL00504, succinate dehydrogenases (Sdh) and fumarate reductases (Frd), and BL00677, D-amino acid oxidases (DAO):
                                         alignment           Z-score  expected number for
block 1   from:to       block 2   from:to  length                     searching 5000 blocks
BL00504A 2 : 20 and BL00677A 2 : 20 (19) score 51 (10.0 0.0e+00) [logos ?]

A comparison with a lower cutoff found another hit supporting the first one:

BL00504D 3 : 35 and BL00677D 17 : 49 (33) score 26 ( 5.5 5.1e-01) [logos ?]

Sequence annotations and a literature search revealed that block BL00504A is the FAD-binding site and BL00504D is the active site (Birch-Machin et al., 1992) of the Sdh/Frd flavoproteins. Block BL00677A is the FAD-binding site of the DAO proteins. The FAD AMP-binding sites in both families are beta-alpha-beta ADP binding folds and were already noted as such (Birch-Machin et al., 1992; Schulz et al., 1982). This explains the first hit.
The DAO BL00677D block has a conserved histidine important for the enzymatic activity of pig DAO (Miyano et al., 1991). This histidine is aligned with a conserved and essential histidine in the catalytic site of the Sdh/Frd flavoproteins (Birch-Machin et al., 1992; Schroder et al., 1991). Other positions in these aligned regions are also similar (column scores 0.31 to 0.98). The dissimilar positions have column scores close to zero (0.04 to -0.14). This finding suggests that the active site of DAO flavoproteins is in the BL00677D region with the conserved histidine as the crucial residue.
BLAST and FASTA searches of the SwissProt protein database could not identify this similarity. No sequence from one family identified any sequence from the other family. Optimal local alignments of all the sequence pairs from the two families had scores expected by chance. Searching the Blocks Database with the sequences from the two families identified the relation between the families with 6 Sdh/Frd flavoprotein sequences (multiple hits with 98.1 to 76.2 percentiles of scores with shuffled sequence queries and P values of 8.4×10^-3 to 1.1×10^-1) but not with the other two sequences from that family or any of the sequences from the DAO family (single hits with less than 60.0 score percentiles).
Suggested catalytic site of DAO flavoproteins. A, positions 17-49 of DAO flavoproteins (block BL00677D) aligned with the catalytic region of Sdh/Frd flavoproteins (positions 3-35 of block BL00504D). The histidines important for the enzymes' catalytic activity are outlined (the histidine in sequence DHSA_BACSU is misaligned due to a two amino acid insertion). The start and end coordinates flank the sequences. B, the column scores of the alignment.
Birch-Machin, M. A., Farnsworth, L., Ackrell, B. A., Cochran, B., Jackson, S., Bindoff, L. A., Aitken, A., Diamond, A. G. & Turnbull, D. M. (1992). The sequence of the flavoprotein subunit of bovine heart succinate dehydrogenase. J. Biol. Chem. 267, 11553-11558.
Miyano, M., Fukui, K., Watanabe, F., Takahashi, S., Tada, M., Kanashiro, M. & Miyake, Y. (1991). Studies on Phe-228 and Leu-307 recombinant mutants of porcine kidney D-amino acid oxidase: expression, purification, and characterization. J. Biochemistry 109, 171-177.
Schroder, I., Gunsalus, R. P., Ackrell, B. A., Cochran, B. & Cecchini, G. (1991). Identification of active site residues of Escherichia coli fumarate reductase by site-directed mutagenesis. J. Biol. Chem. 266, 13572-13579.
Schulz, G. E., Schirmer, R. H. & Pai, E. F. (1982). FAD-binding site of glutathione reductase. J. Mol. Biol. 160, 287-308.
Conserved regions from snake toxins and the CD59 extracellular domain were found to be similar to each other. The alignment score is not very striking but the two families seem to be quite dissimilar. What is the connection between snake toxins, small extracellular proteins that bind to nerve receptors, and the CD59 domain, a domain that is found in one or more copies on GPI-linked cell surface glycoproteins? A closer look at the alignment was taken by requesting to see the column scores. These scores are shown above the score line for each of the 12 alignment positions (8,3 to 19,14):
Column scores for optimal alignment of BL00272B and BL00983B:

Position in BL00272B | Position in BL00983B | Column score |
---|---|---|
8 | 3 | 0.262 |
9 | 4 | 0.169 |
10 | 5 | 0.138 |
11 | 6 | 0.286 |
12 | 7 | 0.995 |
13 | 8 | 1.000 |
14 | 9 | 0.368 |
15 | 10 | 0.224 |
16 | 11 | 0.986 |
17 | 12 | -0.067 |
18 | 13 | 1.000 |
19 | 14 | 1.000 |

BL00272B 8 : 19 and BL00983B 3 : 14 (12) score 53 ( 6.5 6.0e-02) [logos ?]

Five of the positions [(12,7), (13,8), (16,11), (18,13) and (19,14)] have very high column scores (0.986-1.000), indicating identical or almost identical amino acid distributions in these column pairs. The other positions contribute less to the alignment score, and position (17,12) has a slightly negative score, actually detracting from the alignment.
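The reported score of 53 can be checked against the twelve column scores, since the alignment score is described on this page as the average column score multiplied by 100. The Python sketch below does that arithmetic. The variable and function names are hypothetical, and rounding to the nearest integer is an assumption about how the program prints the score.

```python
# Column scores for BL00272B positions 8-19 against BL00983B positions 3-14,
# keyed by (position in BL00272B, position in BL00983B).
column_scores = {
    (8, 3): 0.262, (9, 4): 0.169, (10, 5): 0.138, (11, 6): 0.286,
    (12, 7): 0.995, (13, 8): 1.000, (14, 9): 0.368, (15, 10): 0.224,
    (16, 11): 0.986, (17, 12): -0.067, (18, 13): 1.000, (19, 14): 1.000,
}

def alignment_score(scores):
    """Average column score multiplied by 100, rounded (rounding is an assumption)."""
    raw_score = sum(scores)  # the raw score is the sum of the column scores
    return round(100 * raw_score / len(scores))

score = alignment_score(list(column_scores.values()))
strong = [pos for pos, s in column_scores.items() if s >= 0.986]
negative = [pos for pos, s in column_scores.items() if s < 0]
```

The raw sum of the column scores is 6.361 over 12 positions, so `score` comes out as 53, in agreement with the output line. `strong` lists the five positions with scores of 0.986 or more, and `negative` contains only position (17, 12).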
Upon requesting to see the PSSMs of the blocks (below) or their aligned logos (link to 'logos' above) you will note that 3 of the alignment positions contributing to the score are highly conserved cysteine residues. This raises the possibility of identical patterns of disulphide bonds in both regions. We might give this alignment more attention since disulphide bonds are known to be well conserved even between distantly related sequences. More information can be found by following the block links to the Blocks Database entries. Each family is accompanied by its InterPro annotation, and the multiple alignment of each block can be viewed as a graphical sequence logo. The structures of both proteins are known and confirm their relation. (SWISS-3DIMAGE was the source for these images of the structures.)
PSSM of BL00272B
  |   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
--+----------------------------------------------------------------------------
A |   0   0   0  13   0   3   0   0   1   0   2   0   0   0   1   0   2   0   0
C |  87  12  12  11   0   0   0   0  21   0   6 100 100   0   0   0   0  99   0
D |   0   0   3   2  11   9   3   2   6   0   0   0   0   3   0  82  10   0   2
E |   0   5   2   8   3   5   2   9   4   6   9   0   0   7   0   5   5   0   0
F |   2   3   9   0   0   2   6   4   2   2   0   0   0   0   0   0   0   0   0
G |   1   1   1   1   3   0  24   8   7   0   0   0   0   2   3   0   1   0   2
H |   0   0   2   0   0   4   2   0   4   0  13   0   0   6   0   0   0   0   0
I |   0   0   4   0   2   1   0  17   7  30   3   0   0   0   6   0   0   0   0
K |   0   3  22   4  30   3   8   5  17   0  24   0   0  16   1   0  36   0   0
L |   0   0   1   0   1   3   8   3   0  14   5   0   0   0   0   0   2   0   0
M |   0   0   0   0  11   2   9   0   3   0   3   0   0   0   0   0   0   0   0
N |   0   0   0   5   2   7   2   2   2   0   2   0   0  16   0  13  22   1  96
P |   6  65   9   2   3  23   6   8   2   5   0   0   0   0   0   0   0   0   0
Q |   0   2   6   0   0   1   0   1   6   0   8   0   0   3   0   0   0   0   0
R |   0   2   4  15   8   2   6   6   2   3   8   0   0  10   0   0  19   0   0
S |   1   4   4   6  13   3   0   4   6   2   1   0   0  19  16   0   0   0   0
T |   3   4  14   5   5   5   1   5   0   4   3   0   0  18  72   0   0   0   0
V |   0   0   3  28   5   1   0  19   1  22   6   0   0   0   0   0   1   0   0
W |   0   0   0   0   0  22   0   0   0   0   0   0   0   0   0   0   0   0   0
X |   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Y |   0   0   5   0   3   2  23   7   9  11   7   0   0   0   0   0   2   0   0
- |   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

PSSM of BL00983B
  |   1   2   3   4   5   6   7   8   9  10  11  12  13  14
--+--------------------------------------------------------
A |   0   0   0   0   0   0   0   0   0   0   0   0   0   0
C |   0   0   0   0   0   0  91 100   0   0   0   0 100   0
D |   0   0   0   0   0   0   0   0   0   0  76   0   0   0
E |   0  17  29   0  20   0   0   0   0  42   0   0   0   0
F |   0   0   0   0   0   0   0   0   0   0   0   0   0   0
G |  10   0   0   0   0   0   0   0  10   0   0   0   0   0
H |   0   0   0   0   0  39   0   0   0   0   0   0   0   0
I |  25   0   0   0   0   0   0   0   0   0   0   0   0   0
K |   0   0   0  30   0   0   0   0  28  23   0   0   0   0
L |   0   0  48   0   0   0   0   0   0   9   0 100   0   0
M |   0   0   0   0   0   0   0   0   0   0   0   0   0   0
N |  25  23   0  13   0   0   0   0   0   0  24   0   0 100
P |   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Q |   0  15   0   0   0  13   0   0  29   0   0   0   0   0
R |   0  12   0  12   0   0   0   0  24   0   0   0   0   0
S |   0  20   0  11   0  18   0   0   9  12   0   0   0   0
T |  23  13   0  23  35  10   0   0   0  14   0   0   0   0
V |  16   0  22  11   0   8   9   0   0   0   0   0   0   0
W |   0   0   0   0   0   0   0   0   0   0   0   0   0   0
X |   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Y |   0   0   0   0  45  13   0   0   0   0   0   0   0   0
- |   0   0   0   0   0   0   0   0   0   0   0   0   0   0

("X" specifies unknown amino acids.)
Excision and insertion of bacterial insertion sequence elements (IS) require the activity of a transposase protein sometimes encoded by the ISs. The IS30 transposase family (Dong et al., 1992) is represented by five blocks in BLOCKS 8.6. A region of 21 positions from the first block had high scores (Z scores 6.7 to 8.8) only to helix-turn-helix DNA-binding motifs (hth) from four protein families (see the figure in the next example). Hth DNA binding motifs occur in many proteins that bind specific DNA sequences (Pabo & Sauer, 1992).
BLAST searches of the SwissProt protein database with the IS30 sequences did not identify any protein with a known hth region. Searching the Blocks Database with the IS30 sequences gave high scores with hth blocks for two of the sequences (98.1 and 93.1 percentiles of scores with shuffled sequence queries (Henikoff & Henikoff, 1994)). The other two sequences had low scores with hth blocks (30.8 and 18.1 score percentiles) and higher scores with non-hth blocks. However, each of the transposases' putative DNA-binding regions was detected by the method of Dodd and Egan (Dodd & Egan, 1990) as an almost certain hth domain.
Classification of the first IS30 block as a hth motif is supported by the finding that the N-terminal region of an IS30 transposase, containing the putative hth DNA-binding region, binds the IS30 element (Stalder et al., 1990).
Hth-like region in IS30 transposases. Block BL01043A of the IS30 transposases family. The regions similar to the hth motifs in the block to block searches are underlined. The start and end coordinates flank the sequences. The diagram shows the suggested position of the hth motifs found by the hth algorithm (Dodd & Egan, 1990). The algorithm scores for hth motifs were 5.19 standard deviation units (SD), corresponding to 100% probability for TRA1_STRSL, 5.95 SD and 100% for TRA4_BACFR, 4.13 SD and 90% for TRA8_ALCEU, and 5.72 SD and 100% for TRA8_ECOLI.

Dodd, I. B. & Egan, J. B. (1990). Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucl. Acid. Res. 18, 5019-5026.
Dong, Q., Sadouk, A., van der Lelie, D., Taghavi, S., Ferhat, A., Nuyten, J. M., Borremans, B., Mergeay, M. & Toussaint, A. (1992). Cloning and sequencing of IS1086, an Alcaligenes eutrophus insertion element related to IS30 and IS4351. J. Bacteriol. 174, 8133-8138.
Henikoff, S. & Henikoff, J. G. (1994). Protein family classification based on searching a database of blocks. Genomics 19, 97-107.
Pabo, C. O. & Sauer, R. T. (1992). Transcription factors: structural families and principles of DNA recognition. Annu. Rev. Biochem. 61, 1053-1095.
Stalder, R., Caspers, P., Olasz, F. & Arber, W. (1990). The N-terminal domain of the insertion sequence 30 transposase interacts specifically with the terminal inverted repeats of the element. J. Biol. Chem. 265, 3757-3762.
In comparing the entries in the Blocks Database v8.6 among themselves, all fourteen hth blocks had high scores with two or more other hth blocks (Figure). The two high-scoring non-hth blocks could be distinguished because each was related to only a single hth block and had lower scores than those between the hth blocks. The blocks are from four types of protein families: bacterial regulatory proteins, homeobox domain proteins, sigma bacterial transcription initiation factors and IS transposases. Manual inspection of the Prosite annotation of the protein families in the Blocks Database and of the blocks themselves found no other hth blocks in the database.
The hth blocks included different numbers of sequences, from 4 to 185. There was no correlation between the number of sequences in a block and its relation to other blocks. This suggests that even blocks with 4-6 sequences can give a correct representation of conserved protein domains. More than 90% of the blocks in the database used had more than four sequences. This fraction is increasing with each release (>94% in BLOCKS 9.0) as the number of new protein sequences is higher than the number of new protein families (Green et al., 1993; Koonin et al., 1995; Koonin et al., 1994).
Hth blocks illustrate the problem of distinguishing genuine relationships from chance ones and suggest a solution. Two of the hth blocks (BL00622 and BL01063B) lie below the threshold for detecting single-hit relations (Z score >=8.3, bold lines in Figure). Protein families with hth motifs usually have no other common blocks to support the relation between the hth blocks. However, hth motifs are found in several protein families. These hth blocks all have high scores with each other, but not all these scores are high enough to identify genuine relationships by themselves. Nevertheless, blocks with a number of such scores to known hth blocks can be identified as hth blocks too. The two non-hth blocks have high scores to single hth blocks, and do not form part of the connected graph. An analogous strategy is the basis for detecting weak similarities in single-sequence alignments using the BLAST3 program (Altschul & Lipman, 1990).
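The strategy described above, accepting a block as an hth block when it has high scores with several known hth blocks rather than with just one, can be sketched in Python as a small graph computation. All block names and Z scores below are hypothetical, the function names are assumptions, and the requirement of at least two links follows the observation above that every hth block scored highly with two or more other hth blocks.

```python
from collections import defaultdict

def neighbours_above_cutoff(hits, cutoff=5.6):
    """Undirected graph linking blocks whose alignment Z score reaches the cutoff."""
    graph = defaultdict(set)
    for a, b, z in hits:
        if z >= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def supported_by_known(graph, known, min_links=2):
    """Blocks outside 'known' that are linked to at least min_links known blocks."""
    return {
        block
        for block, linked in graph.items()
        if block not in known and len(linked & known) >= min_links
    }

# Hypothetical block names and Z scores, for illustration only.
hits = [
    ("hth1", "hth2", 9.1), ("hth1", "hth3", 6.0), ("hth2", "hth3", 7.0),
    ("candidate", "hth1", 6.1), ("candidate", "hth3", 5.9),
    ("other", "hth2", 6.3),                       # linked to a single hth block
    ("weak", "hth1", 4.0), ("weak", "hth2", 5.0)  # both below the cutoff
]
known_hth = {"hth1", "hth2", "hth3"}

graph = neighbours_above_cutoff(hits)
likely_hth = supported_by_known(graph, known_hth)
```

In this toy data only `candidate` is supported by two known hth blocks. `other` is linked to a single hth block, like the two non-hth blocks described above, and is therefore left out.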
High scores of helix-turn-helix DNA binding blocks. All 14 hth blocks found in BLOCKS 8.6 and their high scoring relationships with each other (true positives) and with other blocks (false positives, outward pointing lines). Each block had different sequences except two pairs of homeobox blocks that had common sequences (BL00027 with BL00032B and with BL00035B). Lines show scores above the 5.6 Z score cutoff. Thick lines correspond to scores above the 8.3 Z score cutoff. BRP - bacterial regulatory proteins.

Since all the hth blocks are similar to one another, we examined how well one composite hth block would identify other hth blocks. The ecmot database (Koonin et al., 1995) contains such a composite hth block, with 609 sequence segments from many hth families. The graphical representation (logo) of this block illustrates the conservation in each of its positions. This and the avoidance of particular amino acids at specific positions can also be seen in the PSSM of block EC0157. This block had high scores with 18 blocks in Blocks Database v8.6 (Table). Fourteen of those are the hth blocks discussed above. All the hth blocks had high to extremely high scores, the lowest one expected to occur 3.2e-3 times.
The four blocks at the end of the table have significantly lower scores (Z 5.6-6.5). These are non-hth blocks but their similarity to the composite hth block can be explained. Two of the blocks are from bacterial regulatory protein families, occurring C-terminal to the hth motifs. One is a hth-similar region from the araC family (Brunelle & Schleif, 1989) and the other corresponds to the hth helix 3 and DNA-binding hinge helix in the E. coli lac repressor protein (Lewis et al., 1996). Another block is from the S3 ribosomal proteins (BL00548A). This protein binds RNA, and it is interesting to note the recent report of RNA-binding activity by a hth domain (Dubnau & Struhl, 1996). The last non-hth block is from L-lactate dehydrogenase (LDH) proteins. LDHs do not bind DNA but the crystal structure of the detected region (alpha-2f to beta-G) is a helix-turn followed by a helix or strand in different proteins (Abad Zapatero et al., 1987; Grau et al., 1981; Iwata & Ohta, 1993).
Blocks similar to composite hth block EC0157
Protein family (1) | Z score |
---|---|
'Homeobox' domain proteins | 18.4 |
'Homeobox' antennapedia-type proteins | 13.2 |
'POU' domain proteins | 11.7 |
BRP crp family | 12.1 |
BRP gntR family | 12.4 |
BRP lysR family | 14.4 |
BRP lacI family (2) | 11.7 |
BRP luxR family | 12.4 |
BRP arsR family | 8.0 |
BRP deoR family | 8.7 |
BRP tetR family | 14.1 |
Sigma-54 factors family | 7.8 |
Sigma-70 factors ECF subfamily | 8.3 |
Transposases, IS30 family | 11.2 |
BRP araC family | 6.5 |
BRP lacI family (2) | 6.6 |
Ribosomal S3 proteins | 5.8 |
L-lactate dehydrogenase family | 5.8 |
(1) The family Blocks Database entry numbers are in the previous figure except for BRP araC family - BL00041, L-lactate dehydrogenase - BL00064D and Ribosomal S3 proteins - BL00548A. The non-hth blocks are separated at the end of the table.
(2) Two blocks from the lacI hth family are similar to the composite hth block - block BL00356A, the hth region, and block BL00356B, the following DNA-binding hinge region.

Identifying all the hth regions in the Blocks Database illustrates the potential of the multiple alignment comparison method as an aid for annotating protein-family databases. Besides identifying the function of unknown regions, the approach outlined in this example can be useful in annotating databases that generate the multiple alignments automatically. Multiple alignments of characterized protein motifs (such as the hth, nucleotide binding folds or leucine zipper) could be used to identify other multiple alignments containing these motifs.
Altschul, S. F. & Lipman, D. J. (1990). Protein database searches for multiple alignments. Proc. Natl. Acad. Sci. USA 87, 5509-5513.
Abad Zapatero, C., Griffith, J., Sussman, J. & Rossmann, M. (1987). Refined crystal structure of dogfish M4 apo-lactate dehydrogenase. J. Mol. Biol. 198, 445-467.
Brunelle, A. & Schleif, R. (1989). Determining residue-base interactions between AraC protein and araI DNA. J. Mol. Biol. 209, 607-622.
Dubnau, J. & Struhl, G. (1996). RNA recognition and translational regulation by a homeodomain protein. Nature 379, 694-699.
Grau, U., Trommer, W. & Rossmann, M. (1981). Structure of the active ternary complex of pig heart lactate dehydrogenase with S-lac-NAD at 2.7 A resolution. J. Mol. Biol. 151, 289-307.
Green, P., Lipman, D., Hillier, L., Waterston, R., States, D. & Claverie, J. M. (1993). Ancient conserved regions in new gene sequences and the protein databases. Science 259, 1711-1716.
Iwata, S. & Ohta, T. (1993). Molecular basis of allosteric activation of bacterial L-lactate dehydrogenase. J. Mol. Biol. 230, 21-27.
Koonin, E., Tatusov, R. & Rudd, K. (1995). Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications. Proc. Natl. Acad. Sci. USA 92, 11921-11925.
Koonin, E. V., Bork, P. & Sander, C. (1994). Yeast chromosome III: new gene functions. EMBO J. 13, 493-503.
Lewis, M., Chang, G., Horton, N. C., Kercher, M. A., Pace, H. C., Schumacher, M. A., Brennan, R. G. & Lu, P. (1996). Crystal structure of the lactose operon repressor and its complexes with DNA and inducer. Science 271, 1247-1254.
Following are links to tables with this data. Note that the scores in the tables are the raw scores of the alignments. The scores shown in the LAMA output are normalized by dividing the raw score by the alignment length.
Credits and citation
The multiple alignment comparison method and LAMA program were developed by
Shmuel Pietrokovski
in the lab of Steve Henikoff at the
Fred Hutchinson Cancer Research Center,
Seattle.
An article describing the method and its uses,
"Searching Databases of Conserved Sequence Regions by
Aligning Protein Multiple-Alignments",
appeared in
Nucleic Acids Research 24(19) 3836-3845 (October 1996).
This article should be cited in research using this method.