Blocks Searcher Help

Introduction


As an aid to detection and verification of protein sequence homology, the BLOCKS Searcher compares a protein or DNA sequence to the current database of protein blocks. Blocks are short multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.

The rationale behind searching a database of blocks is that information from multiply aligned sequences is present in a concentrated form, reducing background and increasing sensitivity to distant relationships. This information is represented in a position-specific scoring table or "profile" (4), in which each column of the alignment is converted to a column of a table representing the frequency of occurrence of each of the 20 amino acids.

To search a database of blocks, the first position of the sequence is aligned with the first position of the first block, and a score for that amino acid is obtained from the profile column corresponding to that position. Scores are summed over the width of the alignment, and then the block is aligned with the next position of the sequence. This procedure is carried out exhaustively for all positions of the sequence and for all blocks in the database, and the best alignments between the sequence and entries in the BLOCKS database are noted.

If a particular block scores highly, it is possible that the sequence is related to the group of sequences the block represents. Typically, a group of proteins has more than one region in common, and their relationship is represented as a series of blocks separated by unaligned regions. If a second block for a group also scores highly in the search, the evidence that the sequence is related to the group is strengthened, and it is strengthened further if a third block also scores highly, and so on.
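The sliding comparison described above can be sketched in a few lines of Python. This is only an illustration, not the Searcher's actual code: it assumes the profile holds raw column frequencies (the real scoring tables are more elaborate, see reference 4), and the names build_profile and best_alignment and the toy block are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(block):
    """Convert an ungapped block (equal-length segments) into one
    frequency table per alignment column. Assumption: raw frequencies."""
    width = len(block[0])
    assert all(len(segment) == width for segment in block)
    profile = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        profile.append({aa: counts[aa] / len(block) for aa in AMINO_ACIDS})
    return profile

def best_alignment(sequence, profile):
    """Slide the profile along the sequence one position at a time,
    sum the column scores at each offset, and keep the best offset."""
    width = len(profile)
    best_score, best_offset = float("-inf"), -1
    for offset in range(len(sequence) - width + 1):
        score = sum(profile[i].get(sequence[offset + i], 0.0)
                    for i in range(width))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset

# Toy block of three aligned segments, searched against a short query.
block = ["GKST", "GKTT", "GKSS"]
profile = build_profile(block)
score, offset = best_alignment("MAAGKSTLL", profile)
print(offset, round(score, 3))
```

A real search repeats this for every block in the database and then looks for several high-scoring blocks from the same group.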

WWW form



Submitting multiple searches

If you have a large query sequence or several queries, a sample Perl script is available for doing bulk searches.

The Blocks Email Searcher currently runs on a Sun E250 workstation. It uses a simple first-in, first-executed queuing scheme and can complete one typical search of 350 amino acids about every two minutes. A DNA search takes longer because the sequence is translated in all six frames, so a 1000-residue DNA query takes about as long as six average amino acid queries, or about 12 minutes, and a contig of 10,000 residues takes about two hours. Consequently, if you have more than five searches to do, we ask that you space them at reasonable intervals, depending on the type and size of your sequences, so that other people can get searches processed between yours. The Blocks Searcher is least busy on weekends and between about 20:00 and 04:00 Pacific coast (USA) time. We appreciate your considerate use of this service.
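The timing figures above can be turned into a rough planning aid. The sketch below assumes that search time grows linearly with the number of amino acid residues compared, which is an extrapolation from the numbers quoted here rather than a documented property of the server; the function name estimated_minutes is hypothetical.

```python
AVG_QUERY_AA = 350          # typical protein query length quoted above
MINUTES_PER_AVG_QUERY = 2   # one such search about every two minutes

def estimated_minutes(length, is_dna=False):
    """Rough search time in minutes for a query of the given length.

    Assumption: time scales linearly with residues compared. A DNA
    query is translated in six frames of about length/3 residues each.
    """
    residues = 6 * (length // 3) if is_dna else length
    return residues / AVG_QUERY_AA * MINUTES_PER_AVG_QUERY

print(estimated_minutes(350))                  # typical protein query
print(round(estimated_minutes(1000, True)))    # 1000-residue DNA query
print(round(estimated_minutes(10000, True)))   # 10,000-residue contig
```

Under this assumption the 1000-residue DNA query comes out at roughly 11 minutes and the contig at a little under two hours, in line with the estimates above. The same figure is a reasonable minimum interval to leave between successive submissions.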

Interpreting results of a search

Example using a protein query
Example using a DNA query

Heading
Query=Description line from query sequence
Size=Number of amino acids for protein query or base pairs for DNA query. Be sure this number is correct before interpreting your results.
Blocks searched=Number of blocks searched with query.
Alignments done=Number of alignments done between query and blocks searched. This number is used to determine the expected value for each hit.
Cutoff expected value=Maximum combined E-value reported. This is the number of matches expected to be found merely by chance.
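
As a plausibility check on the "Alignments done" figure, the number of ungapped placements can be estimated from the query size and the block widths. This sketch assumes that only placements in which the block lies fully within the query are counted; the Searcher's exact counting rule is not documented here, and count_alignments and the example widths are hypothetical.

```python
def count_alignments(query_length, block_widths):
    """Number of ungapped placements of each block along the query.

    Assumption: a block of width w fits at query_length - w + 1
    offsets, and blocks wider than the query contribute nothing.
    """
    return sum(max(query_length - width + 1, 0) for width in block_widths)

# A 350-residue protein query against three blocks of different widths.
print(count_alignments(350, [10, 25, 40]))
```

Because the expected value depends on this count, a query size that was misread (for example, a truncated sequence) changes the E-values of every hit, which is why the Size line should be checked first.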

Summary
One line is printed per hit, where a hit consists of blocks belonging to a protein family represented in the database of blocks searched with combined E-value less than or equal to the cutoff.

Details
Detailed information is printed for each hit, including alignments with the most similar sequence in each block.



Getting documentation for blocks

Following up a potentially interesting hit is often aided by examining the full set of blocks for a group. Hits are linked to Get blocks.



References

If you find the Blocks Searcher useful, please cite:

Henikoff S, Henikoff JG: Protein family classification based on
searching a database of blocks. Genomics 1994, 19:97-107.
[Postscript PDF]

Other references for this work are:

1. Henikoff S, Henikoff JG: Automated assembly of protein blocks for database
searching. Nucleic Acids Res. 1991, 19:6565-6572.
[Postscript PDF]

2. Bairoch A: PROSITE: A dictionary of sites and patterns in proteins. Nucleic
Acids Res. 1992, 20:2013-2018.
[Prosite page]

3. Bairoch A, Boeckmann B: The SWISS-PROT protein sequence data bank. Nucleic
Acids Res. 1992, 20:2019-2022.
[Swiss-Prot page]

4. Henikoff JG, Henikoff S: Using substitution probabilities
to improve position-specific scoring matrices. CABIOS 1996, 12:135-143.
[Postscript PDF]

5. Wallace JC, Henikoff S: PATMAT: a searching and extraction program for
sequence, pattern, and block queries and databases. CABIOS 1992, 8:249-254.

6. Henikoff S: Detection of Caenorhabditis transposon homologs in diverse
organisms. New Biol. 1992, 4:382-388.

7. Oliver SG et al.: The complete DNA sequence of yeast chromosome III. Nature
1992, 357:38-46.

8. Bork P, Ouzounis C, Sander C, Scharf M, Schneider R, Sonnhammer E: What's
in a genome? Nature 1992, 358:287.

9. Henikoff S, Henikoff JG: A protein family classification method for
analysis of large DNA sequences. Proc. 27th HICSS 1994, p. 265-274.
[Postscript PDF]

10. Henikoff S, Henikoff JG: Position-based sequence weights, J. Mol. Biol.
1994, 243:574-578.
[Postscript PDF]

11. Tatusov RL, Altschul SF, Koonin EV: Detection of conserved segments in
proteins: Iterative scanning of sequence databases with alignment blocks,
PNAS 1994, 91:12091-12095.

12. Henikoff JG, Henikoff S: Using substitution probabilities to improve
position-specific scoring matrices, CABIOS 1996, 12:135-143.

13. Henikoff S, Henikoff JG: Embedding strategies for effective use of
multiple sequence alignment information. 1996, submitted for publication.

14. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: Improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 1994, 22:4673-4680.
[FTP CLUSTAL W]

15. Saitou N, Nei M: The neighbor-joining method: A new method for
reconstructing phylogenetic trees. Mol. Biol. Evol. 1987, 4:406-425.

16. Felsenstein, J: , Cladistics 1989, 5:164-166.
[Phylip page]

17. McLachlan, A.: ,J. Mol. Biol. 1983, 169:15-30.

18. Bailey TL, Gribskov M: Combining evidence
using p-values: application to sequence homology searches. Bioinformatics
1998, 14:48-54.
[Postscript]

