Block Formatter Help
Block Formatter converts
a multiple sequence alignment into the
block format used in the
Blocks Database.
You can obtain multiple sequence alignments by various methods.
A number of
programs for multiple alignment are available on the WWW.
One such program is Block Maker.
It finds ungapped local multiple alignments (blocks) in groups of related
protein sequences. The Block Maker output is already in the block format.
The minimal input is the
aligned sequence segments;
all the other data can be determined by the program or given arbitrary values.
The header fields
- The ID field is a short
identifier for sequences in the alignment.
Example - "Phosphorylases".
- The Accession field is for the block name.
It should be 7 or 8 characters long. It is recommended that the
first 7 characters designate the block family or group and that the last
character number the different blocks in that group
(using upper case letters A-Z).
Examples - "TR1421_A", "TR1421_B", "Transpo".
- The minimal and maximal distances from the
previous block (or from the beginning of the sequences, if this is
the first block) are the smallest and largest numbers of amino acids
between this block and the previous one (or the sequence start).
Example - "5", "34".
- The Description field is for the
description of the group of sequences from which the
block was made.
Example - "'Homeobox' domain proteins".
- The Alignment method should briefly describe
how the alignment was made or found.
Examples - "Manual alignment", "MACAW", "ClustalW", "Prot. Sci. 12 p345,
87'".
- The alignment width specifies how
wide (or long) the alignment is.
Example - "32".
- The number of sequences is how many
sequences are in the alignment.
Example - "11".
All the previous fields are optional. Their values can be
supplied by the formatting program (either by default or from the multiple
alignment).
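As a rough illustration of how the header fields fit together, the Python sketch below assembles the four header lines in the layout shown in the example output on this page. The function names make_accession and format_header are hypothetical and are not part of Block Formatter, and the exact spacing of the real output is an assumption.

```python
def make_accession(family: str, block_index: int) -> str:
    """Join a 7-character family name with a block letter A-Z (hypothetical helper)."""
    if len(family) != 7:
        raise ValueError("the family part should be exactly 7 characters")
    if not 0 <= block_index < 26:
        raise ValueError("only 26 blocks (A-Z) can be numbered this way")
    return family + chr(ord("A") + block_index)


def format_header(block_id, accession, min_dist, max_dist,
                  description, method, width, num_seqs):
    """Return the ID, AC, DE and BL lines as a list of strings.

    The layout copies the example output on this page; spacing is assumed.
    """
    if len(accession) not in (7, 8):
        raise ValueError("the accession should be 7 or 8 characters long")
    if min_dist > max_dist:
        raise ValueError("minimal distance cannot exceed maximal distance")
    return [
        f"ID {block_id}; BLOCK",
        f"AC {accession}; distance from previous block = ({min_dist},{max_dist})",
        f"DE {description}",
        f"BL {method}; width={width}; seqs={num_seqs};",
    ]


if __name__ == "__main__":
    for line in format_header("Inteins", "vde_yea", 98, 190,
                              "Protein introns (inteins).", "gibbs", 19, 6):
        print(line)
```

Keeping the accession check separate from the line formatting makes it easy to number several blocks of one family, for example make_accession("TR1421_", 0) and make_accession("TR1421_", 1).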
The multiple alignment fields
- The sequence names should be unique and
no longer than 10 characters.
Examples - "RECA_ECOLI", "S67853_A", "PCR#543"
- The positions
of the aligned sequence segments are their offsets from the
beginning of each sequence. Every position should be on a separate line.
Avoid empty lines.
Example - "67"
The names and sequence positions of the aligned sequence segments in the
multiple alignment will be supplied by the program if nothing
is entered in these fields.
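The rules for these two fields can be checked before submitting the form. The following Python sketch is only an illustration of the constraints stated above (unique names of at most 10 characters, one position per line, no empty lines); check_names and parse_positions are hypothetical helpers, not functions of Block Formatter.

```python
def check_names(names):
    """Return a list of problems found in the sequence names (empty if none)."""
    problems = []
    seen = set()
    for name in names:
        if len(name) > 10:
            problems.append(f"{name!r} is longer than 10 characters")
        if name in seen:
            problems.append(f"{name!r} is not unique")
        seen.add(name)
    return problems


def parse_positions(text):
    """Read one integer position per line; reject empty or non-numeric lines."""
    positions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        value = line.strip()
        if not value:
            raise ValueError(f"line {line_no}: empty lines are not allowed")
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"line {line_no}: expected one number, got {value!r}")
        positions.append(int(value))
    return positions


if __name__ == "__main__":
    print(check_names(["RECA_ECOLI", "S67853_A", "PCR#543"]))
    print(parse_positions("67\n430\n"))
```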
Examples
FASTA format -
>vde_yeast vacuolar ATPAse 19 aa
DYYGIT
LSDDSD
HQFLLAN
>vde_cantr another yeast
1 NYYGITLAE
10 ETDHQFLLS
19 N
>reci_myctu (sequence can be in lower or upper case)
r a r t f
d l e v e
e l h t l
v a e g
>reci_mycle (sequence can be on one line or more)
SMNRFDIEVEGNHNYFVDG
>dpi1_theli (only first word in this line is read as sequence name)
EGYVYDLSV
EDNENFLVGF
>dpi2_theli (header lines MUST start with a '>')
EGYV
YDIE
VEET
HRFF
ANN
basic format -
DYYGITLSDDSDHQFLLAN
NYYGITLAEETDHQFLLSN
RARTFDLEVEELHTLVAEG
SMNRFDIEVEGNHNYFVDG
EGYVYDLSVEDNENFLVGF
EGYVYDIEVEETHRFFANN
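The two input examples above describe the same six segments. The Python sketch below shows one way the FASTA example could be reduced to the basic format, following the notes in the example headers: header lines start with '>', only the first word is the name, a sequence may span several lines, and case does not matter. Dropping digits and spaces and converting to upper case is an assumption inferred from the examples, and parse_fasta_segments is a hypothetical name, not the program's own code.

```python
def parse_fasta_segments(text):
    """Return a list of (name, segment) pairs from FASTA-like input."""
    names = []
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            # Only the first word of the header line is used as the name.
            names.append(words[0] if words else "")
            segments.append("")
        elif names:
            # Keep letters only: position numbers and spaces are discarded.
            letters = "".join(ch for ch in line if ch.isalpha())
            segments[-1] += letters.upper()
    return list(zip(names, segments))


def same_width(pairs):
    """True if every segment has the same length, as an ungapped block requires."""
    return len({len(segment) for _, segment in pairs}) <= 1


if __name__ == "__main__":
    sample = """>vde_cantr another yeast
1 NYYGITLAE
10 ETDHQFLLS
19 N
>reci_myctu (sequence can be in lower or upper case)
r a r t f
d l e v e
e l h t l
v a e g
"""
    for name, segment in parse_fasta_segments(sample):
        print(name, segment)
```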
An example of a possible output -
ID Inteins; BLOCK
AC vde_yea; distance from previous block = (98,190)
DE Protein introns (inteins).
BL gibbs; width=19; seqs=6;
vde_yeast ( 430) DYYGITLSDDSDHQFLLAN 90
vde_cantr ( 447) NYYGITLAEETDHQFLLSN 84
reci_myctu ( 417) RARTFDLEVEELHTLVAEG 100
reci_mycle ( 342) SMNRFDIEVEGNHNYFVDG 95
dpi1_theli ( 513) EGYVYDLSVEDNENFLVGF 83
dpi2_theli ( 367) EGYVYDIEVEETHRFFANN 74
//
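To check a formatted block, it can help to read it back and compare the width and seqs values in the BL line with the alignment rows. The Python sketch below does this for output shaped like the example above. The names parse_block and counts_match are hypothetical, and the number at the end of each row is not described on this page, so it is kept as an opaque value (it is assumed, not confirmed here, to be a per-sequence weight).

```python
import re

# name, (position), segment, optional trailing number
ROW = re.compile(r"^(\S+)\s+\(\s*(\d+)\)\s+([A-Za-z]+)(?:\s+(\d+))?$")
BL_COUNTS = re.compile(r"width=(\d+);\s*seqs=(\d+);")


def parse_block(text):
    """Split block text into a header dict and a list of row tuples."""
    header = {}
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":  # end-of-block marker
            break
        match = ROW.match(line)
        if match:
            name, position, segment, trailing = match.groups()
            rows.append((name, int(position), segment,
                         int(trailing) if trailing else None))
        else:
            # Header lines: a two-letter tag (ID, AC, DE, BL) and its content.
            tag, _, rest = line.partition(" ")
            header[tag] = rest.strip()
    return header, rows


def counts_match(header, rows):
    """True if the BL line's width and seqs agree with the rows."""
    match = BL_COUNTS.search(header.get("BL", ""))
    if not match:
        return False
    width, seqs = int(match.group(1)), int(match.group(2))
    return len(rows) == seqs and all(len(row[2]) == width for row in rows)
```

Trying the row pattern before the header split is a deliberate choice: it avoids misreading a sequence whose name happens to begin with a header tag.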
Page last modified August 1996