
MAFFTash

Alignment of Multiple Sequences and Structures

Overview

MAFFTash is a server that calculates multiple sequence alignments from sequences and structures. It combines two existing programs, MAFFT and ASH. ASH is a structural alignment program that uses an extension of the double dynamic programming algorithm to maximize the number of structurally equivalent residues between two proteins [1-3]. The pairwise structural alignments produced by ASH are then passed to MAFFT, a widely used multiple sequence alignment program [4-9], which uses them to construct an overall multiple alignment that is as consistent as possible with the pairwise structural alignments. Sequence homologs with no structural information can also be included in the alignment.
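This page does not describe how MAFFT scores consistency internally. To make "consistent with the pairwise structural alignments" concrete, the sketch below shows one common way to measure it: list the residue pairs that a pairwise alignment places in the same column, then compute the fraction of those pairs that a multiple alignment (restricted to the same two sequences) keeps. The function names `aligned_pairs` and `consistency` are hypothetical illustrations, not part of MAFFT or ASH.

```python
def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Return (i, j) residue-index pairs that share a column in two gapped rows.

    Indices count residues only ('-' gaps are skipped) and start at 0.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))  # residue i of A is aligned to residue j of B
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def consistency(pairwise: tuple[str, str], msa_rows: tuple[str, str]) -> float:
    """Fraction of residue pairs from a pairwise alignment that are preserved
    in the multiple alignment rows for the same two sequences."""
    reference = aligned_pairs(*pairwise)
    if not reference:
        return 1.0  # nothing to preserve
    kept = reference & aligned_pairs(*msa_rows)
    return len(kept) / len(reference)
```

For example, if a pairwise structural alignment gives `("AC-DE", "ACGDE")` and the multiple alignment rows for the same two sequences are `("ACD-E", "ACGDE")`, three of the four structurally aligned pairs survive, so the score is 0.75.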

Usage

To run MAFFTash you must provide a list of sequences and/or PDB and chain identifiers. The list may be pasted into the text area or uploaded from a file. In either case, sequences must be in FASTA format, and each PDB ID and its chain identifier must be joined into a single five-character string (e.g. 1nagA). Each such identifier line must be preceded by a line containing only the string >PDBID. For example:

>PDBID
3ygsC
>Q6Q899|DDX58_MOUSE| 1-91
MTAAQRQNLQAFRDYIKKILDPTYILSYMSSWLEDEEVQYIQAEKNNKGPMEAASLFLQY
LLKLQSEGWFQAFLDALYHAGYCGLCEAIES
>Q6Q899|DDX58_MOUSE| 101-176
EEHRLLLRRLEPEFKATVDPNDILSELSECLINQECEEIRQIRDTKGRMAGAEKMAECLI
RSDKENWPKVLQLALE
>PDBID
2p1hA

is valid input. Note that chain identifiers are now mandatory for all PDB entries. Spaces (' '), dashes ('-'), and underscores ('_') are not accepted as chain identifiers. If you are unsure which chain IDs to use, look up the entry with the PDBj search engine: type in the PDB ID, then click on 'sequence information (FASTA format)' to see the sequence of each chain in FASTA format.
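The input rules above can be checked before submission with a small script. The sketch below is a hypothetical helper (not part of MAFFTash) that assembles the input text from a mix of PDB+chain identifiers and FASTA sequences. It assumes a four-character PDB ID (a digit followed by three letters or digits) and a single alphanumeric chain ID; that is stricter than the rule stated above, which only excludes ' ', '-' and '_'. Sequences are wrapped at 60 residues, as in the example.

```python
import re

# Assumption: 4-character PDB ID (digit + 3 alphanumerics) plus a 1-character
# alphanumeric chain ID. This rejects ' ', '-' and '_' as chain IDs.
PDB_CHAIN = re.compile(r"[0-9][A-Za-z0-9]{3}[A-Za-z0-9]")


def build_input(entries) -> str:
    """Build MAFFTash-style input text.

    Each entry is either a 5-character 'PDBID+chain' string (e.g. '3ygsC')
    or a (header, sequence) tuple for a sequence without a structure.
    """
    lines = []
    for entry in entries:
        if isinstance(entry, str):
            if not PDB_CHAIN.fullmatch(entry):
                raise ValueError(f"invalid PDB+chain identifier: {entry!r}")
            lines += [">PDBID", entry]  # the >PDBID line must come first
        else:
            header, seq = entry
            lines.append(">" + header)
            # Wrap the sequence at 60 residues per line.
            lines += [seq[k:k + 60] for k in range(0, len(seq), 60)]
    return "\n".join(lines) + "\n"
```

For instance, `build_input(["3ygsC", ("Q6Q899|DDX58_MOUSE| 1-91", "MTAAQ..."), "2p1hA"])` reproduces the layout of the example, while `build_input(["3ygs_"])` raises a `ValueError` because of the underscore chain ID.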

You are not limited to PDB entries and may also provide your own PDB-formatted structures. To do so, check the 'Add your own structures' checkbox and upload a PDB-formatted file. The Structure weight parameter (default 0.5) controls how much influence the ASH structural alignments have on the MAFFT alignment. You may need to experiment with different values depending on the ratio of structures to sequences.
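This page does not say how chains must be labelled in uploaded files. As an optional pre-upload check, the hypothetical helper below lists the chain IDs found in the ATOM records of a PDB-formatted file, relying on the fixed-column layout of the PDB format, where the chain ID sits in column 22 (index 21 in Python).

```python
def chains_in_pdb(text: str) -> dict[str, int]:
    """Count ATOM records per chain ID in PDB-formatted text.

    Uses the fixed PDB columns: record name in columns 1-6 and
    chain ID in column 22 (0-based index 21). HETATM records are ignored.
    """
    counts: dict[str, int] = {}
    for line in text.splitlines():
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            counts[chain] = counts.get(chain, 0) + 1
    return counts
```

A chain reported as `' '` has a blank chain ID. Blank IDs are not accepted for PDB entries, so you may want to assign an explicit ID before uploading, although the requirement for uploaded files is not stated here.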

References

  1. Standley, Toh, Nakamura 2007 (BMC Bioinformatics 8:116) ASH structure alignment package: sensitivity and selectivity in domain classification.
  2. Standley, Toh, Nakamura 2005 (BMC Bioinformatics 6:221) GASH: an improved algorithm for maximizing the number of equivalent residues between two protein structures.
  3. Standley, Toh, Nakamura 2004 (Proteins 57(2):381-391) Detecting local structural similarity in proteins by maximizing number of equivalent residues.
  4. Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64) Multiple alignment of DNA sequences with MAFFT. In: Bioinformatics for DNA Sequence Analysis, edited by D. Posada.
  5. Katoh, Toh 2008 (BMC Bioinformatics 9:212) Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework.
  6. Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298) Recent developments in the MAFFT multiple sequence alignment program.
  7. Katoh, Toh 2007 (Bioinformatics 23:372-374) PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. (Errata)
  8. Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518) MAFFT version 5: improvement in accuracy of multiple sequence alignment.
  9. Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
Last modified: 2014/03/20 13:40 by kmamada