Hybrid-template Structural Modeling
Spanner is a structural homology modeling program that threads a query amino-acid sequence onto a template protein structure. Spanner is unique in that it handles gaps by spanning the region of interest using fragments of known structures.
Spanner consists of several modules that are managed by a Python driver script. The first step involves defining the start and end points of fragments corresponding to insertions or deletions. The start and end points are referred to as anchors because they must be equivalent in both the template and any candidate fragment. The margin parameter determines how far from the edge of a gap the fragment begins or ends. For example, a margin of 0 would mean that the anchors begin at the very edge of a gap. This is usually not a good idea, and the default margin is set to 1.
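The anchor definition above can be illustrated with a minimal Python sketch. It is not Spanner's own code: the function name `find_gap_spans` is hypothetical, and it assumes that the margin counts alignment columns added on each side of a gap. Gaps at the ends of the alignment are simply clamped to the alignment boundaries here, which a real implementation would likely treat differently.

```python
from typing import List, Tuple


def find_gap_spans(template_aln: str, query_aln: str,
                   margin: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) column spans to be bridged by fragments.

    Each span covers one run of gap columns (an insertion or a deletion),
    extended by `margin` columns on both sides. Assumption: the margin is
    measured in alignment columns; this is an illustrative reading only.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    if margin < 0:
        raise ValueError("margin must be non-negative")

    n = len(template_aln)
    spans = []
    col = 0
    while col < n:
        if template_aln[col] == "-" or query_aln[col] == "-":
            start = col
            # Walk to the end of this run of gap columns.
            while col < n and (template_aln[col] == "-" or query_aln[col] == "-"):
                col += 1
            end = col - 1
            # Extend by the margin, clamped to the alignment boundaries.
            spans.append((max(0, start - margin), min(n - 1, end + margin)))
        else:
            col += 1
    return spans
```

With `margin=0` the span coincides with the gap itself, whereas the default `margin=1` moves each end of the span one column away from the gap edge.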
A representative set of protein chains was prepared using the cd-hit program (Li and Godzik, 2006 [1]) at 100% sequence identity. All continuous fragments were then extracted from this set of chains and stored in a relational database, indexed by the internal coordinates of the fragment endpoints (figure 2). A separate database is prepared for each fragment length. Currently, fragments of length 8-40, including the 8 anchor residues, are stored in the database.
Spanner considers each fragment in order of its position in the query sequence.
For a given fragment, a fragment index is generated from the template anchor residues. A tolerance in the fit to the anchor residues is used to specify a range of index values. The index range is used to generate a PGSQL query to the appropriate fragment database, and all fragments satisfying the range of indices are returned. PDB entries that should be excluded from the RDB search can be specified by the user, a feature that was utilized in the present work in order to screen out close homologs when benchmarking the program. Since the number of returned fragments is sensitive to the tolerance in the fit to the anchor residues, the retrieval step starts with a small value (2.0 Å by default) and incrementally increases the tolerance until the required number of fragments (5000 by default) or a maximum tolerance (2.5 Å by default) is reached.
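The incremental retrieval described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `query_db` is a hypothetical callable standing in for the database query (it takes a tolerance in Å and returns the matching fragments), and the step size of 0.1 Å is an assumed value, since the increment is not given in the text. The defaults for the starting tolerance, maximum tolerance and required fragment count are those quoted above.

```python
from typing import Callable, List, Tuple


def retrieve_fragments(query_db: Callable[[float], List],
                       start_tol: float = 2.0,
                       max_tol: float = 2.5,
                       step: float = 0.1,
                       required: int = 5000) -> Tuple[List, float]:
    """Widen the anchor-fit tolerance until enough fragments are returned.

    `query_db` is a hypothetical stand-in for the fragment database query.
    `step` is an assumed increment; the source does not specify it.
    Returns the fragments found and the tolerance that was finally used.
    """
    if step <= 0:
        raise ValueError("step must be positive")

    n_steps = 0
    while True:
        # Recompute from the start value to avoid accumulating rounding error.
        tol = min(start_tol + n_steps * step, max_tol)
        hits = query_db(tol)
        # Stop when enough fragments were found or the tolerance cap is hit.
        if len(hits) >= required or tol >= max_tol:
            return hits, tol
        n_steps += 1
```

Stopping at the maximum tolerance means that the function may return fewer fragments than requested; the caller decides whether that is acceptable.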
The fragments returned from the DB are then sorted by a simple score that is a function only of the primary and secondary structure similarities
where S_seq is proportional to a log-odds sequence substitution matrix score derived from a large number of structure alignments (Standley, et al., 2007 [2]) and S_sec is proportional to a secondary structure substitution matrix score (Kawabata and Nishikawa, 2000 [3]). A specified number of candidate fragments (1000 by default) are then retained. These retained candidates are then re-scored using a more sensitive function that takes structure into account and is given by
where S_clash is a weighted sum of clashes between the fragment and the rest of the template structure, excluding residues that are to be replaced by the fragment, and RMSD_fit is the root-mean-square deviation of the Cα atoms in the fitted anchor residues. The user-specified number of top-scoring fragments (1 by default) is then output.
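The two-stage selection described above (a cheap pre-filter followed by a more sensitive re-scoring) can be sketched as follows. The scoring functions themselves are not reproduced: `coarse_score` and `fine_score` are hypothetical callables supplied by the caller, and the sketch assumes that a higher score is better at both stages. The defaults of 1000 retained and 1 output fragment are those quoted above.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

Fragment = TypeVar("Fragment")


def rank_fragments(fragments: Sequence[Fragment],
                   coarse_score: Callable[[Fragment], float],
                   fine_score: Callable[[Fragment], float],
                   n_retain: int = 1000,
                   n_output: int = 1) -> List[Fragment]:
    """Two-stage ranking: pre-filter on a cheap score, then re-score.

    Both score callables are placeholders (assumptions); higher is taken
    to mean better. Only the retained candidates are passed to the more
    expensive `fine_score`.
    """
    # Stage 1: keep the best n_retain fragments by the cheap score.
    retained = heapq.nlargest(n_retain, fragments, key=coarse_score)
    # Stage 2: re-score the retained candidates and output the best ones.
    return heapq.nlargest(n_output, retained, key=fine_score)
```

The point of the pre-filter is cost: the sequence and secondary structure score needs no coordinates, so the structure-dependent terms are evaluated for at most `n_retain` candidates.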
Two options are available for replacing side chains and optimizing their conformations. The first is to use the dead-end elimination method (Desmet, et al., 1992 [4]; Tanimura, et al., 1994 [5]). The second is to use the SCWRL4 program (Krivov, et al., 2009 [6]). The complete model is then refined by energy minimization using either Presto (Morikami, et al., 1992 [7]) or Gromacs (Van Der Spoel, et al., 2005 [8]).
To create a model, you must provide a template structure, as well as an alignment of the query sequence you wish to model onto the template sequence. Spanner will replace matching residues, fill any gaps caused by inserted or deleted residues, and minimize the energy of the structure.
The resulting PDB-formatted model, as well as a log file, will be emailed to you when Spanner is finished. If an error prevented a complete homology model from being generated (either due to an internal error or because the input could not be processed), the log file will explain which part of the modeling sequence failed. Alternatively, we provide an alignment ‘sanity check’ that will validate and, if necessary and feasible, correct the input.
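As an illustration of the kind of problems an alignment check might look for, the sketch below parses a two-record FASTA-style alignment and reports basic inconsistencies. It is not the sanity check offered by the server: the function names, the set of accepted characters (the 20 standard amino acids plus the gap symbol) and the specific checks are all assumptions made for this example.

```python
from typing import Dict, List

# Assumption: only the 20 standard amino acids and '-' are accepted.
VALID_CHARS = set("ACDEFGHIKLMNPQRSTVWY-")


def parse_alignment(text: str) -> Dict[str, str]:
    """Parse FASTA-style records into a {name: aligned_sequence} mapping."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line.replace(" ", ""))
    return {key: "".join(parts).upper() for key, parts in records.items()}


def check_alignment(template: str, query: str) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if len(template) != len(query):
        problems.append(
            f"length mismatch: template {len(template)}, query {len(query)}")
    for label, seq in (("template", template), ("query", query)):
        bad = sorted(set(seq) - VALID_CHARS)
        if bad:
            problems.append(f"{label}: unexpected characters {''.join(bad)}")
    # A column with a gap in both sequences carries no information.
    for i, (t, q) in enumerate(zip(template, query), start=1):
        if t == "-" and q == "-":
            problems.append(f"column {i}: gap in both sequences")
    return problems
```

A typical use would be to call `parse_alignment` on the submitted text, look up the `template` and `query` records, and pass them to `check_alignment` before starting a modeling run.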
7 Morikami, K., Nakai, T., Kidera, A., Saito, M. and Nakamura, H. (1992) Presto (Protein Engineering Simulator) - a Vectorized Molecular Mechanics Program for Biopolymers, Computers & Chemistry, 16, 243-248.
8 Van Der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A. E. and Berendsen, H. J. C. (2005) GROMACS: Fast, flexible, and free, Journal of Computational Chemistry, 26, 1701-1718. doi: 10.1002/jcc.20291
>template
----------RFHSFSFYELKNVTNNFDERPISVGGNKMGEGGFGVVYKGYVNNTTVAVKK
LAAMVDITTEELKQQFDQEIKVMAKCQHENLVELLGFSSDGDDLCLVYVYMPNGSLLDRLS
CLDG-TPPLSWHMRCKIAQGAANGINFLHENH--HIHRDIKSANILLDEAFTAKISDFGLA
RASE-------KFAQTVMTSRIVGTTAYMAPEA-LRGEITPKSDIYSFGVVLLEIITGLPA
VDEHRE-PQLLLDIKEEIEDE--------------------------EKTIEDYIDKKMND
ADSTSVEAMYSVASQCLHEKKNKRPDIKKVQQLLQEMT----
>query
SVSLLQGARPFPFCWPLCEISRGTHNFSEE------LKIGEGGFGCVYRAVMRNTVYAVKR
LKENADLEWTAVKQSFLTEVEQLSRFRHPNIVDFAGYCAQNGFYCLVYGFLPNGSLEDRLH
CQTQACPPLSWPQRLDILLGTARAIQFLHQDSPSLIHGDIKSSNVLLDERLTPKLGDFGLA
RFSRFAGSSPSQSSMVARTQTVRGTLAYLPEEYIKTGRLAVDTDTFSFGVVVLETLAGQRA
VKTHGARTKYLKDLVEEEAEEAGVALRSTQSTLQAGLAADAWAAPIAMQIYKKHLDPRPGP
CPPELGLGLGQLACCCLHRRAKRRPPMTQVYERLEKLQ