SEEKQUENCER

Homolog sequences and structures collection

Usage

Input Sequences and/or PDB IDs (plus chain IDs)

You must provide a list of sequences and/or PDB and chain identifiers. The list may be pasted into the text area or uploaded from a file. In either case, sequences must be in FASTA format, and each PDB ID and chain ID must be joined into a single string of length 5 (e.g. 1nagA). Each PDB and chain identifier line must be preceded by a line containing the string '>PDBID' and nothing else.

Examples of valid inputs

>PDBID
3ygsC
>Q6Q899_DDX58_MOUSE_1-91
MTAAQRQNLQAFRDYIKKILDPTYILSYMSSWLEDEEVQYIQAEKNNKGPMEAASLFLQYLLKLQSEGWFQAFLDALYHAGYCGLCEAIES
>Q6Q899_DDX58_MOUSE_101-176
EEHRLLLRRLEPEFKATVDPNDILSELSECLINQECEEIRQIRDTKGRMAGAEKMAECLIRSDKENWPKVLQLALE
>PDBID
2p1hA
>Q6Q899_DDX58_MOUSE_1-91
MTAAQRQNLQAFRDYIKKILDPTYILSYMSSWLEDEEVQYIQAEKNNKGPMEAASLFLQYLLKLQSEGWFQAFLDALYHAGYCGLCEAIES
>Q6Q899_DDX58_MOUSE_101-176
EEHRLLLRRLEPEFKATVDPNDILSELSECLINQECEEIRQIRDTKGRMAGAEKMAECLIRSDKENWPKVLQLALE
>Q6Q899_DDX58_MOUSE_1-91
MTAAQRQNLQAFRDYIKKILDPTYILSYMSSWLEDEEVQYIQAEKNNKGPMEAASLFLQY
LLKLQSEGWFQAFLDALYHAGYCGLCEAIES
>Q6Q899_DDX58_MOUSE_101-176
EEHRLLLRRLEPEFKATVDPNDILSELSECLINQECEEIRQIRDTKGRMAGAEKMAECLI
RSDKENWPKVLQLALE
MTAAQRQNLQAFRDYIKKILDPTYILSYMSSWLEDEEVQYIQAEKNNKGPMEAASLFLQYLLKLQSEGWFQAFLDALYHAGYCGLCEAIES
MTAAQRQNLQAFRDYIKKILDPTYILSYMSSWLEDEEVQYIQAEKNNKGPMEAASLFLQY
LLKLQSEGWFQAFLDALYHAGYCGLCEAIES
>PDBID
3ygsC
>PDBID
2p1hA

Examples of invalid inputs

>this_is_a_pdbid
3ygsC
>PDB_ID
2p1hA
3ygsC
2p1hA
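The input rules above can be checked before submitting a job. The following is a minimal sketch in Python, not Seekquencer's own parser: the function name `parse_input`, the accepted sequence alphabet, and the handling of sequences without a header line are assumptions made for illustration.

```python
import re

# Assumption: a PDB+chain identifier is any 5-character alphanumeric string (e.g. 1nagA).
PDB_CHAIN = re.compile(r"^[A-Za-z0-9]{5}$")
# Assumption: sequence lines contain only letters, '*' or '-'.
SEQUENCE = re.compile(r"^[A-Za-z*-]+$")


def parse_input(text):
    """Split pasted input into PDB+chain identifiers and sequence records.

    Raises ValueError for any line that fits neither form.
    """
    pdb_ids, sequences = [], []
    expect_pdb = False  # True on the line right after '>PDBID'
    current = None      # sequence record being built across wrapped lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if expect_pdb:
            if not PDB_CHAIN.match(line):
                raise ValueError(f"expected a 5-character PDB+chain ID, got {line!r}")
            pdb_ids.append(line)
            expect_pdb = False
        elif line == ">PDBID":
            expect_pdb = True
            current = None
        elif line.startswith(">"):
            # Ordinary FASTA header: start a new sequence record.
            current = {"header": line[1:], "seq": ""}
            sequences.append(current)
        elif SEQUENCE.match(line):
            if current is None:
                # Sequence without a header line; consecutive such lines are
                # treated as one wrapped sequence (an assumption).
                current = {"header": "", "seq": ""}
                sequences.append(current)
            current["seq"] += line
        else:
            raise ValueError(f"line is neither a sequence nor preceded by '>PDBID': {line!r}")
    if expect_pdb:
        raise ValueError("'>PDBID' line without an identifier after it")
    return pdb_ids, sequences
```

With this sketch, the invalid examples above are rejected because an identifier such as 3ygsC contains digits and is therefore not a sequence, and it is only accepted on the line directly after '>PDBID'.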

Add Structure Homologs

This feature uses BLAST to search the PDB, with your input sequences and PDB IDs as queries. There are four parameters that control what Seekquencer retrieves:

Minimum Identity

This parameter controls what BLAST considers a sequence homolog. Increasing this parameter will reduce the number of PDB entries retrieved; decreasing it will increase the number retrieved. However, an internal parameter prevents PDB entries with e-values > 0.01 from being included. (Default: 20%)

Minimum Coverage

This parameter determines how much of the query sequence a particular PDB entry must cover. Ideally, the structure would cover all or most of the query; if it does not, you might consider breaking your query sequences into domains. (Default: 50%)
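To make the interaction of Minimum Identity, Minimum Coverage, and the internal e-value cutoff concrete, here is a minimal sketch in Python. It is not Seekquencer's code: the `Hit` record, its field names, and the definition of coverage as the aligned query span divided by the query length are assumptions.

```python
from dataclasses import dataclass

MAX_EVALUE = 0.01  # internal cutoff: hits with e-values above this are never included


@dataclass
class Hit:
    """Hypothetical BLAST hit record; field names are illustrative only."""
    subject_id: str
    percent_identity: float  # 0-100
    query_start: int         # 1-based, inclusive
    query_end: int           # 1-based, inclusive
    evalue: float


def query_coverage(hit, query_length):
    """Percentage of the query spanned by the alignment (assumed definition)."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / query_length


def keep_hit(hit, query_length, min_identity=20.0, min_coverage=50.0):
    """Return True if the hit passes all three filters."""
    return (
        hit.evalue <= MAX_EVALUE
        and hit.percent_identity >= min_identity
        and query_coverage(hit, query_length) >= min_coverage
    )
```

For example, a hit aligned to positions 1-60 of a 91-residue query covers about 66% of the query and passes the default coverage filter, while a hit aligned only to positions 1-30 (about 33%) does not.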

Clustering Threshold

This parameter prevents many instances of a particular structure from being retrieved. If you want fewer structures, lower the value; if you want more, increase it; using 100 will add all PDB entries that are homologous to your input. The pruning of sequences is performed using the program cd-hit. (Default: 90%)
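If you want to reproduce the clustering step on your own machine, the sketch below builds a cd-hit command line from a percentage threshold. How Seekquencer itself invokes cd-hit is not documented here, so the file names and the function names are hypothetical. The `-i`, `-o`, `-c`, and `-n` options and the word sizes per threshold range follow the cd-hit user guide.

```python
def word_size(threshold_percent):
    """Word length (-n) recommended by the cd-hit user guide for a given identity threshold."""
    if not 40 <= threshold_percent <= 100:
        raise ValueError("cd-hit supports identity thresholds between 40% and 100%")
    if threshold_percent >= 70:
        return 5
    if threshold_percent >= 60:
        return 4
    if threshold_percent >= 50:
        return 3
    return 2


def cdhit_command(in_fasta, out_fasta, threshold_percent=90):
    """Build the argument list for a cd-hit run; it can be passed to subprocess.run."""
    return [
        "cd-hit",
        "-i", in_fasta,                        # input FASTA with all retrieved hits
        "-o", out_fasta,                       # output FASTA with one representative per cluster
        "-c", f"{threshold_percent / 100:.2f}",  # identity threshold as a fraction
        "-n", str(word_size(threshold_percent)),
    ]
```

Lowering the threshold merges more sequences into each cluster, which is why fewer representatives come back.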

Database

Select the database to use as reference. (Default: pdb)

Add Sequence Homologs

This feature allows you to pull in sequences from the UniRef database. There are six parameters that control what Seekquencer retrieves:

Minimum Identity

This parameter controls what BLAST considers a sequence homolog. Increasing this parameter will reduce the number of UniRef entries retrieved; decreasing it will increase the number retrieved. However, an internal parameter prevents UniRef entries with e-values > 0.01 from being included. (Default: 20%)

Minimum Coverage

This parameter determines how much of the query sequence a particular UniRef entry must cover. Ideally, the hit would cover all or most of the query; if it does not, you might consider breaking your query sequences into domains. (Default: 50%)

Clustering Threshold

This parameter prevents many instances of a particular sequence from being retrieved. If you want fewer sequences, lower the value; if you want more, increase it. Using 100 will add all UniRef entries that are homologous to your input. The pruning of sequences is performed using the program cd-hit. (Default: 90%)

Database

Select the database to use as reference. (Default: uniref90)

Search Algorithm

This parameter controls which BLAST algorithm is used. (Default: Blast)

Trim Hit Sequence

This parameter determines how much of each hit sequence is returned. Selecting 'Yes' returns only the region aligned to your input, while selecting 'No' returns the full sequence. (Default: Yes)
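The effect of this option can be illustrated with a short Python sketch. The function name and the use of 1-based, inclusive alignment coordinates on the hit are assumptions; the sketch also ignores alignment gaps, which the actual tool may handle differently.

```python
def hit_sequence(full_sequence, subject_start, subject_end, trim=True):
    """Return the aligned region of a hit (trim=True) or the full hit sequence (trim=False).

    subject_start and subject_end are 1-based, inclusive positions on the hit.
    """
    if not trim:
        return full_sequence
    if not 1 <= subject_start <= subject_end <= len(full_sequence):
        raise ValueError("alignment coordinates fall outside the hit sequence")
    # Convert 1-based inclusive coordinates to a Python slice.
    return full_sequence[subject_start - 1:subject_end]
```

Trimming keeps the retrieved sequences close in length to your input, which is useful when the hits are long multi-domain proteins and only one domain matches the query.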

Add ASH Structural Neighbors

This feature allows you to pull in structural homologs of your query sequence(s). We maintain a database of ASH structural alignments. If one or more of your queries can be matched to one or more of the structures for which pre-computed alignments are available, the list of structural neighbors can be added, subject to the following constraints:

Minimum Identity

This parameter controls what BLAST considers a sequence homolog. Increasing this parameter will reduce the number of ASH structural neighbors retrieved; decreasing it will increase the number retrieved. However, an internal parameter prevents ASH structural neighbors with e-values > 0.01 from being included. (Default: 20%)

Minimum Coverage

This parameter determines how much of the query sequence a particular ASH structural neighbor must cover. Ideally, the structure would cover all or most of the query; if it does not, you might consider breaking your query sequences into domains. (Default: 50%)

Clustering Threshold

This parameter prevents many instances of a particular structure from being retrieved. If you want fewer structures, lower the value; if you want more, increase it; using 100 will add all ASH structural neighbors that are homologous to your input. The pruning of sequences is performed using the program cd-hit. (Default: 90%)

Cluster Final Results

This feature allows you to perform a final clustering of all the results.

Threshold

This parameter prevents many instances of a particular structure or sequence from appearing in the final results. If you want fewer entries, lower the value; if you want more, increase it. The pruning of sequences is performed using the program cd-hit. (Default: 70%)

en/seekquencer/home.txt · Last modified: 2014/03/20 13:45 by kmamada