Consensus search
Character | Name | Meaning |
. or X | dot | Any amino acid allowed |
[...] | character class | Amino acids listed are allowed |
[^...] | negated character class | Amino acids listed are not allowed |
{min,max} | specified range | Matches between min and max repetitions of the preceding amino acid: at least min repetitions are required and at most max are allowed. |
^ | caret | Matches the amino terminus (N-terminus) |
$ | dollar | Matches the carboxy terminus (C-terminus) |
| | pipe | Denotes alternation. For example, (KL)|(LK) will match either KL or LK. |
() | brackets | Groups items into a single logical item. The opening and closing brackets mark the start and end of the group. |
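Because the consensus syntax above is a subset of regular-expression syntax, a consensus can be tried out locally with Python's `re` module. The sketch below is an illustration, not SLiMSearch's own matcher. It assumes that X is only used as the wildcard outside character classes, reports 1-based inclusive positions, and restarts the search one residue after each match start so that overlapping instances are kept. The helper names `consensus_to_regex` and `find_matches` are hypothetical.

```python
import re


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus into a Python regular expression.

    Assumption: 'X' only appears as the "any amino acid" wildcard outside
    character classes; every other symbol in the syntax table is already
    valid Python regex syntax.
    """
    parts = []
    in_class = False
    for ch in consensus:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        # Outside [...], X is equivalent to the dot wildcard.
        parts.append("." if ch == "X" and not in_class else ch)
    return re.compile("".join(parts))


def find_matches(consensus: str, sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, peptide) for every match, 1-based and inclusive.

    The search restarts one residue after the previous match start, so
    overlapping matches are reported. '^' only matches at the real start
    of the sequence and '$' only at its end.
    """
    regex = consensus_to_regex(consensus)
    hits = []
    pos = 0
    while pos <= len(sequence):
        match = regex.search(sequence, pos)
        if match is None:
            break
        hits.append((match.start() + 1, match.end(), match.group()))
        pos = match.start() + 1
    return hits
```

For example, `find_matches("KEN", "MAKENLLKENG")` returns `[(3, 5, 'KEN'), (8, 10, 'KEN')]`. Passing `pos` to `Pattern.search` keeps `^` anchored to the true amino terminus, which is why the loop does not slice the sequence.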
Motif | Consensus |
KEN box motif | KEN |
Cyclin-binding RxL motif | [KR].L.{0,1}[LF] |
C-terminal KDEL Golgi-to-ER retrieval signal | [KRHQSAP][DENQT]EL$ |
N-terminal myristoylation site | ^M{0,1}G[^EDRKHPFYW]..[STAGCN][^P] |
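The example consensuses can be run against a sequence as they are written. In this sketch the toy sequence `MGAQASKDEL` is made up for illustration and is not a real protein; the short motif labels in `MOTIFS` are hypothetical names chosen for the example.

```python
import re

# Consensus patterns from the motif table above. None of them uses the X
# wildcard, so they are passed to Python's re module unchanged.
MOTIFS = {
    "KEN box": "KEN",
    "RxL": "[KR].L.{0,1}[LF]",
    "KDEL": "[KRHQSAP][DENQT]EL$",
    "N-myristoylation": "^M{0,1}G[^EDRKHPFYW]..[STAGCN][^P]",
}


def scan(sequence: str) -> dict[str, list[tuple[int, int, str]]]:
    """Map motif label -> list of (start, end, peptide), 1-based and inclusive.

    re.finditer returns non-overlapping matches only, which is enough for
    this illustration.
    """
    results = {}
    for name, consensus in MOTIFS.items():
        hits = [(m.start() + 1, m.end(), m.group())
                for m in re.finditer(consensus, sequence)]
        if hits:
            results[name] = hits
    return results
```

`scan("MGAQASKDEL")` reports the myristoylation-like start `MGAQASK` at 1-7 and the KDEL-like C-terminus at 7-10; the `^` and `$` anchors keep those two consensuses from matching anywhere else in a longer sequence.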
Instances
Column | Description | Link |
Virus | Virus species. This column is shown only if the search was run against viral proteomes. | - |
Protein Name | Protein and gene name. Information about overlapping instances and warnings. | UniProt |
Peptide | A motif sequence with its flanks. Flanks are displayed as lowercase residues. | ProViz |
Length | Motif length. | - |
Start | Start position of the motif in protein. | - |
End | Stop position of the motif in protein. | - |
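The Peptide, Length, Start and End columns describe the same interval. A minimal sketch, assuming that Start and End are 1-based inclusive positions, shows how such a display peptide could be assembled. The function names and the default flank length of 5 are hypothetical, since this section does not state how many flanking residues are shown.

```python
def flanked_peptide(sequence: str, start: int, end: int, flank: int = 5) -> str:
    """Return the motif in uppercase with up to `flank` lowercase residues on each side.

    `start` and `end` are assumed to be 1-based and inclusive; the default
    flank length is a hypothetical choice, not a documented value.
    """
    s, e = start - 1, end  # convert to Python slice bounds
    left = sequence[max(0, s - flank):s].lower()   # shorter near the N-terminus
    right = sequence[e:e + flank].lower()          # shorter near the C-terminus
    return left + sequence[s:e].upper() + right


def motif_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1
```

For a motif at positions 3-5 of `MAKENLLKENG` with two flanking residues, `flanked_peptide` gives `maKENll` and `motif_length(3, 5)` gives 3.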
Column | Description | Link |
Disorder score | Mean IUPred score. High scoring peptides are less likely to be in a globular region. | - |
Conservation | Conservation score. Lower scores indicate more conserved peptides across the alignment. | ProViz |
Column | Description | Source |
Domain | Regions with domains. | Pfam and UniProt |
Structure | Regions that have structure solved by NMR or X-ray crystallography. | PDB |
Secondary Structure | Regions that have been shown to form secondary structure. | UniProt |
Motif | Regions with experimentally validated short linear motifs. | ELM and UniProt |
Region | Regions with experimental evidence for function. | UniProt |
Switch | Curated experimentally validated motif-based molecular switches. | UniProt |
Modification | Regions with sites of post-translational modifications. | PhosphoSite, Phospho.ELM and UniProt |
Topology | Region topology information. | UniProt |
Isoform | Splice variants. | UniProt |
Mutagenesis | Mutated residues which alter function. | UniProt |
SNP | Single nucleotide polymorphisms with disease association and genotype information. | dbSNP, 1000genomes and UniProt |
Other | Other features of interest. | UniProt |
Column | Description | Source |
GO terms | Gene Ontology terms for the protein containing the peptide. | Gene Ontology |
Keywords | UniProt keywords for the protein containing the peptide. | UniProt Keywords |
Interactors: Proteins | Proteins experimentally shown to interact with the protein containing the peptide. | IntAct |
Interactors: Families | Protein families interacting with the protein containing the peptide. | IntAct, UniProt Protein Families |
Interactors: Domains | Domains found in proteins that interact with the protein containing the peptide. | IntAct, Pfam |
Motif attribute | Description | Range |
Disorder score | IUPred score computed as the mean of IUPred scores across the residues of the motif consensus. Lower scores indicate a more globular region. | |
Conservation score | Relative conservation score computed across the defined residues of the motif consensus, as described for the SLiMPrints tool. Lower scores indicate a more conserved region. | |
Surface accessibility score | Proportion of the peptide that is accessible to water molecules in a solved structure of the region. | |
Anchor score | Anchor score computed as the mean of Anchor scores across the residues of the motif consensus. Higher scores indicate a higher propensity to fold upon binding. |
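The disorder and Anchor attributes are averages of per-residue predictions, and surface accessibility is a proportion. A minimal sketch, assuming the per-residue score list (for example from IUPred or ANCHOR) and the per-residue accessibility flags are already available as hypothetical inputs, and that all residues from Start to End are averaged:

```python
from statistics import mean


def motif_mean_score(per_residue: list[float], start: int, end: int) -> float:
    """Mean of per-residue scores (e.g. IUPred or ANCHOR) over a motif.

    `per_residue[i]` is the score of residue i + 1; start/end are assumed
    to be 1-based and inclusive.
    """
    return mean(per_residue[start - 1:end])


def surface_accessibility(accessible: list[bool]) -> float:
    """Proportion of peptide residues flagged as accessible to water."""
    return sum(accessible) / len(accessible)
```

With scores `[0.1, 0.2, 0.6, 0.8, 0.4, 0.3]` and a motif at positions 3-5, the mean is 0.6; a peptide with three of four residues accessible scores 0.75.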
Disorder: | Instances with a disorder score ≥ 0.4. |
Surface accessibility: | Instances with a surface accessibility score < 50%, i.e. less than 50% of the peptide is accessible to water molecules in a solved structure of the region. |
Localisation: | Instances with Gene Ontology terms that indicate extracellular localisation. |
Topology: | Instances overlapping topology features that exclude intracellular regions. |
Column | Description |
InstanceId | Unique instance identifier. |
ProteinAcc | UniProt protein accession. |
ProteinName | Protein name. |
GeneName | Protein gene name. |
Hit | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Motif start position in protein. |
SeqStop | Motif stop position in protein. |
IUPred | Disorder score. |
Anchor | Anchor score. |
SA | Surface accessibility score. |
Conservation <alignment> (score) | Conservation score across <alignment>. |
Conservation <alignment> (var) | Conservation variance across <alignment>. |
Domain | Format: <name>|<id>|<start>|<stop>|<distance> |
Motif | Format: <name>|<id>|<start>|<stop>|<distance> |
Modification | Format: <name>|<enzymes>|<pmids>|<description>|<id>|<start>|<stop>|<distance> |
Structure | Format: <name>|<resolution>|<method>|<chain>|<start>|<stop>|<distance> |
SNP | Format: <name>|<variant>|<id>|<start>|<stop>|<distance> |
Mutagenesis | Format: <name>|<mutation>|<id>|<start>|<stop>|<distance> |
Region | Format: <name>|<start>|<stop>|<distance> |
Topology | Format: <name>|<start>|<stop>|<distance> |
Secondary Structure | Format: <name>|<start>|<stop>|<distance> |
Isoform | Format: <name>|<variant>|<start>|<stop>|<distance> |
Switch | Format: <type>|<subtype>|<mechanism>|<id>|<start>|<stop>|<distance> |
Other | Format: <name>|<start>|<stop>|<distance> |
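The feature columns pack several values into one pipe-delimited string. A minimal parsing sketch for a single entry follows; this section does not say how several features within one cell are separated, so only one entry is handled, and a literal "|" inside a name or description would break the split. The field lists are copied from the table above, while treating positions and distances as integers is an assumption. The example values are hypothetical.

```python
# Field order for some of the pipe-delimited feature columns (from the table above).
FEATURE_FIELDS = {
    "Domain": ["name", "id", "start", "stop", "distance"],
    "Motif": ["name", "id", "start", "stop", "distance"],
    "SNP": ["name", "variant", "id", "start", "stop", "distance"],
    "Region": ["name", "start", "stop", "distance"],
}
# Assumption: positions and distances are whole numbers.
INT_FIELDS = {"start", "stop", "distance"}


def parse_feature(column: str, entry: str) -> dict[str, object]:
    """Split one feature entry such as 'name|id|start|stop|distance' into a dict."""
    fields = FEATURE_FIELDS[column]
    values = entry.split("|")
    if len(values) != len(fields):
        raise ValueError(f"{column}: expected {len(fields)} fields, got {len(values)}")
    record: dict[str, object] = {}
    for field, value in zip(fields, values):
        record[field] = int(value) if field in INT_FIELDS else value
    return record
```

`parse_feature("Domain", "SH2|PF00017|10|90|0")` yields a dict with `start=10`, `stop=90` and `distance=0`; a wrong number of fields raises `ValueError` rather than silently misaligning columns.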
Field | Type | Description |
instanceId | Integer | Unique instance identifier. |
ProteinAcc | String | UniProt protein accession. |
ProteinName | String | Protein name. |
GeneName | String | Protein gene name. |
Hit | String | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Integer | Motif start position in protein. |
SeqStop | Integer | Motif stop position in protein. |
IUPred | Float | Disorder score. |
Anchor | Float | Anchor score. |
SA | Float | Surface accessibility score. |
Conservation <alignment> | JSON | Conservation score and variance across <alignment>. Format: {"score": <float>, "var": <float>} |
GOterms | List of JSON | List of GO terms. Element: {"id": <GOterm id>, "name": <GOterm name>}. |
Keywords | List of JSON | List of UniProt keywords. Element: {"id": <keyword id>, "name": <keyword name>}. |
Interactors | List of JSON | List of proteins that interact with the protein containing the motif. Element: {"id": <UniProt protein accession>, "name": <UniProt protein name (gene name)>}. |
Domain | List of JSON | Format: feature format, see below. |
Motif | List of JSON | Format: feature format, see below. |
Modification | List of JSON | Format: feature format, see below. |
Structure | List of JSON | Format: feature format, see below. |
SNP | List of JSON | Format: feature format, see below. |
Mutagenesis | List of JSON | Format: feature format, see below. |
Region | List of JSON | Format: feature format, see below. |
Topology | List of JSON | Format: feature format, see below. |
SecondaryStrcuture | List of JSON | Format: feature format, see below. |
Isoform | List of JSON | Format: feature format, see below. |
Switch | List of JSON | Format: feature format, see below. |
Other | List of JSON | Format: feature format, see below. |
Warnings | List of JSON | List of warnings. Element: {"name": <warning category>, "reason": <warning reason>} |
Field | Type | Description |
name | String | Feature name. |
url | String | Link to source data. |
description | JSON | Format: {"start": <feature start position>, "stop": <feature stop position>, "distance": <distance to motif consensus>, "description": <other specific information in JSON format>} |
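Features in the JSON export follow the feature format above, so they can be filtered by their distance to the motif. A minimal sketch, assuming that the record has already been loaded into a Python dict, that `distance` is numeric, and that a distance of 0 means the feature overlaps the motif; the function name and the toy record are hypothetical.

```python
import json


def features_near_motif(instance: dict, column: str, max_distance: float = 0) -> list[str]:
    """Names of features in `column` whose distance to the motif is <= max_distance.

    Assumptions: each feature follows the feature format above, 'distance'
    is numeric, and 0 means the feature overlaps the motif.
    """
    names = []
    for feature in instance.get(column, []):
        info = feature["description"]
        if isinstance(info, str):  # tolerate a JSON-encoded string
            info = json.loads(info)
        if abs(info["distance"]) <= max_distance:
            names.append(feature["name"])
    return names
```

Columns missing from a record are treated as empty, so the same call works for instances without, say, any Motif annotation.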
Conservation
Column | Description | |
Con Score Combined | Combined conservation score: the sum of the conservation score and the conservation variance. | |
Sig conserved residues defined positions | Proportion of residues in the defined positions of a motif that are significantly conserved (p < 0.05). | |
Sig conserved residues Flanks | Proportion of residues in the flanking positions of a motif that are significantly conserved (p < 0.05). | |
Sig conserved residues Ratio | The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks. | |
L-10:L-1 | Conservation scores and residues for N-terminal flank. | |
P<motif position> | Conservation scores and residues for motif consensus. | |
R1:R10 | Conservation scores and residues for C-terminal flank. |
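The summary columns can be derived from per-position results. A minimal sketch, assuming per-position conservation p-values are already available as hypothetical inputs (how SLiMPrints derives them is not covered here) and using the 0.05 significance threshold; reporting `None` when no flank residue is significant is a choice made for this sketch.

```python
def sig_proportion(pvalues: list[float], alpha: float = 0.05) -> float:
    """Proportion of positions whose conservation p-value is below alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def conservation_summary(score: float, variance: float,
                         motif_pvalues: list[float], flank_pvalues: list[float],
                         alpha: float = 0.05) -> dict:
    motif_sig = sig_proportion(motif_pvalues, alpha)
    flank_sig = sig_proportion(flank_pvalues, alpha)
    return {
        "combined": score + variance,  # Con Score Combined
        "motif_sig": motif_sig,        # Sig conserved residues defined positions
        "flank_sig": flank_sig,        # Sig conserved residues Flanks
        # Sig conserved residues Ratio; None when no flank residue is significant.
        "ratio_sig": motif_sig / flank_sig if flank_sig else None,
    }
```

With two of three defined positions and one of four flank positions significant, the ratio is (2/3)/(1/4) = 8/3, i.e. the defined positions are conserved well above their surroundings.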
Column | Description | |
Con Score Combined | The combined conservation score is the sum of the conservation score and the conservation variance. | |
Conserved Counter | Number of species in which the motif consensus is present at the same position as the query species motif. | |
Species columns | Shows whether the motif is present (C) or absent (N) at the same position as the query species motif in each species of the selected clade. If no data are available (i.e. there is no protein in the alignment for the species), an "X" is shown. |
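The Conserved Counter is the number of C calls across the species columns. A minimal sketch, assuming the calls for one instance are held in a dict keyed by species; the species codes and the function name are hypothetical.

```python
from collections import Counter

VALID_CALLS = {"C", "N", "X"}


def conserved_counter(species_calls: dict[str, str]) -> tuple[int, Counter]:
    """Return (number of C calls, counts of every call) for one instance."""
    counts = Counter(species_calls.values())
    unknown = set(counts) - VALID_CALLS
    if unknown:
        raise ValueError(f"unexpected species calls: {sorted(unknown)}")
    return counts["C"], counts
```

Keeping the full `Counter` makes it easy to see how many species were missing (X) from the alignment, which a low Conserved Counter alone does not reveal.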
Column | Description |
InstanceId | Unique instance identifier. |
ProteinAcc | UniProt protein accession. |
ProteinName | Protein name with gene name. |
Hit | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Motif start position in protein. |
SeqStop | Motif stop position in protein. |
IUPred | Disorder score. |
Domain | Domain names separated by ";". |
Motif | Motif classes separated by ";". |
<alignment> conservation score | Conservation score across <alignment>. |
<alignment> conservation var | Conservation variance across <alignment>. |
<alignment> conservation combined | Conservation score combined. Sum of conservation score and variance across <alignment>. |
conserved_counter | Number of conserved species across <alignment>. |
<species> | C, N or X. C - the motif consensus is present at the same position as in the query species (conserved). N - the motif consensus is missing at the same position as in the query species (non-conserved). X - the species is not present in the alignment (missing). |
mean_flanks | Mean of relative conservation scores across residues of flank regions. |
var_flanks | Variance of relative conservation scores across residues of flank regions. |
Sig conserved residues defined positions | Proportion of residues in the defined positions of a motif that are significantly conserved (p < 0.05). |
Sig conserved residues Flanks | Proportion of residues in the flanking positions of a motif that are significantly conserved (p < 0.05). |
Sig conserved residues Ratio | The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks. |
L-10:L-1 | The relative conservation scores for residues in the N-terminal flank. |
P<position> | The relative conservation scores for residues in the motif consensus. |
R1:R10 | The relative conservation scores for residues in the C-terminal flank. |
Alignment | Hyperlink to ProViz visualisation tool. |
Field | Type | Description |
instanceId | Integer | Unique instance identifier. |
ProteinAcc | String | UniProt protein accession. |
ProteinName | String | Protein name. |
GeneName | String | Protein gene name. |
Hit | String | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Integer | Motif start position in protein. |
SeqStop | Integer | Motif stop position in protein. |
IUPred | Float | Disorder score. |
ConservationScore | Float | Conservation score. |
ConservationVar | Float | Conservation variance. |
ConservationScoreCombined | Float | Conservation score combined. Sum of conservation score and conservation variance. |
Conservation_Scores | JSON | Conservation score and variance across <alignment>. Format: {"<searchdb>": {"score": <float>, "var": <float>}}. |
Domain | List of JSON | Format: feature format, see description. |
Motif | List of JSON | Format: feature format, see description. |
mean_flanks | Float | Mean of relative conservation scores across residues of flank regions. |
var_flanks | Float | Variance of relative conservation scores across residues of flank regions. |
flank_sig | Float | Proportion of residues in the flanking positions of a motif that are significantly conserved (p < 0.05). |
motif_sig | Float | Proportion of residues in the defined positions of a motif that are significantly conserved (p < 0.05). |
ratio_sig | Float | The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks. |
motif_sig_pos | List of Integer | The defined positions of a motif consensus. |
conserved_counter | Integer | Number of conserved species i.e. motif consensus is at the same position as query species. |
Conservation | JSON | Species conservation. Format: {"species_code": Boolean}. True - motif consensus is present at the same position, False - motif consensus is missing at the same position. |
flank_residues | JSON | Conservation for each residue in flanking regions. Format: {"<flank position>": {"aa": <residue>, "score": <relative conservation score>}}. |
peptide_residues | JSON | Conservation for each residue in motif consensus. Format: {"<motif position>": {"aa": <residue>, "score": <relative conservation score>}}. |
proviz_link | String | Link to ProViz visualisation tool. |
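Several JSON fields are defined in terms of each other, so a quick consistency check can catch parsing mistakes after download. A minimal sketch, assuming the record has been loaded into a dict with the keys exactly as documented; the function name and the tolerance are hypothetical, and skipping the ratio check when `flank_sig` is 0 is a choice made for this sketch.

```python
import math


def check_conservation_record(rec: dict, tol: float = 1e-6) -> list[str]:
    """Return the names of fields that disagree with the relationships in the table."""
    problems = []
    # ConservationScoreCombined = ConservationScore + ConservationVar
    combined = rec["ConservationScore"] + rec["ConservationVar"]
    if not math.isclose(rec["ConservationScoreCombined"], combined, abs_tol=tol):
        problems.append("ConservationScoreCombined")
    # Conservation maps species code -> True (motif present) / False (missing).
    if rec["conserved_counter"] != sum(1 for v in rec["Conservation"].values() if v):
        problems.append("conserved_counter")
    # ratio_sig = motif_sig / flank_sig, checked only when flank_sig is non-zero.
    if rec["flank_sig"] and not math.isclose(
        rec["ratio_sig"], rec["motif_sig"] / rec["flank_sig"], abs_tol=tol
    ):
        problems.append("ratio_sig")
    return problems
```

An empty list means the record is internally consistent.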
Function
Approach | Description | Default |
Motif search space correction | Enrichment analysis corrected for motif search space, i.e. the search space is limited to disordered regions of the proteome. See details. | |
Based on conservation | Enrichment analysis based on conservation scores as the ranking criteria. See details. | |
Classical | Classical enrichment analysis based on hypergeometric distribution. See details. |
Colour | Meaning | |
Green | Significant terms with adjusted p-values < 1e-4. | |
Grey | Enriched terms i.e. with enrichment score (E) > 1. | |
Blue | Depleted terms i.e. with enrichment score (E) < 1. | |
Light yellow | Warning. Term is flagged with repeat or cluster flag. | |
Dark yellow | Warning. Term is flagged with repeat and cluster flag. |
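The colour scheme combines a significance cutoff, the direction of enrichment and two warning flags. This section does not say which rule wins when several apply, so the sketch below assumes that warnings take precedence over significance, and significance over the enriched/depleted colours; the function name is hypothetical.

```python
from typing import Optional


def term_colour(adj_pval: float, enrichment: float,
                repeat_flag: bool = False, cluster_flag: bool = False) -> Optional[str]:
    """Colour a term using the table above (precedence order is an assumption)."""
    if repeat_flag and cluster_flag:
        return "dark yellow"
    if repeat_flag or cluster_flag:
        return "light yellow"
    if adj_pval < 1e-4:
        return "green"
    if enrichment > 1:
        return "grey"
    if enrichment < 1:
        return "blue"
    return None  # E == 1: neither enriched nor depleted
```

Under this ordering a highly significant term that also carries both flags is shown as dark yellow, so the warning is never hidden by the significance colour.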
Column | Description | |
Category | Term category. | |
ID | Unique term identifier. | |
Name | Functional annotation name. | |
# | Number of consensus matches that map to this term. | |
# motifs | Number of consensus matches in dataset. | |
# residues | Number of disordered residues* that map to this term. | |
# residues proteome | Number of disordered residues* in whole proteome. | |
# Proteome | Number of proteins in proteome that map to this term. | |
Enrichment | Enrichment (E). If (E) > 1 the term is enriched; otherwise it is depleted. | |
P-value | Enrichment significance calculated using the hypergeometric test. Lower p-values indicate a more significantly enriched or depleted term. | |
Adj pval | Adjusted p-values, i.e. p-values after Benjamini-Hochberg (BH) correction. | |
<alignment> P-value | Enrichment significance calculated based on conservation. |
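The Adj pval column applies the Benjamini-Hochberg (BH) correction. The standard step-up procedure can be sketched in a few lines; this is a generic illustration of BH, not the tool's own code, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw p-values `[0.01, 0.04, 0.03, 0.20]` the adjusted values are approximately `[0.04, 0.0533, 0.0533, 0.2]`; the 0.03 term inherits the smaller adjusted value of the 0.04 term because adjusted p-values must not decrease with rank.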
Column | Description | Approach(es) that use a given value in calculations/annotations |
Category | Term category. | |
ID | Unique term identifier. | |
Name | Functional annotation name. | |
No. of motif instances mapped to term in dataset (m) | Number of consensus matches mapped to a given term in dataset (m). | |
No. of motif instances in dataset (M) | Number of consensus matches in dataset (M). | |
No. of disordered residues mapped to term (n) | Number of disordered residues mapped to a given term in proteome (n). | |
No. of disordered residues in proteome (N) | Number of disordered residues in proteome (N). | |
Enrichment | Enrichment score (E). | |
Pvalue | Enrichment significance. | |
Adj pvalue | Corrected p-value for multiple hypothesis testing. | |
No. of proteins mapped to term in dataset | Number of proteins mapped to a given term in dataset. | |
No. of proteins mapped to term in proteome | Number of proteins mapped to a given term in proteome. | |
No. of proteins in dataset | Number of proteins in the dataset. | |
No. of proteins in proteome | Number of proteins in the entire proteome. | |
Enrichment (Proteins) | Enrichment score (E). | |
Pvalue (Proteins) | Enrichment significance. | |
Adj pvalue (Proteins) | Corrected p-value for multiple hypothesis testing. | |
Repeat flag | Warning. Overestimation of term (True/False). | |
Cluster flag | Warning. Overestimation of term (True/False). | |
Repeat flag (expected) | Expected number of instances mapped to a given term by chance. | |
Repeat flag (expected p-value) | Significance of repeat flag. | |
Cluster flag (ratio) | Significance of cluster flag. | |
<alignment> | Enrichment significance. |
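The quantities m, M, n and N in the table suggest a ratio-of-proportions enrichment score and a hypergeometric test. The sketch below uses that common textbook formulation as an assumption; it is not confirmed to be the exact SLiMSearch calculation. Using the upper tail for enriched terms and the lower tail for depleted terms is also an assumption. For the motif search space correction approach, n and N are disordered-residue counts; the classical approach would use protein counts instead. Exact sums over `math.comb` are fine for small examples but slow at proteome scale.

```python
from math import comb


def enrichment(m: int, M: int, n: int, N: int) -> float:
    """E = (m / M) / (n / N): share of motif instances on the term vs. background share."""
    return (m / M) / (n / N)


def hypergeom_pvalue(m: int, M: int, n: int, N: int) -> float:
    """Hypergeometric tail for M draws from N items, n of which map to the term.

    Upper tail P(X >= m) when E > 1, otherwise lower tail P(X <= m).
    """
    lowest = max(0, M - (N - n))  # smallest feasible overlap
    highest = min(n, M)           # largest feasible overlap
    if enrichment(m, M, n, N) > 1:
        ks = range(m, highest + 1)
    else:
        ks = range(lowest, m + 1)
    total = comb(N, M)
    return sum(comb(n, k) * comb(N - n, M - k) for k in ks) / total
```

With N = 10, n = 4, M = 3 and m = 2, E = (2/3)/(4/10) ≈ 1.67 and P(X ≥ 2) = (6·6 + 4·1)/120 = 1/3.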
Field | Type | Description |
category | String | Term category. |
id | String | Unique term identifier. |
name | String | Functional annotation name. |
count | Float | Number of consensus matches mapped to a given term in dataset. |
M | Float | Number of consensus matches in dataset. |
n | Float | Number of disordered residues mapped to a given term in proteome. |
N | Float | Number of disordered residues in proteome. |
enrichment | Float | Enrichment score (E) for motif search space correction approach. |
pval | String | Enrichment significance for motif search space correction. |
pvalBH | String | Adjusted p-value for multiple hypothesis testing for motif search space correction. |
proteinTerm | Float | Number of proteins mapped to a given term in dataset. |
occurrence | Float | Number of proteins mapped to a given term in proteome. |
proteinCount | Float | Number of proteins in the dataset. |
proteinBackgroundCount | Float | Number of proteins in the proteome. |
flag | Boolean | Repeat flag. |
expected | Float | Expected number of instances mapped to a given term by chance. |
exp_pval | Float | Significance of repeat flag. |
flag2 | Boolean | Cluster flag. |
countUniMix | Float | Significance of cluster flag. |
url | String | Link to source data. |
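Note that `pval` and `pvalBH` are documented as strings, so they need converting before comparison. A minimal sketch that selects significant terms from the JSON export, reusing the 1e-4 cutoff of the Green colour. Skipping terms that carry the repeat (`flag`) or cluster (`flag2`) warning is a choice made for this sketch, not a documented default, and the function name and toy terms are hypothetical.

```python
ADJ_PVAL_CUTOFF = 1e-4  # cutoff used for "Green" terms in the colour table


def significant_terms(terms: list[dict]) -> list[tuple[str, str, float]]:
    """Return (id, name, adjusted p-value) for significant, unflagged terms, best first."""
    selected = []
    for term in terms:
        adj = float(term["pvalBH"])  # documented as a String field
        if adj < ADJ_PVAL_CUTOFF and not term.get("flag") and not term.get("flag2"):
            selected.append((term["id"], term["name"], adj))
    return sorted(selected, key=lambda t: t[2])
```

Sorting by the converted float, rather than by the raw string, avoids ordering errors such as "5e-05" sorting after "1e-06" for the wrong reason.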
Filters
Filter group | Description |
Hub protein | Shared functional annotations and interactors of the motif-binding partner. |
Hub domain | Interacting domains. |
Annotation | Subcellular localisation and enriched functional annotations in the dataset. |
Evolution | Taxonomic range. Conservation across different clades/species. |
Accessions | Accessions of the containing protein, its ontology annotations or its interactors. |
Accessibility | Accessibility to intracellular proteins. |
UniProt accession | Protein Name |
P04637 | Cellular tumor antigen p53 |
P11532 | Dystrophin |
Q8WZ42 | Titin |
Source | Accession | Name |
Gene Ontology | GO:0007049 | Cell cycle |
UniProt keyword | KW-0498 | Mitosis |
Pfam domain | PF00017 | SH2 domain |
UniProt protein | P04637 | Cellular tumor antigen p53 (H.sapiens) |
JobID
References
Name | Description | PMID | URL |
UniProt | Protein accessions, names, sequences, families, UniRef clusters and feature annotations. | 25348405 | http://www.uniprot.org |
ELM | Manually curated linear motifs. | 26615199 | http://elm.eu.org |
Pfam | Functional regions and binding domains. | 24288371 | http://pfam.xfam.org |
Phospho.ELM | Experimentally verified phosphorylation sites. | 21062810 | http://phospho.elm.eu.org |
PhosphoSitePlus | Phosphorylation, ubiquitination, acetylation and methylation sites. | 22135298 | http://www.phosphosite.org/homeAction.do |
PDB | Experimentally resolved protein tertiary structures. | 10592235 | http://www.rcsb.org/pdb/home/home.do |
DSSP | Secondary structure derived from PDB tertiary structures. | 25352545 | http://swift.cmbi.ru.nl/gv/dssp/ |
dbSNP | Single-nucleotide polymorphism. | NCBI Handbook [Internet]. Chapter 5. | http://www.ncbi.nlm.nih.gov/SNP |
1000genomes | Single-nucleotide polymorphism. | 23128226 | http://www.1000genomes.org |
switches.ELM | Experimentally validated motif-based molecular switches. | 23550212 | http://switches.elm.eu.org |
Gene Ontology | Gene ontology annotations. | 25428369 | http://geneontology.org |
IntAct | Experimentally validated protein-protein interactions. | 24234451 | http://www.ebi.ac.uk/intact/ |
Name | Description | PMID | URL |
IUPred | Intrinsically disordered regions. | 15769473 | http://iupred.enzim.hu |
SLiMPrints | Conservation of residues across the alignment. | 22977176 | http://bioware.ucd.ie |
Anchor | Binding sites in disordered regions. | 19412530 | http://anchor.enzim.hu |