Use a set of aligned peptides as input. The peptides must be aligned (i.e. have the same length) and cannot contain non-standard amino acids. The peptide alignment can be typed into the text box or uploaded from a file, and it cannot contain fully gapped columns. The ELM database can be searched to obtain the alignment of an ELM motif class as input. See details below.
Upon submitting the alignment, the default PSSM for the input peptides is derived and a list of options is displayed. On this page, the user can change the scoring method (see the full list below), manually modify the PSSM and set the search options.
A set of aligned peptides in FASTA or plain-text format, supplied by the user. It can be entered in the text box or loaded from a file.
Format | Example |
---|---|
Text | AVEQTPRK |
Fasta | >ELMI001940 |
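The input rules above (equal length, standard amino acids only, no fully gapped column) can be checked before submission. The following is a minimal sketch, not the tool's own parser; the helper names `parse_peptides` and `validate_alignment` are hypothetical, and "-" is assumed to be the gap character.

```python
# Hypothetical helpers (not part of the tool) that mirror the input rules above.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
GAP = "-"  # assumed gap character


def parse_peptides(text):
    """Return peptides from plain-text input (one per line) or FASTA input."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return lines  # plain text: one aligned peptide per line
    peptides, current = [], []
    for line in lines:
        if line.startswith(">"):
            if current:
                peptides.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        peptides.append("".join(current))
    return peptides


def validate_alignment(peptides):
    """Raise ValueError if the peptides break one of the stated input rules."""
    if not peptides:
        raise ValueError("no peptides supplied")
    length = len(peptides[0])
    for pep in peptides:
        if len(pep) != length:
            raise ValueError(f"{pep!r} has length {len(pep)}, expected {length}")
        unknown = set(pep.upper()) - STANDARD_AA - {GAP}
        if unknown:
            raise ValueError(f"non-standard residues {sorted(unknown)} in {pep!r}")
    for col in range(length):
        if all(pep[col] == GAP for pep in peptides):
            raise ValueError(f"column {col + 1} is fully gapped")
    return True
```

For example, `validate_alignment(parse_peptides(">a\nAVEQ\n>b\nAV-Q"))` passes, while an alignment whose second column is gap-only raises an error.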
Aligned instances of an ELM motif class can be retrieved from the ELM database using the search box on the input page. Start typing the ELM class name and choose the motif from the dropdown list.
Option | Default | Description |
---|---|---|
species | Homo sapiens | Species proteome to search for PSSM matches. |
motif | computed | Motif consensus used in the conservation score calculation and the taxonomic range analysis. The motif can be edited manually (see details below). |
flank length | 5 | Number of residues flanking the motif on each side. Range: 0-20. |
disorder cut-off | 0.4 | Cut-off on the mean IUPred score of the motif residues, used to filter out structured regions. Range: 0-1. |
p-value cut-off | 0.001 | Significance cut-off used to discard irrelevant PSSM matches. The p-value is the probability of achieving a score as good as or better than the match under the background model. Range: 0-1. |
proteome order | reverse | Order in which the proteome is scanned to obtain the background distribution of scores used to compute the PSSM p-value. One of: reverse, shuffle or forward. |
mask cut-off | computed | Residues in motif sequences scoring below this cut-off are masked to highlight the relevant amino acids of the motif: every residue with score < cut-off is replaced with ".". By default, computed as the lowest position-specific score of the residues in the input alignment. |
taxonomic range | motif | Method used to decide whether a peptide from the alignment is conserved. One of: motif, PSSM, both (PSSM + motif). The methods are described in the taxonomic range section below. |
conservation p-value cut-off | 0.001 | Cut-off used to decide whether a peptide from the alignment is conserved. Applies only when the taxonomic range is computed with the PSSM or combined (both) method. |
shared annotations | empty | Set of proteins used to find ontology terms significantly shared with the peptide instances obtained from scanning. The same set of proteins is used to determine whether the resulting instances interact with any of the provided proteins. |
shared annotations p-value cut-off | empty | Significance cut-off used to filter out commonly shared annotations. The p-value is the probability that any two proteins in the proteome share a given term by chance. Range: 0-1. |
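The mask cut-off rule (every residue scoring below the cut-off becomes ".") and its default can be illustrated with a small sketch. It assumes the PSSM is stored as a list with one `{residue: score}` dict per position; that layout and the function names are hypothetical, not the tool's internal format.

```python
def mask_peptide(peptide, pssm, cutoff):
    """Replace residues whose position-specific score is below cutoff with '.'.

    pssm: list with one {residue: score} dict per position (assumed layout);
    residues missing from a column (e.g. gaps) are always masked.
    """
    return "".join(
        aa if pssm[i].get(aa, float("-inf")) >= cutoff else "."
        for i, aa in enumerate(peptide)
    )


def default_mask_cutoff(peptides, pssm):
    """Lowest position-specific score among residues of the input alignment."""
    return min(pssm[i][aa] for pep in peptides
               for i, aa in enumerate(pep) if aa in pssm[i])
```

With this default, no residue of the input peptides is masked; a user-defined, higher cut-off masks the weaker positions (e.g. "KEN" becomes "K.N" if E scores below the cut-off).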
The PSSM is derived with a computed motif, based on statistical significance and the rules described in the supplementary material of the paper. The motif is represented as a regular expression and can be edited by the user. The characters allowed in the regular expression are listed below:
Character | Name | Meaning |
---|---|---|
D, E, K, R, H, S, T, C, M, N, Q, F, Y, W, G, A, V, L, I, P | residue | One amino acid |
. or X | dot | Any amino acid allowed |
[...] | character class | Amino acids listed are allowed |
[^...] | negated character class | Amino acids listed are not allowed |
{min, max} | specified range | Matches min to max repetitions of the previous amino acid (min required, max allowed) |
^ | caret | Matches the amino terminus |
$ | dollar | Matches the carboxy terminus |
\| | pipe | Denotes alternation. For example, (KL)\|(LK) matches either KL or LK. |
() | brackets | Groups items into a single logical item; the brackets mark the start and end of the group. |
Name | Motif |
---|---|
KEN box motif | KEN |
Cyclin-binding RxL motif | [KR].L.{0,1}[LF] |
C-terminal KDEL Golgi-to-ER retrieving signal | [KRHQSAP][DENQT]EL$ |
N-terminal myristoylation site | ^M{0,1}G[^EDRKHPFYW]..[STAGCN][^P] |
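The motif syntax above is close to Python's `re` syntax, which already supports character classes, negated classes, `{min,max}`, `^`, `$`, alternation and grouping. A minimal sketch for testing motifs locally, assuming that "X" outside a class means any residue and that spaces inside `{min, max}` can be dropped; `motif_to_regex` and `find_motif` are hypothetical helpers, not part of the tool.

```python
import re


def motif_to_regex(motif):
    """Compile a motif written in the syntax above as a Python regular expression.

    Assumptions: 'X' outside a character class means any residue (same as '.'),
    and spaces (e.g. in '{0, 1}') are dropped; the rest is already valid `re` syntax.
    """
    pattern, in_class = [], False
    for ch in motif.replace(" ", ""):
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        pattern.append("." if ch == "X" and not in_class else ch)
    return re.compile("".join(pattern))


def find_motif(motif, sequence):
    """Return (start, end, match) tuples with 1-based inclusive positions.

    finditer reports non-overlapping matches only; overlapping instances
    would need a lookahead-based search.
    """
    return [(m.start() + 1, m.end(), m.group())
            for m in motif_to_regex(motif).finditer(sequence)]
```

For example, `find_motif("KEN", "AKENSKEN")` returns `[(2, 4, "KEN"), (6, 8, "KEN")]`, and the KDEL pattern only matches when "EL" is the last residue pair of the sequence, because of `$`.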
The PSSM can be modified by changing the scoring method (default: PSI-BLAST IC) and/or by interacting manually with the PSSM heatmap on the input page. See details below.
The PSSM can be computed using different scoring methods. The available scoring schemes are listed below with a short description. Full descriptions with equations are provided in the Supplementary material of the article.
Scoring method | Description | Reference |
---|---|---|
PSI-BLAST | PSI-BLAST algorithm modified for motif searches. | 9254694 19088134 11452024 16218944 |
PSI-BLAST IC | PSI-BLAST algorithm modified for motif searches and adjusted with information content. | 9254694 19088134 11452024 16218944 |
MOTIPS | MOTIPS method adapted to derive a PSSM and score peptides. | 20459839 |
Log odds | Logarithm base 10 of binomial statistics of enriched versus depleted residues at specific positions. | 24097270 |
Log Relative Binomial | Logarithm base 10 of the probability of over- or under-representation of residues at specific positions. | - |
Ratio | Amino acid frequencies normalized by the background model at specific positions. | - |
In addition to these scores, "frequency" and "counts" are available for visualisation purposes only.
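The "counts", "frequency" and "Ratio" matrices are the simplest of these schemes and can be sketched directly from an alignment. This is a simplified illustration, not the published formulas: it uses no pseudocounts or sequence weighting, and the `background` dict of residue frequencies is an assumed input.

```python
from collections import Counter


def column_counts(peptides):
    """Per-position residue counts (the "counts" view); gaps are ignored."""
    return [Counter(pep[i] for pep in peptides if pep[i] != "-")
            for i in range(len(peptides[0]))]


def frequency_matrix(peptides):
    """Per-position residue frequencies (the "frequency" view)."""
    matrix = []
    for col in column_counts(peptides):
        total = sum(col.values())
        matrix.append({aa: n / total for aa, n in col.items()})
    return matrix


def ratio_matrix(peptides, background):
    """Observed frequency divided by background frequency at each position.

    Simplified "Ratio" illustration: residues absent from a column are left out.
    """
    return [{aa: f / background[aa] for aa, f in col.items()}
            for col in frequency_matrix(peptides)]
```

A ratio above 1 means the residue is enriched at that position relative to the background; the log-based schemes transform comparable quantities onto a logarithmic scale.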
For each species, the amino acid composition of the proteome has been computed using different disorder thresholds. The frequencies used as the background distribution can be chosen by selecting the species and the disorder cut-off. The disorder cut-off means that amino acid frequencies were computed from proteome residues with an IUPred score greater than the selected cut-off. By default, "Homo sapiens" and a disorder cut-off of 0.5 are selected.
Additionally, for the PSI-BLAST and PSI-BLAST IC methods, two parameters (lambda and the maximum number of independent observations) were precomputed from the background frequencies.
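The background composition described above (frequencies of residues whose IUPred score exceeds the cut-off) can be sketched as follows. The sketch assumes per-residue IUPred scores are already available for each protein; how the tool obtains them is not shown, and `background_frequencies` is a hypothetical name.

```python
from collections import Counter


def background_frequencies(proteins, cutoff):
    """Amino acid composition of residues whose IUPred score exceeds cutoff.

    proteins: iterable of (sequence, iupred_scores) pairs, one score per residue
    (assumed to be precomputed).
    """
    counts = Counter()
    for seq, scores in proteins:
        counts.update(aa for aa, score in zip(seq, scores) if score > cutoff)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}
```

Raising the cut-off restricts the background to more strongly disordered residues, which changes the expected frequencies used by the scoring methods.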
The PSSM, visualised as a heatmap, is interactive. Hovering the mouse cursor over a tile displays more information, and clicking on a tile modifies it; this can be used to further refine the motif. Four actions are possible (listed below), and a dropdown menu lets the user choose the action before clicking on a tile. Clicking on a position number at the top of the heatmap nullifies that position, i.e. sets the score of every amino acid at that position to zero.
Multiple modifications can be applied at one position, but only one per amino acid per position. Modifications are displayed on the heatmap itself and the sequence logo above it is updated. Modifications can be reverted by clicking on the tile again or by clicking the reset button.
Option | Description |
---|---|
require | Requiring an amino acid fixes its score and prohibits all the others at that position. Multiple amino acids can be "required" at one position; their relative importance is preserved. |
prohibit | A "prohibited" amino acid takes a very low, negative value at this position (-1000). |
group | Grouped amino acids at one position take the same score. Two ways of grouping are implemented: the sum of the grouped residues' scores at the position, or the maximum score among these amino acids at the position. |
zero | The scores are replaced by zeros. |
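The four heatmap actions and the position nullification can be expressed as operations on one PSSM column. A minimal sketch, assuming a column is a `{residue: score}` dict; each function returns a new dict so the original column is kept, which is one way to support the reset behaviour. The function names are hypothetical.

```python
PROHIBITED = -1000.0  # value stated above for prohibited residues


def require(column, residues):
    """Keep the scores of the required residues and prohibit every other residue."""
    return {aa: (s if aa in residues else PROHIBITED) for aa, s in column.items()}


def prohibit(column, residue):
    return {**column, residue: PROHIBITED}


def group(column, residues, how="sum"):
    """Give every grouped residue the same score: the sum or the maximum."""
    if how not in ("sum", "max"):
        raise ValueError("how must be 'sum' or 'max'")
    scores = [column[aa] for aa in residues]
    value = sum(scores) if how == "sum" else max(scores)
    return {aa: (value if aa in residues else s) for aa, s in column.items()}


def zero(column, residue):
    return {**column, residue: 0.0}


def nullify(column):
    """Clicking a position number: every residue at that position scores zero."""
    return {aa: 0.0 for aa in column}
```

Requiring K and R at a position keeps their original scores (so K stays more important than R if it scored higher) while all other residues drop to -1000.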
After a job has finished, the main result table is displayed. The PSSM matches, with statistical significance metrics, overlapping features and motif attribute annotations, are presented in the table on the Instances page. The results can then be filtered by discriminatory attributes and/or used to run evolutionary and functional enrichment analyses.
The evolutionary analysis examines in depth the conservation of instances across different species and the conservation of the motif's sequence context (Conservation page).
Functional analysis performs enrichment analysis of functional annotations to indicate possible motif function, localisation or binding partner (Function page).
Filtering can be performed based on different information such as: accessibility, taxonomic range, interacting partners, subcellular localisation and functional annotations (Filters page), or shared annotations with a hub protein (Hub page).
The Instances page displays the results obtained from scanning the proteome with the PSSM generated on the input page. The best matches (below the PSSM p-value cut-off, maximum 10000) are annotated with peptide, motif attribute and feature information. Depending on the input options and/or applied filters, shared annotations and functional annotations are also listed in the result table. Furthermore, instances are flagged with warnings if they occur in inaccessible regions. A short explanation of each column is given in the table below.
Additionally, each motif instance is linked to ProViz, a tool that visualises motif sequences together with protein annotations and alignments.
Category | Column | Description | Link |
---|---|---|---|
Peptide | Virus | Virus species. Shown only if the search was run against virus proteomes. | - |
Peptide | Protein Name | Protein and gene name, with information about overlapping instances and warnings. | UniProt |
Peptide | Peptide | Motif sequence with flanks; flanks are displayed as lowercase residues. The masked sequence appears when hovering over the peptide. A marker next to the peptide means that it exactly matches one of the peptides used to build the PSSM. The peptide length always equals the PSSM length; in special cases (motif found at the protein N- or C-terminus) positions beyond the terminus are shown as the gap character "-", e.g. "XX--" for a PSSM of length 4, where X denotes an amino acid. | ProViz |
Peptide | Start | Start position of the motif in the protein. | - |
Peptide | End | End position of the motif in the protein. | - |
Motif attributes | PSSM P-value | Statistical significance of the peptide according to the background model; lower is better. The PSSM score and rank are displayed when hovering over the p-value. Bold green values indicate that the peptide scored better than the best score from the background model. | - |
Motif attributes | Disorder score | Mean IUPred score. High-scoring peptides are less likely to be in a globular region. | - |
Motif attributes | Conservation | Conservation score. Lower scores indicate peptides that are more conserved across the alignment. | ProViz |
Features | Domain | Overlapping regions with domains. | Pfam and UniProt |
Features | Structure | Overlapping regions with a structure solved by NMR or X-ray crystallography. | PDB |
Features | Secondary Structure | Overlapping regions shown to form secondary structure. | UniProt |
Features | Motif | Overlapping experimentally validated short linear motifs. | ELM and UniProt |
Features | Region | Overlapping regions with experimental evidence for function. | UniProt |
Features | Switch | Overlapping curated, experimentally validated motif-based molecular switches. | UniProt |
Features | Modification | Overlapping post-translational modification sites. | PhosphoSite, phospho.ELM and UniProt |
Features | Topology | Overlapping topology information. | UniProt |
Features | Isoform | Overlapping splice variants. | UniProt |
Features | Mutagenesis | Overlapping mutated residues that alter function. | UniProt |
Features | SNP | Overlapping single nucleotide polymorphisms with disease association and genotype information. | dbSNP, 1000genomes and UniProt |
Features | Other | Other overlapping features of interest. | UniProt |
Functional annotations | GO terms | Gene Ontology terms of the protein containing the peptide. | Gene Ontology |
Functional annotations | Keywords | UniProt keywords of the protein containing the peptide. | UniProt Keywords |
Functional annotations | Interactors: Proteins | Proteins experimentally shown to interact with the protein containing the peptide. | UniProt |
Functional annotations | Interactors: Families | Protein families interacting with the protein containing the peptide. | UniProt |
Functional annotations | Interactors: Domains | Domains found in proteins that interact with the protein containing the peptide. | Pfam |
Shared annotations | Interaction | Indicates whether a protein-protein interaction exists between the instance and the user-defined set of proteins. | - |
Shared annotations | Shared ontology | GO terms shared with the user-defined set of proteins, displayed in three categories: localisation (L), molecular function (F) and biological process (P). Hover to see the least likely shared GO term, or click to see the other shared GO terms. | - |
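The Peptide column combines an uppercase motif with lowercase flanks. A minimal sketch of that formatting, assuming 1-based inclusive motif coordinates (as in Start/End) and flanks that are simply truncated at the protein termini; `format_hit` is a hypothetical helper.

```python
def format_hit(sequence, start, end, flank=5):
    """Motif (1-based, inclusive start..end) in uppercase with lowercase flanks.

    Assumption: flanks are truncated at the protein termini rather than padded.
    """
    left = sequence[max(0, start - 1 - flank):start - 1].lower()
    core = sequence[start - 1:end].upper()
    right = sequence[end:end + flank].lower()
    return left + core + right
```

For a KEN box at positions 4-6 of "MASKENLLPQRST" with the default flank length of 5, this gives "masKENllpqr": only three N-terminal flanking residues exist.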
Instances are provided with several kinds of annotation: motif attributes, features, hub (shared) annotations and functional annotations.
Motif attribute annotations include computed scores based on peptide conservation, accessibility and similarity to the PSSM. The most important attributes are shown in the result table; attributes not listed there can be found in the downloaded files. The calculations are described in the table below.
Motif attribute | Description | Range |
---|---|---|
PSSM P-value | Probability of achieving a score as good as or better than the observed one under the background model. The background model is the distribution of scores obtained by scanning the target proteome in reverse, shuffled or forward order. Lower values indicate that the peptide is more similar to the peptides used to build the PSSM. | 0-1 |
Disorder score | Mean of the IUPred scores across the residues of the motif consensus. Lower scores indicate more globular regions. | 0-1 |
Conservation score | Relative conservation score computed across the defined residues of the motif consensus, as described for the SLiMPrints tool. Lower scores indicate more conserved regions. | 0-1 |
Surface accessibility score | Proportion of the peptide that is accessible to water molecules in a solved structure of the region. | 0-1 |
Anchor score | Anchor score computed as mean of Anchor scores across residues of motif consensus. Lower score, higher propensity to fold upon binding. | 0-1 |
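The PSSM p-value definition above is an empirical tail probability over background scores. A minimal sketch, assuming the background scores (e.g. from scanning reversed sequences, `seq[::-1]`) are already computed and that a higher PSSM score means a better match; `pvalue_function` is a hypothetical name.

```python
from bisect import bisect_left


def pvalue_function(background_scores):
    """Return a function giving the fraction of background scores >= a score."""
    ranked = sorted(background_scores)  # sort once, reuse for every query
    n = len(ranked)

    def pvalue(score):
        # Everything from the first position >= score onwards is "as good or better".
        return (n - bisect_left(ranked, score)) / n

    return pvalue
```

A p-value of 0 means no background score reached the observed one, which corresponds to the bold green values described for the PSSM P-value column.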
Overlapping feature annotations are grouped into 12 types. The number in a feature column indicates how many features were found. To see more details, expand the feature column by clicking the button above the column name; click the button again to hide this information.
All feature columns can be expanded or collapsed at once with the expand/collapse button above the feature column names.
Hover over a feature to see its start and stop positions and more information. Additionally, each feature is given a distance: for example, a distance of -2 means that the annotated feature stops 2 residues before the motif start position, in the flanking region. No distance means that the annotated feature directly overlaps the motif consensus. To see more details about a feature, click on it to be redirected to the source website.
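The distance convention can be sketched as a small function. Only the negative case is defined above; the positive case (a feature starting after the motif ends) is an assumed mirror convention, and `feature_distance` is a hypothetical name. Coordinates are 1-based and inclusive.

```python
def feature_distance(feature_start, feature_stop, motif_start, motif_stop):
    """Signed distance between a feature and the motif consensus.

    Negative: the feature ends before the motif starts (-2 = ends 2 residues before).
    Positive: the feature starts after the motif ends (assumed mirror convention).
    None: the feature overlaps the motif, so no distance is reported.
    """
    if feature_stop < motif_start:
        return feature_stop - motif_start
    if feature_start > motif_stop:
        return feature_start - motif_stop
    return None
```

For a motif at 10-14, a feature spanning 3-8 gives -2, matching the example in the text.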
Information about shared annotations is shown in the table only if a set of proteins was provided in the shared annotations section of the input page.
Shared annotations are grouped into interaction and ontology groups, indicating whether an interaction exists between the instance and the set of proteins and whether they share any significant GO term. Each shared ontology term comes with a p-value indicating the probability that any two proteins in the proteome share this GO term by chance. The least likely shared GO term appears when hovering over one of the icons: localisation (L), biological process (P) or molecular function (F); the full list of significant shared annotations between the instance and the defined proteins is displayed when the icon is clicked.
The full list of significant shared annotations for all instances can be seen on the Hub page, and specific annotations can be used for filtering.
Information about functional annotations is shown in the table only if instances were filtered based on these annotations (see the Filters section). Annotations are grouped into ontology and interaction annotations. The functional data are obtained from the Gene Ontology, UniProt, IntAct and HIPPIE databases.
Instances are flagged to warn the user if a given peptide may be inaccessible to intracellular proteins. Instances with warnings are shown with a yellow background in the result table and a warning icon next to the protein name. Hover over the icon to see the warning details.
Warnings are grouped into two types, distinguished by background colour in the result table: domain and other.
Domain warnings mean that the peptide overlaps a region containing one or more domains; the other warnings are listed in the table below.
Warning | Description |
---|---|
Disorder | Instances with disorder score < 0.4, i.e. likely located in a structured (globular) region. |
Surface accessibility | Instances with surface accessibility percent score < 50% i.e. less than 50% of the peptide is accessible to water molecules in a solved structure of the region. |
Localisation | Instances with Gene Ontology terms which indicate extracellular localisation. |
Topology | Instances overlapping topology features which exclude intracellular regions. |
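The warning rules can be sketched as a function over one instance. This assumes a hypothetical dict layout (keys `disorder`, `sa`, `domains`, `extracellular`, `topology_excludes_intracellular`) and reads the disorder warning as flagging low mean IUPred scores, since lower scores indicate more globular regions; surface accessibility is taken as a 0-1 proportion, with -1 meaning "not computed".

```python
DISORDER_MIN = 0.4  # below this mean IUPred score the peptide is likely structured
SA_MIN = 0.5        # less than 50% of the peptide accessible to water


def instance_warnings(instance):
    """Return warning categories for one instance (hypothetical dict layout)."""
    warnings = []
    if instance.get("domains"):
        warnings.append("Domain")
    disorder = instance.get("disorder", -1)
    if disorder != -1 and disorder < DISORDER_MIN:
        warnings.append("Disorder")
    sa = instance.get("sa", -1)
    if sa != -1 and sa < SA_MIN:
        warnings.append("Surface accessibility")
    if instance.get("extracellular"):
        warnings.append("Localisation")
    if instance.get("topology_excludes_intracellular"):
        warnings.append("Topology")
    return warnings
```

Any non-empty result would correspond to a yellow row; "Domain" falls in the domain group and the rest in the other group.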
Instances can be quickly filtered based on warnings and hub annotations. To filter results, click the button in the top right corner above the table and switch the slider next to the warning or hub type on or off.
Advanced filter options are available on the Filters and Hub pages.
The results can be saved in tab-separated (tdt) or JSON format by clicking the button in the top left corner of the table. Information about ontologies and interactions is not provided in the tdt format, and hub annotations are limited to the best terms to keep the file small and easy to read. All information is available in the JSON format.
The columns are described in the table below. If a motif attribute score could not be calculated, a score of -1 is written to the file. In feature columns, individual features are separated by ";".
Column | Description |
---|---|
InstanceId | Unique instance identifier. |
ProteinAcc | UniProt protein accession. |
ProteinName | Protein name. |
GeneName | Protein gene name. |
Hit | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Motif start position in protein. |
SeqStop | Motif stop position in protein. |
IUPred | Disorder score. |
Anchor | Anchor score. |
SA | Surface accessibility score. |
Conservation <alignment> (score) | Conservation score across <alignment>. |
Conservation <alignment> (var) | Conservation variance across <alignment>. |
Domain | Format: <name>|<id>|<start>|<stop>|<distance> |
Motif | Format: <name>|<id>|<start>|<stop>|<distance> |
Modification | Format: <name>|<enzymes>|<pmids>|<description>|<id>|<start>|<stop>|<distance> |
Structure | Format: <name>|<resolution>|<method>|<chain>|<start>|<stop>|<distance> |
SNP | Format: <name>|<variant>|<id>|<start>|<stop>|<distance> |
Mutagenesis | Format: <name>|<mutation>|<id>|<start>|<stop>|<distance> |
Region | Format: <name>|<start>|<stop>|<distance> |
Topology | Format: <name>|<start>|<stop>|<distance> |
Secondary Structure | Format: <name>|<start>|<stop>|<distance> |
Isoform | Format: <name>|<variant>|<start>|<stop>|<distance> |
Switch | Format: <type>|<subtype>|<mechanism>|<id>|<start>|<stop>|<distance> |
Other | Format: <name>|<start>|<stop>|<distance> |
Warnings | Format: <Warning_type>:<warning details> |
Shared_interaction | Format: <protein_name (protein_acc)> |
Shared_function_terms | Format: <GO term name (id)> shared_with:<protein_name (protein_acc)> |
Shared_function_sig | Probability score |
Shared_process_terms | Format: <GO term name (id)> shared_with:<protein_name (protein_acc)> |
Shared_process_sig | Probability score |
Shared_localisation_terms | Format: <GO term name (id)> shared_with:<protein_name (protein_acc)> |
Shared_localisation_sig | Probability score |
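A downloaded tdt file can be read with Python's `csv` module. The sketch below assumes a header row and parses only the Domain column (features separated by ";", sub-fields by "|" in the order listed above); other feature columns would use their own field lists. The example row is made up for illustration, and values other than the scores stay as strings.

```python
import csv
import io

# Sub-field order for the Domain column, as listed in the table above.
DOMAIN_FIELDS = ["name", "id", "start", "stop", "distance"]
SCORE_COLUMNS = ("IUPred", "Anchor", "SA")


def parse_feature_cell(cell, fields):
    """Split a feature cell: features are separated by ';', sub-fields by '|'."""
    parts = (part.strip() for part in cell.split(";"))
    return [dict(zip(fields, part.split("|"))) for part in parts if part]


def read_instances(handle):
    """Yield one dict per row of a downloaded tdt file; -1 scores become None."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for key in SCORE_COLUMNS:
            if row.get(key):
                value = float(row[key])
                row[key] = None if value == -1 else value
        row["Domain"] = parse_feature_cell(row.get("Domain") or "", DOMAIN_FIELDS)
        yield row


# Made-up example row with two domains, for illustration only.
sample = io.StringIO(
    "InstanceId\tProteinAcc\tIUPred\tSA\tDomain\n"
    "1\tEXAMPLE1\t0.62\t-1\tPkinase|PF00069|10|250|-3;SH2|PF00017|300|380|\n"
)
rows = list(read_instances(sample))
```

An empty distance sub-field (as for the second domain) is kept as an empty string.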
The full list of fields for each instance is given in the table below. If a motif attribute score could not be computed, a score of -1 is provided. Feature fields are present only if one or more features overlap the motif consensus.
Field | Type | Description |
---|---|---|
instanceId | Integer | Unique instance identifier. |
ProteinAcc | String | UniProt protein accession. |
ProteinName | String | Protein name. |
GeneName | String | Protein gene name. |
Hit | String | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Integer | Motif start position in protein. |
SeqStop | Integer | Motif stop position in protein. |
IUPred | Float | Disorder score. |
Anchor | Float | Anchor score. |
SA | Float | Surface accessibility score. |
Conservation <alignment> | Object | Conservation score and variance across <alignment>. Format: {"score": <float>, "var": <float>} |
GOterms | List of objects | List of GO terms. Format: [{"id": <GO term id>, "name": <GO term name>}] |
Keywords | List of objects | List of UniProt keywords. Format: [{"id": <keyword id>, "name": <keyword name>}] |
Interactors | List of objects | Proteins interacting with the protein containing the motif. Format: [{"id": <UniProt protein accession>, "name": <UniProt protein name (gene name)>}] |
Domain | List of objects | Format: see features |
Motif | List of objects | Format: see features |
Modification | List of objects | Format: see features |
Structure | List of objects | Format: see features |
SNP | List of objects | Format: see features |
Mutagenesis | List of objects | Format: see features |
Region | List of objects | Format: see features |
Topology | List of objects | Format: see features |
SecondaryStructure | List of objects | Format: see features |
Isoform | List of objects | Format: see features |
Switch | List of objects | Format: see features |
Other | List of objects | Format: see features |
Warnings | List of objects | List of warnings. Format: [{"name": <warning category>, "reason": <warning reason>}] |
shared_ontology | Dictionary |
Feature objects have the following fields:
Field | Type | Description |
---|---|---|
name | String | Feature name. |
url | String | Link to source data. |
description | Object | Format: {"start": <feature start position>, "stop": <feature stop position>, "distance": <distance to motif consensus>, "description": <other specific information in JSON format>} |
There are two main evolutionary sections, flank conservation and taxonomic range, for each available clade. The clades depend on the query species and include: QFO (Quest for Orthologs), Arthropoda, Viridiplantae, Amoebozoa, Fungi, Nematoda, Metazoa, Saccharomycetales and Viruses. Each conservation section provides general peptide data and section-specific conservation data. A short description of each column is given in the table below.
To change the evolutionary section or clade, select one from the navigation menu above the table or in the sidebar. A custom selection of table columns can be set in the sidebar. Note that not all available columns are shown in the table by default.
Instances are annotated with relative conservation scores for each residue in the PSSM match and the flanking regions (always from -10 to +10 residues). The conservation of each residue is represented by colour intensity: a more intense red means a more conserved residue. Several scores are computed to compare the conservation of the motif sequence with that of the flanking regions; they are listed in the table below.
Instances are also annotated with information about motif conservation across the species in the alignment. Peptide conservation is estimated with the motif, PSSM or combined method, as selected by the user on the input page (default: motif). Each peptide from the clade alignment is marked as conserved (C), non-conserved (N) or missing from the alignment (X).
Motif method: conservation is defined using the motif consensus, represented as the regular expression defined on the input page. A peptide from the alignment is marked as conserved if it matches the motif consensus at the same position as the query species motif.
PSSM method: conservation is calculated with the PSSM scoring method. The peptide from the alignment at the same position as the query species motif is scored with the same PSSM as used in the search and assigned a PSSM p-value. If the p-value of the peptide is less than the user-defined cut-off (conservation p-value cut-off), the peptide is marked as conserved.
Combined method: the instance is conserved if it meets both criteria, a consensus match and a p-value below the conservation p-value cut-off.
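The three taxonomic range methods can be sketched as one classification function. This assumes the ortholog residues aligned to the query motif have already been extracted (gaps removed) and, for the PSSM method, already scored to a p-value; `classify_ortholog` and `conserved_counter` are hypothetical helpers, not the tool's code.

```python
import re


def classify_ortholog(peptide, motif, method="motif", pvalue=None, pvalue_cutoff=0.001):
    """Label one species' aligned peptide as 'C', 'N' or 'X'.

    peptide: residues aligned to the query motif, or None when the species has no
        protein in the alignment (assumed to be extracted beforehand).
    motif: compiled regular expression for the motif consensus.
    pvalue: PSSM p-value of the peptide, scored with the search PSSM.
    """
    if peptide is None:
        return "X"
    motif_ok = motif.fullmatch(peptide) is not None
    pssm_ok = pvalue is not None and pvalue < pvalue_cutoff
    if method == "motif":
        conserved = motif_ok
    elif method == "pssm":
        conserved = pssm_ok
    elif method == "both":
        conserved = motif_ok and pssm_ok
    else:
        raise ValueError(f"unknown method {method!r}")
    return "C" if conserved else "N"


def conserved_counter(labels):
    """Number of species labelled 'C' (the Conserved Counter column)."""
    return sum(label == "C" for label in labels)


# Example: a KEN box checked in three hypothetical orthologs.
ken = re.compile("KEN")
labels = [classify_ortholog(p, ken) for p in ("KEN", "KEQ", None)]
```

Here `labels` is `["C", "N", "X"]` and the conserved counter is 1.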
By default, only a subset of species is shown in the result table. To see the full list of species or customise the table columns, use the sidebar.
View | Column | Description | Shown by default |
---|---|---|---|
Both | Protein Name | Protein name and gene name. | + |
Both | Peptide | Peptide sequence with flanks. | + |
Both | Start | Peptide start position in the protein. | + |
Both | Stop | Peptide stop position in the protein. | + |
Both | Disorder score | Motif attribute calculated as the mean IUPred score across the peptide. | + |
Both | Domain | Overlapping regions with domains. | + |
Both | Motif | Overlapping regions with curated short linear motifs. | + |
Both | Con Score Combined | Combined conservation score: the sum of the conservation score and the conservation variance. | + |
Flank conservation | Sig conserved residues defined positions | Proportion of residues in the defined positions of a motif that are significantly conserved (p > 0.05). | - |
Flank conservation | Sig conserved residues Flanks | Proportion of residues in the flanking positions of a motif that are significantly conserved (p > 0.05). | - |
Flank conservation | Sig conserved residues Ratio | The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks. | + |
Flank conservation | L-10:L-1 | Conservation scores and residues for the N-terminal flank. | + |
Flank conservation | P<motif position> | Conservation scores and residues for the motif consensus. | + |
Flank conservation | R1:R10 | Conservation scores and residues for the C-terminal flank. | + |
Taxonomic range | Conserved Counter | Number of species in which the motif consensus is conserved at the same position as the query species motif. | + |
Taxonomic range | Species columns | Shows whether the peptide from the alignment is conserved (C) or not (N) at the same position as the query species motif in each species of the selected clade. If no data is available (i.e. there is no protein in the alignment for the species), an "X" is shown. | + |
The Options panel is located on the left and there are three sections: Views, Columns and Save.
In the Views section, the view can be changed: switch from the current view to the Taxonomic range or Flank conservation section and specify the alignment. The same can be done from the navigation menu above the result table.
In the Columns section, columns can be switched on or off to show or hide them in the result table. The full list of species available in the selected alignment is also shown here. To add a column to the result table, tick its checkbox; the table is updated automatically.
In the Save section, the results can be downloaded in tab-separated (tdt) or JSON format. See Download for more details.
The results can be saved in tab-separated (tdt) or JSON format. To download the results, use the Save section of the sidebar.
The columns are described in the table below. If a motif attribute score could not be calculated, a score of -1 is written to the file.
Column | Description |
---|---|
InstanceId | Unique instance identifier. |
ProteinAcc | UniProt protein accession. |
ProteinName | Protein name with gene name. |
Hit | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Motif start position in protein. |
SeqStop | Motif stop position in protein. |
IUPred | Disorder score |
Domain | Domain names separated by ";". |
Motif | Motif classes separated by ";". |
<alignment> conservation score | Conservation score across <alignment>. |
<alignment> conservation var | Conservation variance across <alignment>. |
<alignment> conservation combined | Conservation score combined. Sum of conservation score and variance across <alignment>. |
conserved_counter | Number of conserved species across <alignment>. |
<species> | C, N or X. C - the motif consensus is present at the same position as query species (conserved). N - the motif consensus is missing at the same position as query species (non-conserved). X - species is not present at the alignment (missing). |
mean_flanks | Mean of relative conservation scores across residues of flank regions. |
var_flanks | Variance of relative conservation scores across residues of flank regions. |
Sig conserved residues defined positions | Proportion of residues in the defined positions of a motif that are significantly conserved (p > 0.05). |
Sig conserved residues Flanks | Proportion of residues in the flanking positions of a motif that are significantly conserved (p > 0.05). |
Sig conserved residues Ratio | The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks. |
L-10:L-1 | The relative conservation scores for residues in the N-terminal flank. |
L<position> | The relative conservation scores for residues in the motif consensus. |
R1:R10 | The relative conservation scores for residues in the C-terminal flank. |
Alignment | Hyperlink to ProViz visualisation tool. |
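The mean_flanks, var_flanks and ratio columns can be recomputed from the per-residue scores and proportions. A minimal sketch under stated assumptions: population variance is used (the text does not say which variance), and the ratio is reported as None when the flank proportion is zero; `flank_summary` is a hypothetical name.

```python
from statistics import mean, pvariance


def flank_summary(flank_scores, motif_sig, flank_sig):
    """mean_flanks, var_flanks and the Sig conserved residues Ratio.

    flank_scores: relative conservation scores of the flanking residues
    (L-10..L-1 and R1..R10). Assumes population variance; the ratio is
    undefined (None) when no flanking residue is significantly conserved.
    """
    return {
        "mean_flanks": mean(flank_scores),
        "var_flanks": pvariance(flank_scores),
        "ratio_sig": motif_sig / flank_sig if flank_sig else None,
    }
```

A ratio well above 1 suggests that the defined motif positions are conserved more often than their sequence context.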
Full list of fields for each instance is in the table below. If motif attribute score could not be computed, then -1 score is provided.
Field | Type | Description |
---|---|---|
instanceId | Integer | Unique instance identifier. |
ProteinAcc | String | UniProt protein accession. |
ProteinName | String | Protein name. |
GeneName | String | Protein gene name. |
Hit | String | Motif sequence with flanking regions. Flanks are represented as lowercase residues. |
SeqStart | Integer | Motif start position in protein. |
SeqStop | Integer | Motif stop position in protein. |
IUPred | Float | Disorder score. |
ConservationScore | Float | Conservation score. |
ConservationVar | Float | Conservation variance. |
ConservationScoreCombined | Float | Conservation score combined. Sum of conservation score and conservation variance. |
Conservation_Scores | List of objects | Conservation score and variance across <alignment>. Format: [{"<searchdb>": {"score": <float>, "var": <float>}}] |
Domain | List of objects | Format: feature format, see features. |
Motif | List of objects | Format: feature format, see features. |
mean_flanks | Float | Mean of relative conservation scores across residues of flank regions. |
var_flanks | Float | Variance of relative conservation scores across residues of flank regions. |
flank_sig | Float | Proportion of residues in the flanking positions of a motif that are significantly conserved (p > 0.05). |
motif_sig | Float | Proportion of residues in the defined positions of a motif that are significantly conserved (p > 0.05). |
ratio_sig | Float | The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks. |
motif_sig_pos | List of Integer | The defined positions of a motif consensus. |
conserved_counter | Integer | Number of conserved species i.e. motif consensus is at the same position as query species. |
Conservation | List of objects | Species conservation. Format: [{"species_code": Boolean}]. True: the motif consensus is present at the same position; False: the motif consensus is missing at the same position. |
flank_residues | List of objects | Conservation of each residue in the flanking regions. Format: [{"<flank position>": {"aa": <residue>, "score": <relative conservation score>}}] |
peptide_residues | List of objects | Conservation of each residue in the motif consensus. Format: [{"<motif position>": {"aa": <residue>, "score": <relative conservation score>}}] |
proviz_link | String | Link to ProViz visualisation tool. |
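The nested JSON fields above can be flattened with the standard `json` module. The sketch assumes the file's top level is a list of instance objects; the example document, species codes and position labels are made up for illustration.

```python
import json

# Made-up instance in the documented JSON shape (codes and labels are illustrative).
document = json.loads("""
[{"instanceId": 1, "conserved_counter": 2,
  "Conservation": [{"HUMAN": true}, {"MOUSE": true}, {"YEAST": false}],
  "peptide_residues": [{"P1": {"aa": "K", "score": 0.91}},
                       {"P2": {"aa": "E", "score": 0.40}}]}]
""")


def conserved_species(instance):
    """Species codes marked True in the instance's Conservation list."""
    return [code for entry in instance.get("Conservation", [])
            for code, conserved in entry.items() if conserved]


def residue_scores(instance, key="peptide_residues"):
    """Map position label -> (residue, relative conservation score)."""
    return {pos: (info["aa"], info["score"])
            for entry in instance.get(key, []) for pos, info in entry.items()}
```

The same `residue_scores` call with `key="flank_residues"` reads the flank scores; the length of `conserved_species` should agree with `conserved_counter`.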
The table contains the results of enrichment analysis of Gene Ontology terms, UniProt keywords and interaction data. Three following approaches are available to performed enrichment analysis: enrichment analysis with motif search space correction where search space is limited to disordered regions of proteome, enrichment analysis based on conservation where conservation scores are used as the ranking criteria and classical enrichment analysis. All approaches accounts for evolutionary relationship of proteins by grouping similar proteins based on sequence and function similarity.
Enrichment analysis with motif search space correction is an improvement on the classical approach. It accounts for the search space, i.e. the analysis is limited to disordered regions of the proteome, in the same way as the search space for motif searches is limited (disorder cut-off). The enrichment calculation tries to answer the question: what is the probability that more than m of all motif instances in the dataset (M) belong to a given functional term that is shared by n of the N disordered residues in the entire proteome? Disordered residues are residues with a score ≥ the disorder cut-off.
The enrichment analysis uses the hypergeometric test with Benjamini-Hochberg correction to assess significance. Details of the calculation can be found in the SLiMSearch Supplementary Material or on the SLiMSearch help page.
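The question posed above can be written as a hypergeometric tail probability. The sketch below is a minimal illustration using only the standard library. It uses the usual upper tail P(X ≥ m) for X drawn from a hypergeometric distribution with population N (disordered residues), n successes (disordered residues carrying the term) and M draws (motif instances). The exact tail convention, and the definition of the enrichment score E as (m/M)/(n/N), are assumptions; the authoritative formulas are in the SLiMSearch Supplementary Material.

```python
from math import comb


def hypergeom_upper_tail(m: int, M: int, n: int, N: int) -> float:
    """P(X >= m) for X ~ Hypergeometric(population N, successes n, draws M)."""
    upper = min(M, n)
    # math.comb returns 0 when k > n, so impossible outcomes contribute nothing.
    hits = sum(comb(n, k) * comb(N - n, M - k) for k in range(m, upper + 1))
    return hits / comb(N, M)


def search_space_enrichment(m: int, M: int, n: int, N: int) -> tuple[float, float]:
    """Return (E, p) for one term under the search-space-corrected model.

    Assumed enrichment score: fraction of motif instances mapped to the term
    divided by the fraction of disordered residues mapped to the term.
    """
    enrichment = (m / M) / (n / N)
    return enrichment, hypergeom_upper_tail(m, M, n, N)


# Toy numbers: 2 of 3 instances hit a term covering 4 of 10 disordered residues.
E, p = search_space_enrichment(m=2, M=3, n=4, N=10)
print(E, p)
```

Exact integer arithmetic via `math.comb` avoids underflow for the large residue counts of a real proteome, at the cost of speed; a production implementation would more likely work in log space.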
Enrichment analysis based on conservation uses relative conservation scores as the ranking criterion. The conservation scores of motif instances are assigned to their functional annotations. For each term, the conservation scores assigned to that term are compared with the remaining conservation scores (i.e. scores of instances present in the results but not assigned to that term) using the Mann-Whitney U test. Functional annotations assigned to more conserved instances are more likely to be related to the biological function of the motif than annotations assigned to instances with randomly distributed conservation scores. The analysis is performed on all available alignments for conservation data. By default, this analysis is not performed; to run it, set Conservation to True in the Options section of the sidebar and click the Search button.
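The per-term comparison described above can be sketched with a standard-library Mann-Whitney U test. This is a minimal illustration, not the server's implementation. It assigns average ranks to ties, computes U for the term's scores, and returns a one-sided p-value ("term scores tend to be higher") from the normal approximation without tie or continuity correction. The one-sided direction and the approximation are assumptions; a library routine such as SciPy's would normally be used in practice.

```python
from math import erfc, sqrt


def midranks(values: list[float]) -> list[float]:
    """Rank values from 1..len(values), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(term_scores: list[float], other_scores: list[float]) -> tuple[float, float]:
    """Return (U for term_scores, one-sided normal-approximation p-value)."""
    n1, n2 = len(term_scores), len(other_scores)
    ranks = midranks(list(term_scores) + list(other_scores))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p_greater = 0.5 * erfc(z / sqrt(2))  # P(Z > z)
    return u1, p_greater


# Made-up relative conservation scores: instances with the term vs the rest.
print(mann_whitney_u([0.9, 0.8, 0.95], [0.1, 0.3, 0.2]))
```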
The classical enrichment analysis uses the hypergeometric distribution to identify enriched functional annotations and tries to answer the question: what is the probability that more than m of all instances in the dataset (M) belong to a given functional term, compared with a background distribution, where the background is the proportion of proteins with that term among all proteins in the entire proteome?
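The background proportion used by the classical approach can be illustrated with the protein counts that appear in the JSON download (`proteinTerm`, `proteinCount`, `occurrence`, `proteinBackgroundCount`). The sketch below assumes that the classical enrichment score is the observed proportion of dataset proteins carrying the term divided by the background proportion; this definition is an assumption for illustration, and the numbers are made up.

```python
def classical_enrichment(
    proteinTerm: int, proteinCount: int, occurrence: int, proteinBackgroundCount: int
) -> tuple[float, str]:
    """Return (E, label) for one term using protein-level counts."""
    # Background: proportion of all proteome proteins annotated with the term.
    background = occurrence / proteinBackgroundCount
    # Observed: proportion of dataset proteins annotated with the term.
    observed = proteinTerm / proteinCount
    enrichment = observed / background
    return enrichment, "enriched" if enrichment > 1 else "depleted"


# 30 of 200 dataset proteins vs 1000 of 20000 proteome proteins carry the term.
print(classical_enrichment(30, 200, 1000, 20000))
```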
The enrichment analysis uses the hypergeometric test with Benjamini-Hochberg correction to assess significance. Details of the calculation can be found in the SLiMSearch Supplementary Material or on the SLiMSearch help page.
The correction is applied within each category (i.e. Biological process, Molecular function, Localisation, etc.) and is calculated as q = (p*n)/i, where p is the p-value, n is the number of terms in the category and i is the rank of the term when the terms in the category are ordered by ascending p-value.
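The per-category Benjamini-Hochberg correction can be sketched as follows. The sketch applies q = (p*n)/i to the ranked p-values of each category and then adds the step-up monotonicity adjustment and cap at 1 that standard BH implementations use; whether the server applies that extra step is an assumption. The `(category, term_id, pvalue)` input layout is hypothetical.

```python
from collections import defaultdict


def bh_adjust(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda k: pvalues[k])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps q at 1
    # Walk from the largest p-value down so q never decreases with rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        q = pvalues[idx] * n / rank  # q = (p * n) / i
        running_min = min(running_min, q)
        adjusted[idx] = running_min
    return adjusted


def bh_by_category(terms: list[tuple[str, str, float]]) -> dict[str, float]:
    """Apply the correction separately within each term category."""
    by_category = defaultdict(list)
    for category, term_id, p in terms:
        by_category[category].append((term_id, p))
    result = {}
    for items in by_category.values():
        for (term_id, _), q in zip(items, bh_adjust([p for _, p in items])):
            result[term_id] = q
    return result


print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```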
The enrichment analyses are corrected for evolutionary relationships based on sequence and function similarity. Proteins containing consensus matches can be grouped based on different UniProt clusters (UniRef50, UniRef90, UniRef100), UniProt protein families or the corrected cluster, which combines UniRef50 and UniProt protein family clusters. There is also an option not to cluster the data. The default cluster option is UniRef50; this can be changed in the Options section of the sidebar.
Details about calculation with cluster options are described in SLiMSearch Supplementary Material.
Results of the enrichment analysis are grouped into two sections: ontology and interaction.
The ontology section is divided into 5 categories: TOP (the 20 most significant terms), biological process, molecular function, localisation and disease.
Interaction section is divided into 3 groups: Domain - domains found in interacting proteins, Family - interacting proteins grouped into protein families, and Protein - interacting proteins.
The terms are highlighted with different colours to make the results more readable; the meaning of each colour is listed in the table below. In addition, results are marked with warnings when a given term may be overestimated. The result table can be searched by term name and filtered to show enriched, depleted or warning-flagged terms (top-right corner above the result table). The results can be downloaded as a tab-separated (.tdt) or JSON (.json) file. Short descriptions of the result table columns are listed below.
Colour | Meaning |
---|---|
Green | Significant terms with adjusted p-values < 1e-4. |
Grey | Enriched terms i.e. with enrichment score (E) > 1. |
Blue | Depleted terms i.e. with enrichment score (E) < 1. |
Light yellow | Term is flagged with repeat OR cluster flag. |
Dark yellow | Term is flagged with repeat AND cluster flag. |
Column | Description | Shown by default |
---|---|---|
Category | Term category. | - |
ID | Unique term identifier. | + |
Name | Functional annotation name. | + |
# | Number of motif instances that map to this term. | + |
# motifs | Number of motif instances in dataset. | - |
# residues | Number of disordered residues* that map to this term. | - |
# residues proteome | Number of disordered residues* in whole proteome. | - |
# Proteome | Number of proteins in proteome that map to this term. | + |
Enrichment | Enrichment (E). If (E) > 1 the term is enriched, otherwise it is depleted. | + |
P-value | Enrichment significance calculated using the hypergeometric test. The lower the p-value, the more significantly enriched/depleted the term. | + |
Adj pval | Adjusted p-values. P-values after BH correction. | + |
<alignment> P-value | Enrichment significance calculated based on conservation. | + |
The functional annotations are flagged to warn the user if a given term may be overestimated. Terms with warnings are shown with a yellow background in the result table and an icon next to the adjusted p-value; hover over the icon to see details about the warning. There are two types of warnings: repeat and cluster flag.
Flag | Description |
---|---|
repeat flag | The term can be overestimated when motif instances occur multiple times in the same protein due to repeated regions in that protein. The term is flagged if the number of repeated instances is significantly greater than expected (i.e. p < 0.001). |
cluster flag | The term can be overestimated when motif instances occur in related proteins that were not clustered based on evolutionary relationship. This flag is only calculated when enrichment analysis is performed on UniRef50 clusters. The term is flagged if the ratio of the number of corrected clusters assigned to the term to the number of UniRef50 clusters assigned to the term is ≤ 0.5. |
See Filters.
The Options panel is located on the left.
Section | Option | Description |
---|---|---|
Views | | The view can be changed to show enriched terms from the selected category. |
Options | Cluster | UniProt Reference Cluster (UniRef) used for the analysis. Default: UniRef50 |
Options | Conservation | Compute enrichment significance based on conservation scores. Default: false |
Options | P-value | P-value cut-off; limits the number of returned hits. Default: 0.01 |
Columns | | Columns can be switched on/off by ticking the checkbox next to the column name. The table is updated automatically. |
Save | | The results can be downloaded in tab-separated (tdt) or JSON format. |
Column | Description | Approach |
---|---|---|
Category | Term category. | all |
ID | Unique term identifier. | all |
Name | Functional annotation name. | all |
No. of motif instances mapped to term in dataset (m) | Number of motif instances mapped to a given term in dataset (m). | search space correction |
No. of motif instances in dataset (M) | Number of motif instances in dataset (M). | search space correction |
No. of disordered residues mapped to term (n) | Number of disordered residues mapped to a given term in proteome (n). | search space correction |
No. of disordered residues in proteome (N) | Number of disordered residues in proteome (N). | search space correction |
Enrichment | Enrichment score (E). | search space correction |
Pvalue | Enrichment significance. | search space correction |
Adj pvalue | Corrected p-value for multiple hypothesis testing. | search space correction |
No. of proteins mapped to term in dataset | Number of proteins mapped to a given term in dataset. | classical |
No. of proteins mapped to term in proteome | Number of proteins mapped to a given term in proteome. | classical |
No. of proteins in dataset | Number of proteins in the dataset. | classical |
No. of proteins in proteome | Number of proteins in the entire proteome. | classical |
Enrichment (Proteins) | Enrichment score (E). | classical |
Pvalue (Proteins) | Enrichment significance. | classical |
Adj pvalue (Proteins) | Corrected p-value for multiple hypothesis testing. | classical |
Repeat flag | Warning. Overestimation of term (True/False). | all |
Cluster flag | Warning. Overestimation of term (True/False). | all |
Repeat flag (expected) | Expected number of instances mapped to a given term by chance. | all |
Repeat flag (expected p-value) | Significance of repeat flag. | all |
Cluster flag (ratio) | Significance of cluster flag. | all |
<alignment> | Enrichment significance. | conservation |
Field | Type | Description |
---|---|---|
category | String | Term category. |
id | String | Unique term identifier. |
name | String | Functional annotation name. |
count | Float | Number of motif instances mapped to a given term in dataset. |
M | Float | Number of motif instances in dataset. |
n | Float | Number of disordered residues mapped to a given term in proteome. |
N | Float | Number of disordered residues in proteome. |
enrichment | Float | Enrichment score (E) for motif search space correction approach. |
pval | String | Enrichment significance for motif search space correction. |
pvalBH | String | Adjusted p-value for multiple hypothesis testing for motif search space correction. |
proteinTerm | Float | Number of proteins mapped to a given term in dataset. |
occurrence | Float | Number of proteins mapped to a given term in proteome. |
proteinCount | Float | Number of proteins in the dataset. |
proteinBackgroundCount | Float | Number of proteins in the proteome. |
flag | Boolean | Repeat flag. |
expected | Float | Expected number of instances mapped to a given term by chance. |
exp_pval | Float | Significance of repeat flag. |
flag2 | Boolean | Cluster flag. |
countUniMix | Float | Significance of cluster flag. |
url | String | Link to source data. |
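A downloaded JSON result file can be post-processed with the standard `json` module. The sketch below assumes the file is a JSON array of objects with the fields listed above. Because `pval` and `pvalBH` are documented as strings, they are converted to float before comparison. The 0.01 threshold and the example records are made-up illustrations.

```python
import json


def significant_enriched_terms(json_text: str, max_adj_p: float = 0.01) -> list[tuple]:
    """Return (id, name, enrichment, adjusted p) for enriched, significant terms."""
    records = json.loads(json_text)  # assumed: a JSON array of term objects
    hits = []
    for rec in records:
        adj_p = float(rec["pvalBH"])  # pvalBH is a string in the download
        if rec["enrichment"] > 1 and adj_p <= max_adj_p:
            hits.append((rec["id"], rec["name"], rec["enrichment"], adj_p))
    # Most significant terms first.
    return sorted(hits, key=lambda hit: hit[3])


# Hypothetical download content with made-up values.
sample = json.dumps([
    {"id": "GO:0007049", "name": "cell cycle", "enrichment": 4.2, "pvalBH": "1.5e-05"},
    {"id": "KW-0498", "name": "Mitosis", "enrichment": 2.1, "pvalBH": "0.003"},
    {"id": "GO:0008152", "name": "metabolic process", "enrichment": 0.7, "pvalBH": "0.2"},
])
for term in significant_enriched_terms(sample):
    print(term)
```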
This functionality is optional and is available only when a hub protein was specified on the input page (Shared annotations, in the Advanced options section).
Shared annotations with a hub protein are grouped into two sections: Ontology and Interaction.
Functional annotations shared with the motif partner(s) provided on the input page are displayed in the table.
GO terms with a p-value equal to or below the user-defined cut-off are annotated with significance metrics and interacting proteins/instances. These GO terms can be used to filter motif instances.
By default, the numbers of interactors, motif-containing proteins and motif instances are listed in the table; details such as protein names can be shown by clicking the expand button above the column name.
The GO terms can be filtered by category by clicking on the icon above the top-right corner of the table. Switch the slider next to the category of interest on or off and the table will be updated automatically.
Specific GO terms can be searched in the table by using search box above the table. To search a term, start typing the name in the provided box and the table will be updated automatically.
Short descriptions of table columns are listed below.
To filter motif instances based on shared functional annotations, tick the checkbox next to the terms and click the Filter button to filter instances and be redirected to the Instances page, or click the Add button to save the filter.
Name | Description |
---|---|
Category | GO term category. |
Name | Name of GO term with link to a source. |
Prob (proteins) | Probability of sharing a term by any two proteins in the proteome by chance. Used as significance cut-off. |
Prob (UniRef50) | Probability of sharing a term by any two proteins clustered based on UniRef50 clusters in the proteome by chance. |
Interactors | Interacting proteins (defined by the user on the input page) that share a specific term. |
Proteins | Motif-containing proteins that share a specific term. |
Instances | Motif instances that shared a specific term. |
Displays the list of proteins containing motif which are known interactors of motif partner(s) defined by user on the input page.
Short descriptions of table columns are listed below.
Name | Description |
---|---|
Name | Motif-containing protein that interacts with the set of proteins provided on the input page. |
Interactors | Interacting proteins (i.e. those defined by the user on the input page) of the motif-containing protein. |
Instances | Motif instances that interact with the set of proteins provided on the input page. |
Motif instances can be filtered based on motif attributes, containing protein, ontology or hub annotations. The filters are grouped into the following categories:
Filter | Description |
---|---|
Hub protein | Shared functional annotations and interactors of motif binding-partner. |
Hub domain | Interacting domains. |
Annotation | Subcellular localisation and enriched functional annotations in the dataset. |
Evolution | Taxonomic range. Conservation across different clades/species. |
Accessions | Containing protein accessions, or ontology and interaction annotations. |
Accessibility | Accessibility to intracellular proteins. |
Shared annotations | Shared ontology and interaction with set of proteins predefined by user. |
Important! After filtering, the results on the Instances, Conservation and Function pages are shown only for instances that meet the filter criteria. For example, the functional enrichment analysis is recalculated for the new motif dataset, and conservation information is limited to the filtered instances.
To filter instances, choose one of the filtering options from the navigation menu and follow the instructions provided on the page. All filtering sections have a Description header with a [-] or [+] sign; the short description of the filter can be expanded or collapsed by clicking on these signs. After specifying your filters, click the Filter button and you will be redirected to the Instances page to view the filtered instances. You can save your filter by clicking the Add button; the filter will be added to the list of all filters and you can then specify further filtering options.
Instances can be filtered by multiple filters. To combine filters, either save each one with the Add button, or apply a filter with the Filter button, return to another filter section and click the Filter button again; the new filter is added and both filters are applied.
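Since the Instances page then shows the instances that meet multiple criteria, stacked filters behave like a logical AND. The sketch below models each filter as a predicate on a hypothetical instance record (the `accession` and `accessible` keys are made up for illustration) and keeps only instances that pass every active filter. It shows the combining logic only, not the server's code.

```python
from typing import Callable

Instance = dict
Filter = Callable[[Instance], bool]


def apply_filters(instances: list[Instance], filters: list[Filter]) -> list[Instance]:
    """Keep instances that satisfy every active filter (logical AND)."""
    return [inst for inst in instances if all(f(inst) for f in filters)]


# Hypothetical filters: restrict to chosen proteins and drop inaccessible instances.
def protein_filter(inst: Instance) -> bool:
    return inst["accession"] in {"P04637", "Q8WZ42"}


def accessibility_filter(inst: Instance) -> bool:
    return inst["accessible"]


instances = [
    {"accession": "P04637", "start": 20, "accessible": True},
    {"accession": "P04637", "start": 370, "accessible": False},
    {"accession": "P11532", "start": 15, "accessible": True},
]
print(apply_filters(instances, [protein_filter, accessibility_filter]))
```

Removing every filter (an empty list) returns all instances unchanged, which matches the behaviour described for removing all filters.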
To display details about each filter, click on the "details" button next to the filter name. Each filter can be removed by clicking on the icon next to filter name. After removing filters, click on the "UPDATE" button and the motif instances will be filtered with updated filters. If you remove all filters, the results will be updated automatically. If you do not want to make any changes in filters, click on "Instances" tab from navigation to see instances that meet multiple criteria.
There are two options to filter instances based on annotations of a known motif binding partner: by shared functional annotations, or by whether the instances occur in interactors of the binding partner.
The functional annotations of the binding partner are shown with the probability that a given term is shared by any two proteins in the proteome. The probabilities are computed based on UniRef50 clusters (Sig (UniRef50) column) and without any clustering (Sig (NoClustering) column). The user can restrict the returned functional annotations to more specific ones by lowering the cut-off; a lower cut-off excludes general annotations such as metabolic process, single-organism process or localisation. The cut-off can be one of the provided values (1e-5, 0.01 or 0.1) or defined by the user: enter your cut-off in the empty box next to the provided cut-offs and press ENTER, and the table will be updated. The functional annotations are derived from the Gene Ontology project.
The functional annotations of the binding partner can be used to limit the instances in the dataset to those that share the same function as the binding partner.
Step 1. The binding partner can be provided as a UniProt protein accession or protein name. As you type a protein name, a list of matching proteins appears below the box; click on the protein of your choice to define the binding partner.
Interacting proteins of the binding partner are shown with the probability that a given interactor is shared by any two proteins in the proteome. The probabilities are computed based on UniRef50 clusters (Sig (UniRef50) column) and without any clustering (Sig (NoClustering) column). The user can restrict the returned interactors to more specific ones by lowering the cut-off. The cut-off can be one of the provided values (1e-5, 0.01 or 0.1) or defined by the user: enter your cut-off in the empty box next to the provided cut-offs and press ENTER, and the table will be updated.
The interactors of the binding partner can be used to limit instances to those that occur in proteins known to interact with the binding partner.
Step 1. The binding partner can be provided as a UniProt protein accession or protein name. As you type a protein name, a list of matching proteins appears below the box; click on the protein of your choice to define the binding partner.
Interacting domains can be used to limit instances to those that could interact with the specified binding domains, i.e. domains that occur in known interacting proteins for a given instance.
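One plausible reading of "the probability that a given term is shared by any two proteins in the proteome" is the chance that two distinct proteins drawn at random both carry the term, i.e. C(k, 2) / C(P, 2) for a term annotated on k of P proteins. The sketch below uses that reading, which is an assumption rather than the documented formula, and shows why a lower cut-off removes general terms. For the UniRef50 column one would presumably count clusters instead of proteins; that is also an assumption. The counts are made up.

```python
from math import comb


def shared_by_chance(proteins_with_term: int, proteome_size: int) -> float:
    """Assumed probability that two random distinct proteins both carry the term."""
    return comb(proteins_with_term, 2) / comb(proteome_size, 2)


def specific_terms(term_counts: dict[str, int], proteome_size: int, cutoff: float = 0.01) -> dict[str, float]:
    """Keep terms whose chance-sharing probability is at or below the cut-off."""
    probs = {term: shared_by_chance(k, proteome_size) for term, k in term_counts.items()}
    return {term: p for term, p in probs.items() if p <= cutoff}


# Made-up counts: a very general term versus a narrow one in a 20000-protein proteome.
counts = {"metabolic process": 8000, "narrow term": 50}
print(specific_terms(counts, 20000, cutoff=1e-5))
```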
Step 1. Search for domains. Start typing a domain name or short code in the input box and matching domains will appear in the table below; the table is updated automatically as you type.
The instances can be filtered based on possible localisation or ontology and interaction annotations for motif set.
The cellular component annotations of proteins can be used to limit instances to those occurring in (or outside) specific localisations. All subcellular localisations of motif instances in the dataset are listed in the table, each with the number of instances occurring in that cellular component (# column).
The table can be searched by using search box above the table. Start typing the localisation and the table will be updated automatically.
The ontology and interaction terms can be used to filter motif instances to those assigned to terms selected by the user. All functional annotations for instances in the motif set are listed in the tables, each with the enrichment score and significance from the enrichment analysis.
The same filtering can be performed on the Function page.
Taxonomic range can be used to limit instances to those conserved outside (or inside) a specific clade or species. All possible clades (and species) are listed in the table.
The taxonomic range table can be searched using the search box above the table. Start typing a species or clade name and the table will be updated automatically.
There are two options to filter consensus matches based on provided accessions. The instances can be filtered by protein accessions or ontology and interaction identifiers.
UniProt protein accessions can be used to limit instances to those occurring in specific proteins.
UniProt protein accessions are stable alphanumeric identifiers of 6 or 10 characters. Examples of UniProt protein accessions in human:
UniProt accession | Protein name |
---|---|
P04637 | Cellular tumor antigen p53 |
P11532 | Dystrophin |
Q8WZ42 | Titin |
Several types of annotation, such as Gene Ontology, UniProt Keywords, Pfam and UniProt accessions, can be used to limit the motif instances in the dataset. Gene Ontology and UniProt Keywords filter instances by function and localisation, while Pfam identifiers and UniProt accessions filter instances based on interaction data: UniProt accession(s) indicate binding partner(s), i.e. interacting proteins, and Pfam ID(s) describe interacting domain(s), i.e. domains occurring in interacting proteins.
Examples of accessions:
Source | Accession | Name |
---|---|---|
Gene Ontology | GO:0007049 | Cell cycle |
UniProt keyword | KW-0498 | Mitosis |
Pfam domain | PF00017 | SH2 domain |
UniProt protein | P04637 | Cellular tumor antigen p53 (H. sapiens) |
Accessibility information can be used to limit instances to those accessible to intracellular proteins. Motif instances carry warnings indicating that an instance is inaccessible to intracellular proteins, and these warnings can be used to filter instances.
The same filtering can be done on the Instances page by clicking on the icon above the top-right corner of the table.
Shared annotation data provided by the user on the input page can be used to filter motif instances. The results can be limited to instances known to interact with the defined set of proteins, or to instances that share at least one GO term with the defined set of proteins.
The same filtering can be done on the Instances page by clicking on the icon above the top-right corner of the table.
Users can retrieve previous searches using a unique identifier, the JobID, which is shown in the top-right corner of the Instances page. Jobs are stored for two weeks.
name | description | PMID | URL |
---|---|---|---|
UniProt | Protein accessions, names, sequences, families, UniRef clusters and feature annotations. | 25348405 | http://www.uniprot.org |
ELM | Manually curated linear motifs. | 26615199 | http://elm.eu.org |
Pfam | Functional regions and binding domains. | 24288371 | http://pfam.xfam.org |
Phospho.ELM | Experimentally verified phosphorylation sites. | 21062810 | http://phospho.elm.eu.org |
PhosphoSitePlus | Phosphorylation, ubiquitination, acetylation and methylation sites. | 22135298 | http://www.phosphosite.org/homeAction.do |
PDB | Experimentally resolved protein tertiary structures. | 10592235 | http://www.rcsb.org/pdb/home/home.do |
DSSP | Secondary structure derived from PDB tertiary structures. | 25352545 | http://swift.cmbi.ru.nl/gv/dssp/ |
dbSNP | Single-nucleotide polymorphism. | NCBI Handbook [Internet]. Chapter 5. | http://www.ncbi.nlm.nih.gov/SNP |
1000genomes | Single-nucleotide polymorphism. | 23128226 | http://www.1000genomes.org |
switches.ELM | Experimentally validated motif-based molecular switches. | 23550212 | http://switches.elm.eu.org |
Gene Ontology | Gene ontology annotations. | 25428369 | http://geneontology.org |
IntAct | Experimentally validated protein-protein interactions. | 24234451 | http://www.ebi.ac.uk/intact/ |
HIPPIE | Validated human protein-protein interactions. | 27794551 | http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/ |
name | description | PMID | URL |
---|---|---|---|
IUPred | Intrinsically disordered regions. | 15769473 | http://iupred.enzim.hu |
SLiMPrints | Conservation of residues across the alignment. | 22977176 | http://bioware.ucd.ie |
Anchor | Binding sites in disordered regions. | 19412530 | http://anchor.enzim.hu |