Getting started


Select the task of interest

Search the available articles for terms of interest, browse articles linked to motifs from the ELM database, or classify/annotate an unknown article by its PubMed ID.

Specify the input and run

Depending on the task, specify either a term of interest, an ELM motif or a PubMed ID and then submit the job.

Work on the results

Depending on the task, inspect the collection of retrieved articles, curate the annotations or add new data to the database.

Classes Index


Browse classified SLiM-related articles by ELM class.

Browse classes

The articles.ELM resource allows all articles in the articles.ELM literature dataset to be browsed by class, providing an index of the classes and their descriptions. The articles linked to each class can be accessed through the Browse view.

Classes list

A collection of classes from the ELM database is presented in alphabetical order. Each class is presented along with its name, a short description and a link to browse all articles linked to the class in articles.ELM, as in the example below:

Figure. Example of Classes entry

Clicking on the class identifier (e.g. "CLV_C14_Caspase3-7") takes the user to the corresponding entry in the ELM database.

The Class Name column provides the functional site class name and description. Hovering over the information icon displays the name of the binding partner domain, the motif consensus pattern and its associated probability. All data are taken directly from ELM.

A dedicated button is available to browse all articles for the selected motif class:

Figure. Articles from class

Search Articles


Search the database of functional SLiM-related articles.

Search queries

The term of interest should be typed in the text box. The query can be any word or phrase longer than one character. If the query comprises two or more words, a drop-down menu allows Boolean searches, i.e. it specifies whether to match the whole phrase as written (“Exact”), all of the words regardless of their position in the text (“AND”), or any one of them (“OR”). Please note that regular expression searches are not supported.

Sample queries
Intention | Correct syntax | Correct match
Retrieve articles including the phrase “Nuclear Export Signal” | Nuclear Export Signal | Exact
Retrieve articles including one or both of the terms “GLEBS” and “Bub3” | GLEBS Bub3 | OR
Retrieve articles including both of the terms “NLS” and “SV40” | NLS SV40 | AND
Retrieve articles including the UniProt accession “P06400” | P06400 | Exact
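The three match options can be illustrated with a minimal Python sketch. It assumes case-insensitive matching on whole word tokens, which this page does not specify; `tokenize` and `matches` are hypothetical helpers for illustration, not the articles.ELM search code.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (case-insensitivity is an assumption)."""
    return re.findall(r"\w+", text.lower())


def matches(text: str, query: str, mode: str) -> bool:
    """Return True if `text` satisfies `query` under 'Exact', 'AND' or 'OR'."""
    words = tokenize(text)
    terms = tokenize(query)
    if mode == "Exact":
        # The whole phrase must appear as a contiguous run of words, in order.
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    if mode == "AND":
        # Every word must appear somewhere in the text.
        return all(t in words for t in terms)
    if mode == "OR":
        # At least one word must appear.
        return any(t in words for t in terms)
    raise ValueError(f"unknown match mode: {mode}")
```

For example, `matches(title, "GLEBS Bub3", "OR")` is true for a title mentioning only Bub3, whereas the "AND" option would require both terms.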


Search results

The output of the search is a collection of entries that match the search phrase. The total number of retrieved articles is shown at the top. Matches are made on the title or abstract text of an article, and also on the names of genes or proteins associated with the article. A sliding button to the right of the number of results filters out those articles in which neither the title nor the abstract was matched:

Figure. Filter by abstract/title matches

A list of the retrieved articles is given, ordered by increasing date of publication, with extended information for each article. Clicking on the article title gives direct access to the classification/annotation interface. Clicking on the external-link icon to the right of the article title opens its PubMed entry in a new window.

Every article from the ELM database that was used in the training dataset is assigned a gray label to the left of its title:

Figure. Search - ELM

Every article that has been submitted and manually curated for a given motif is assigned a specific label to the left of its title:

Figure. Search - Candidate

There may be articles in the results list which have not been curated before. This provides a good opportunity to recognize relevant articles that may have slipped the experts’ attention. Every article that has not been curated before receives a specific label to the left of its title, which can be clicked for direct access to the classification/annotation interface:

Figure. Search - Submit

A link at the bottom of the page allows the user to download all search results in tab-delimited text format for easy computational parsing and compatibility with commonly used spreadsheet applications:

Figure. Download as TDT
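A downloaded tab-delimited file can be loaded with Python's standard `csv` module. This is a minimal sketch: it assumes the first row holds column headers, and the column names in the sample (`pmid`, `title`) and the file name `results.tsv` are hypothetical, not the actual export layout.

```python
import csv
import io


def read_tdt(handle) -> list[dict[str, str]]:
    """Parse a tab-delimited export into one dict per row, keyed by the header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical export contents; for a real file use
# open("results.tsv", newline="", encoding="utf-8") instead of StringIO.
sample = "pmid\ttitle\n8610146\tExample article title\n"
rows = read_tdt(io.StringIO(sample))
```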

Each article entry provides details on the match criteria and the motifs that have been related to it, either by manual or automatic curation.

Match Details

Presents the match criteria for every term included in the search phrase. Individual matches can be made on one or more of the following fields: “title”, “abstract_text”, “protein_names”, “gene_names”, “ELM”, “interaction”, “pdb”.

Curated motifs

Presents a list of all ELM motifs that have been manually assigned to a specific article by expert curation.

Classified motifs

Presents a list of all ELM motifs that have been automatically linked to a specific article by our classifier tool.

Classification confidence

A four-level ranking system marks the confidence of each prediction: a star symbol corresponding to the confidence level is shown to the left of the predicted motif. Each level is defined by three measures: the probability of the match, its distance to the classifier hyperplane, and the difference in distances, or Δdistance (based on the difference between its distance and the cumulative probability). Hovering over the assigned star shows the values of these three measures in a pop-up box.

Symbol | Confidence | Probability cutoff | Distance cutoff | Δdistance cutoff
Low-confidence star | low | < 0.05 | < 0 | < 0
Medium-confidence star | medium | < 0.05 | < 0 | < 0.5
High-confidence star | high | < 0.05 | < 0 | > 0.5
Highest-confidence star | highest | < 0.05 | > 0 | -
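The table can be read as a simple decision procedure. The Python sketch below encodes it; the handling of boundary values (p exactly 0.05, distance exactly 0, Δdistance exactly 0.5) and the reading of the medium row as 0 ≤ Δdistance < 0.5 are assumptions, since the table leaves them open. `confidence_level` is a hypothetical helper.

```python
from typing import Optional


def confidence_level(p: float, distance: float, delta: float) -> Optional[str]:
    """Map (probability, distance, delta-distance) to a star level.

    Boundary handling is an assumption: the table does not define it.
    """
    if p >= 0.05:
        return None          # not significant: no star assigned
    if distance > 0:
        return "highest"     # delta-distance is not used at this level
    if delta < 0:
        return "low"
    if delta < 0.5:
        return "medium"      # assumed range 0 <= delta < 0.5
    return "high"
```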


Delta distance

For each class, a background ("random article") distribution of decision-function distances is calculated by scoring every article in the training set that is not a member of that class with the class classifier. The assumption is that these are motif articles which do not describe the given class, so they provide a conservative estimate of the likelihood of seeing a given distance by chance. When a new article is classified, the distances for all classes are returned and the closest class is taken as the most likely classification for the article.

Distance

The classifier distinguishes the class of a document by calculating a metric analogous to its similarity to a set of abstracts. This similarity is quantified as the distance of the article from the hyperplane of the class classifier. (Strictly, this value is a signed displacement, so it can be negative.) The distance therefore reflects how similar the input article is to the set of articles of a given class.
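The idea of a signed distance from a class hyperplane can be sketched with scikit-learn (listed under Programs). This sketch assumes a linear support vector classifier over TF-IDF term weights, which this page does not confirm, and the toy abstracts and labels are hypothetical. `decision_function` returns w·x + b; dividing by the norm of w gives the geometric signed distance.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy training set: 1 = describes the class, 0 = other motif articles.
texts = [
    "caspase cleavage site in substrate during apoptosis",
    "caspase cleaves substrate after aspartate residue",
    "nuclear export signal recognised by crm1",
    "sh2 domain binds phosphotyrosine peptide",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
clf = LinearSVC(dual=True, random_state=0).fit(X, labels)


def signed_distance(abstract: str) -> float:
    """Signed geometric distance of an abstract from the class hyperplane.

    Positive values fall on the class side, negative values on the other side.
    """
    x = vectorizer.transform([abstract])
    score = clf.decision_function(x)[0]              # w . x + b
    return float(score / np.linalg.norm(clf.coef_))  # scale by ||w||
```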

Probability (p)

For each class, the background distribution of decision-function distances is converted to a cumulative probability, which is applied to each article during classification to provide an intuitive probabilistic metric that complements the more abstract decision-function distances. Publications describing protein-protein interactions from the HIPPIE database can also be used to calculate the background distance distribution for each class.
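The background distribution and its conversion to a probability can be sketched as follows. This is a minimal illustration under stated assumptions: `background_distances`, `empirical_p` and the `distance_fn` argument are hypothetical, and the probability is taken as the upper-tail fraction of background distances at or above the observed distance (larger distances being more class-like), which this page does not spell out.

```python
from bisect import bisect_left


def background_distances(training_set, target_class, distance_fn):
    """Distances to `target_class` for training articles NOT annotated with it.

    training_set: iterable of (text, set_of_classes) pairs (hypothetical layout)
    distance_fn:  callable (text, class) -> signed distance (hypothetical)
    """
    return [distance_fn(text, target_class)
            for text, classes in training_set
            if target_class not in classes]


def empirical_p(distance: float, background: list[float]) -> float:
    """Fraction of background distances that are >= the observed distance."""
    ordered = sorted(background)
    n_at_or_above = len(ordered) - bisect_left(ordered, distance)
    return n_at_or_above / len(ordered)
```

With this reading, an article is significant for a class when fewer than 5% of the background articles reach its distance, matching the 0.05 probability cutoff used in the confidence table.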

Browse Articles


Browse SLiM-related articles classified by ELM and articles.ELM manual curation and text-mining.

Browse entries

A drop-down menu allows the user to select a motif class from the ELM database. This automatically shows all PubMed articles annotated with that motif class. The source of each classification can be the ELM consortium, the articles.ELM team or contributors, and/or the articles.ELM text-mining classification tool. An index of the classes and their descriptions is available in the Classes view.

Entry details

Each ELM class has an entry that starts with a Word Cloud, which represents the weights of the terms in the article classifier for that class: the larger the text, the more relevant the term is for the classifier to identify the class. For example, in the case of the DEG_MDM2_SWIB_1 motif, the following word cloud shows that mdm2 is the most relevant term in the associated literature, followed closely by others such as p53 and p63 (its binding partners):

Figure. Browse - Word cloud example

Below the Word Cloud is a collection of articles related to the motif of interest. These are ordered by decreasing relevance and presented along with the available sources of annotation.

Extended information is presented for each article, as in the example below:

Figure. Browse - Article example

Clicking on the article title gives direct access to the classification/submission interface. Clicking on the external-link icon to the right of the article title opens its PubMed entry in a new window.

In the Sources column, every article to which the ELM resource has assigned a motif class is given a gray ELM label (the question mark indicates that an extended description is shown when hovering over the label):

Figure. Browse - ELM

Every article that has been manually submitted and manually curated in the Candidates section of articles.ELM is assigned a blue Candidate label:

Figure. Browse - Candidate

Similarly, every article that is not yet curated but was automatically classified by the articles.ELM text-mining tool receives a yellow Classified label.

Figure. Browse - Classified

This provides a good opportunity to recognize relevant articles that may have slipped the experts’ attention so far. Clicking on the label gives direct access to the classification interface.

A four-level ranking system marks the confidence of each prediction: a star symbol corresponding to the confidence level is shown to the left of the article title (see Classification confidence for more details). Articles are presented in order of decreasing significance, i.e. higher probability and lower distance or Δdistance articles are presented at the top of the list.

As on other pages, a button at the bottom allows the user to download all entries in tab-delimited text format for easy computational parsing and compatibility with commonly used spreadsheet applications:

Figure. Download as TDT

Candidate Articles


Browse user-submitted SLiM-related articles and classifications.

Submit candidate entries

Before the list of candidate articles, a text box is available to submit a PubMed ID as a candidate entry. The user must enter a valid PubMed identifier (PMID) that corresponds to an existing PubMed record. The PMID must be a positive integer value with 6 or more digits; for example, 8610146. Upon clicking the submit button the user is taken to the classification/submission interface.
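The format rule for a PMID can be checked before the identifier is submitted. A minimal Python sketch; rejecting leading zeros is our reading of "positive integer with 6 or more digits", and `is_valid_pmid_format` is a hypothetical helper. A format check alone cannot tell whether the record actually exists in PubMed.

```python
import re

# A non-zero leading digit followed by at least five more digits (6+ in total).
PMID_PATTERN = re.compile(r"[1-9][0-9]{5,}")


def is_valid_pmid_format(value: str) -> bool:
    """True if `value` looks like a PMID: a positive integer with 6 or more digits."""
    return PMID_PATTERN.fullmatch(value.strip()) is not None
```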

Filter candidate entries

A drop-down menu is available to filter the list of candidate articles by ELM class or by custom group (e.g. "Cell Cycle Motifs"). When a filter is applied, the number of matching entries is updated and a Reset link appears to remove the filter. Only one filter can be applied at a time. Each filtering option in the menu is shown with the number of existing candidates for that class in parentheses.

Candidate entry details

Each candidate entry shows the extended information available for the article, as in the example below:

Figure. Candidates - Article example

Clicking on the article title gives direct access to the classification/submission interface. Clicking on the external-link icon to the right of the article title opens its PubMed entry in a new window.

In the Classification column, the ELM class that has been associated with the article is shown. The confidence level of the curation is indicated using the star-based system (see Classification confidence for more details). Clicking on the class name (e.g. "DOC_MAPK") filters the list to the candidates that were assigned the same classification. Note that an article can be a candidate for more than one class, including classes that have already been curated in the ELM dataset; these are highlighted with a tick mark:

Figure. Candidates - Multi-class article example

Hovering over the person or group icon shows the name of the curator(s) or consortium that annotated the class for the given article.

As on other pages, a button at the bottom allows the user to download all entries in tab-delimited text format:

Figure. Download as TDT

Classified Articles


Browse SLiM-related articles classified by the articles.ELM text-mining classification tool.

Classified entries

A drop-down menu allows the user to select a motif from the ELM database. This automatically shows all PubMed articles that were collected and linked to this motif by the classifier tool, based on its training on curated articles from our database.

Classified entry details

Each ELM class has an entry that starts with a Word Cloud, which represents the weights of the terms in the article classifier for that class: the larger the text, the more relevant the term is for the classifier to identify the class. For example, in the case of the DEG_MDM2_SWIB_1 motif, the following word cloud shows that mdm2 is the most relevant term in the associated literature, followed closely by others such as p53 and p63 (its binding partners):

Figure. Classified - Word cloud example

Below the Word Cloud is a collection of articles related to the motif of interest. The total numbers of training and retrieved articles are shown at the top, along with direct links to the motif-specific ELM class page, benchmarking results and candidate articles.

A dedicated button shows or hides the articles that were part of the training dataset:

Figure. Hide/Show training articles

A list of the retrieved articles follows, with all articles used for training presented first and results ordered by decreasing significance:

Figure. Classified - Article example

Extended information is presented for each article, as in the schema below:

Figure. Classified - Article schema

The confidence of each prediction is indicated using the four-level star system (see Classification confidence for more details).

Every article from the ELM resource that has been part of the training set is assigned a specific label to the left of its title:

Figure. Classified - ELM

Every article that has been manually submitted and manually curated in the Candidates section is assigned a specific label to the left of its title:

Figure. Classified - Curated candidate

A similar label marks articles that have already been curated in articles.ELM, but for a different class:

Figure. Classified - Different class curated candidate

There may be articles in the results list which were not part of the training set. This provides a good opportunity to recognize relevant articles that may have slipped the experts’ attention. Every article that has not been curated before receives a specific label to the left of its title, which can be clicked for direct access to the classification interface:

Figure. Classified - Submit

Clicking on the article title gives direct access to the classification/submission interface. Clicking on the external-link icon to the right of the article title opens its PubMed entry in a new window.

The abstract of each classified article for the ELM class is colour-coded by word weighting in the same palette as the word cloud. Hovering over a coloured word shows a pop-up with the weight assigned to the term by the classifier tool. Clicking on this word opens the search interface and queries the database with the selected term.

Classify/Submit an Article


Classify an article relative to curated functional IDR-related articles, or contribute new annotations about relevant protein information.

Classification input

The input for classification is any published article indexed in PubMed. The user must enter a valid PubMed identifier (PMID) that corresponds to an existing PubMed record. The PMID must be a positive integer value with 6 or more digits; for example, 8610146.

Classification results

The output is the extended information available for the matching article, as in the example below:

Figure. Classify/Submit - New Article

The results are displayed according to the following schema:

Figure. Classify/Submit - Article schema

Clicking on the external-link icon to the right of the article title opens its PubMed entry in a new window.

MeSH

This sub-section lists all MeSH (Medical Subject Headings) terms associated with the article.

Curation

Below the MeSH terms is the Curation sub-section. It lists all ELM motif classes that have been annotated and manually curated for the article. This section is only displayed if any curation has already been conducted.

The ELM class that has been associated with this article is shown. The confidence level of the curation is indicated using the star-based system (see Classification confidence for more details). Clicking on the class name takes the user to the Browse interface to facilitate direct retrieval of other articles in the same class. Since the article has already been manually curated, it receives a specific label to the right of the motif:

Figure. Classify/Submit - articles.ELM curated example

Hovering over the person icon shows the name of the curator who annotated the ELM class for the article.

articles.ELM Classifier Classifications

Below the MeSH terms or Curation sub-sections is the Article Classification sub-section. It lists all ELM motif classes that have been linked to the article by the articles.ELM classifier.

As in the manual Curation section above, the ELM class that has been associated with this article by the articles.ELM classifier is shown. The confidence level of the classification is indicated using the star-based system (see Classification confidence for more details). Clicking on the class name takes the user to the Browse interface to facilitate direct retrieval of other articles in the same class. If the article has already been manually curated, it receives a specific label to the right of the motif:

Figure. Classify/Submit - articles.ELM curated example

A dedicated button highlights the MeSH terms in the abstract:

Figure. Classify/Submit - show terms

Submission input

Below the Classification results is a special section that allows the user to submit the article for addition as a curated functional IDR-related article.

Users should not expect to see the article immediately available on the website: all submissions are curated by the articles.ELM team before they appear in the resource.

A drop-down menu gives the user the (non-mandatory) option to confirm the classified ELM class as valid, to propose one of the other available ELM classes instead, or to select one of the curator groups to review the article. After making a selection, click the ‘Submit’ button on the right.

Submit output

The Submit option produces no visible output apart from a message confirming successful submission, displayed below the drop-down menu. The user should check the website periodically to find out whether the submitted annotation has been included.

Benchmarks


Explore the results of the benchmarking for the resource.

Benchmark selection

A drop-down menu allows the user to select and explore several benchmark analyses on the complete set or on an individual class of ELM motifs. Four benchmark analyses are provided:

Benchmark | Description
Textmining Classification Benchmarking - Cross Validation | A 5-fold cross-validation benchmark protocol to assess the ability of the articles.ELM classifier to correctly classify motif articles of the ELM dataset.
Textmining Classification Benchmarking - Manually Curated Datasets | Benchmarking based on ten sets of manually curated motif articles, which aims to determine the ability of the classifier to identify the correct ELM class for large, real-world curated datasets. The articles in each set describe an instance of a motif class present in the ELM resource, but the articles themselves were not annotated in the ELM resource.
Textmining Classification Benchmarking - Keywords | Benchmarking based on the motif classifier term weightings for each class, which were investigated to understand their relationship to the motif class they describe.
Annotation Benchmarking - ELM Dataset Reannotation | Benchmarking based on the reannotation of ELM curated proteins for the ELM dataset, aiming to quantify the ability of each source of protein metadata to programmatically identify the relevant protein metadata for a given article.

Cross Validation benchmark results

Results of the benchmarking are available for the complete set or an individual motif class. A brief description of the analysis is followed by the ROC curve section, which shows the values obtained for a selection of performance metrics (defined at the bottom of the page):


Metric | Description
AUC | Area Under the Receiver Operating Characteristic (ROC) curve. Measures the ability to distinguish elements of two different classes at varying thresholds. Higher AUCs correspond to better models.
Recall | Also known as TPR (True Positive Rate) or Sensitivity. Measures the proportion of positive observations that were successfully recognized as such.
False Positive Rate | Measures the proportion of negative observations that were wrongly recognized as positives.
Relevant Articles | Curated articles tested in the analysis.
Significant Relevant Articles | Curated articles tested in the analysis that were assigned to the correct class at a probability cut-off of 0.05.
Significant Background Articles | Curated articles tested in the analysis that were assigned to an incorrect class at a probability cut-off of 0.05.
Tested Articles-Class pairs | The number of independent classifications performed in the analysis.
Relevant mean p value | The mean probability score of the curated articles for the correct class.
Relevant max p value | The maximum probability score of the curated articles for the correct class.
Background mean p value | The mean probability score of the curated articles for incorrect classes.
Background max p value | The maximum probability score of the curated articles for incorrect classes.
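The cut-off-based metrics in the table can be computed from two lists of p values. A minimal sketch under stated assumptions: Recall and False Positive Rate are taken at the same 0.05 cut-off, each p value corresponds to one tested article-class pair, and `cutoff_metrics` is a hypothetical helper rather than the benchmark code.

```python
def cutoff_metrics(relevant_p: list[float], background_p: list[float],
                   cutoff: float = 0.05) -> dict[str, float]:
    """Summarise classifications at a probability cut-off.

    relevant_p:   p values of curated articles against their correct class
    background_p: p values of curated articles against incorrect classes
    """
    sig_rel = sum(p < cutoff for p in relevant_p)
    sig_bg = sum(p < cutoff for p in background_p)
    return {
        "Relevant Articles": len(relevant_p),
        "Significant Relevant Articles": sig_rel,
        "Significant Background Articles": sig_bg,
        # Assumption: every p value is one tested article-class pair.
        "Tested Articles-Class pairs": len(relevant_p) + len(background_p),
        "Recall": sig_rel / len(relevant_p),
        "False Positive Rate": sig_bg / len(background_p),
        "Relevant mean p value": sum(relevant_p) / len(relevant_p),
        "Relevant max p value": max(relevant_p),
        "Background mean p value": sum(background_p) / len(background_p),
        "Background max p value": max(background_p),
    }
```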


Receiver Operating Characteristic (ROC) curve

The Receiver Operating Characteristic (ROC) curve is displayed next to the metric values. It shows the relationship between the False Positive Rate (the proportion of real negative classifications that were labelled as positive) and the True Positive Rate (the proportion of real positive classifications that were correctly labelled as such), at various predefined threshold settings, for the selected benchmarking set:

Figure. Benchmark - Cross-Validation - ROC curve
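An ROC curve and its AUC can be reproduced with scikit-learn's `roc_curve` and `roc_auc_score`. A minimal sketch with hypothetical labels and p values: since a lower p means a stronger prediction, 1 - p is used as the ranking score (any order-reversing transform of p gives the same curve, because ROC analysis depends only on the ranking).

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical data: 1 = article tested against its correct class,
# 0 = article tested against an incorrect class.
y_true = [1, 1, 1, 0, 0, 0]
p_values = [0.01, 0.03, 0.20, 0.04, 0.50, 0.90]

# Lower p is a stronger prediction, so rank by 1 - p.
scores = [1 - p for p in p_values]
fpr, tpr, thresholds = roc_curve(y_true, scores)  # one point per threshold
auc = roc_auc_score(y_true, scores)               # area under that curve
```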

Class data benchmark table

When the "Complete benchmarking set" option is selected, a table is included after the performance metrics which lists the values of the same metrics for the complete set and for each individual ELM motif class.

Manual curation benchmark results

A brief description of the analysis is followed by a bar chart presenting the obtained results.

Manual curation classification chart

A bar chart indicates values from a selection of performance metrics: Recall, Recall Top Ranked, Precision and Recall Alternative (defined below).

Figure. Benchmark - Manual - Bar chart

Values were calculated for each of ten selected datasets (note that some datasets involve more than one motif class):

Dataset | Motif class(es) involved
NES | TRG_NES_CRM1_1
NLS | TRG_NLS_Bipartite_1, TRG_NLS_MonoCore_2, TRG_NLS_MonoExtC_3 and TRG_NLS_MonoExtN_4
PxIxIT | DOC_PP2B_PxIxI_1
LxVP | DOC_PP2B_LxvP_1
D box | DEG_APCC_DBOX_1
KEN | DEG_APCC_KENBOX_2
RVxF | DOC_PP1_RVXF_1
LC8 | LIG_Dynein_DLC8_1
SH2 | LIG_SH2_GRB2, LIG_SH2_PTP2, LIG_SH2_SRC, LIG_SH2_STAT3, LIG_SH2_STAT5, LIG_SH2_STAT6
WW | LIG_WW_1, LIG_WW_2, LIG_WW_3


Manual curation data benchmark table

For each dataset, the values of the aforementioned and other metrics are tabulated below the chart, followed by the definitions of the metrics themselves:

Metric | Description
Articles Count | Number of curated articles for the selected motif.
Predicted Correctly | Number of articles that have been correctly assigned to the same curated class by the articles.ELM classifier.
Recall | The proportion of articles correctly classified.
Predicted Top | Number of articles that have been correctly classified as top rank.
Recall Top | The proportion of articles correctly classified as top rank.
Predictions | Number of classes that were predicted for the articles in the dataset.
Predictions TP | Number of classes that were correctly predicted for the articles in the dataset. (Note that this value can be greater than the number of correctly predicted articles, as more than one classification can be correct for a dataset.)
Precision | The proportion of classified articles that were classified correctly.
Predicted Alternative | Number of articles that have been correctly assigned to an alternative curated class. (Note that motif classes generally co-occur in the same protein, or bind to different pockets on the same motif-binding protein, constituting an alternative but related class.)
Recall Alternative | The proportion of articles correctly classified to an alternative class.
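The counts and ratios above can be derived from per-article predictions. A minimal sketch; the input layout (`curated`, `alternative`, `predicted` keys) and the helper `manual_metrics` are hypothetical, Precision is taken as Predictions TP / Predictions, and Predicted Alternative counts any article with an alternative class among its predictions. These are our readings of the definitions, not the benchmark code.

```python
def manual_metrics(articles: list[dict]) -> dict[str, float]:
    """Benchmark counts and ratios for one curated dataset.

    Each article is a dict with (hypothetical keys):
      'curated'     - set of curated (correct) classes
      'alternative' - set of related alternative classes
      'predicted'   - list of classes predicted by the classifier, best first
    """
    n = len(articles)
    predicted_correctly = predicted_top = predicted_alt = 0
    predictions = predictions_tp = 0
    for a in articles:
        hits = a["curated"] & set(a["predicted"])
        predicted_correctly += bool(hits)
        predicted_top += bool(a["predicted"]) and a["predicted"][0] in a["curated"]
        predicted_alt += bool(a["alternative"] & set(a["predicted"]))
        predictions += len(a["predicted"])
        predictions_tp += len(hits)
    return {
        "Articles Count": n,
        "Predicted Correctly": predicted_correctly,
        "Recall": predicted_correctly / n,
        "Predicted Top": predicted_top,
        "Recall Top": predicted_top / n,
        "Predictions": predictions,
        "Predictions TP": predictions_tp,
        "Precision": predictions_tp / predictions if predictions else 0.0,
        "Predicted Alternative": predicted_alt,
        "Recall Alternative": predicted_alt / n,
    }
```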


Keyword-based benchmark results

A brief description of the analysis is followed by a table of the top-five keyword terms for each motif class.

Keyword-based data benchmark table

A list of the top five keywords is presented for each class, ordered by decreasing weight. Keywords shown in bold and marked with a star denote terms that were correctly related to the motif class. Hovering over a star shows the manually curated relationship.

Figure. Benchmark - Keywords - tops

Clicking on the class name allows the user to browse the articles classified to that class by the articles.ELM resource.

Clicking on a keyword searches the database for motif-related articles matching that keyword.

Manual curation keyword relationships

For each keyword related to a class, a color-based star system indicates the type of relationship that links the keyword to the class.

Symbol | Relationship
Yellow star | correct binding-domain keyword
Purple star | correct targeted localisation keyword
Pink star | correct modification keyword
Green star | correct motif keyword
Blue star | correct protein/complex keyword


ELM reannotation benchmark results

A brief description of the analysis is followed by a chart and tables describing the benchmarking results by the source of protein metadata (described briefly at the bottom of the page).

Reannotation chart

A bar chart indicates values from a selection of performance metrics: Recall and Precision (see Manual curation data benchmark table for definitions).

Figure. Benchmark - ELM reannotation

Precision and Recall values were calculated for each of four selected data sources (UniProt, HIPPIE, PDB and SciLite), plus their combination (see Data sources: Databases for descriptions).

Reannotation data benchmark table

For each data source, a table presents the number of correctly mapped proteins, the corresponding Recall value (obtained by comparison of the former with the total number of proteins in the dataset), the number of proteins mapped and the resulting Precision (the fraction of mapped proteins that were correctly mapped).
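Recall and Precision for one data source reduce to set operations on UniProt accessions. A minimal sketch; `reannotation_scores` is a hypothetical helper and the accession sets in any example input are illustrative only.

```python
def reannotation_scores(curated: set[str], mapped: set[str]) -> dict[str, float]:
    """Recall and Precision of one source's protein mapping for a dataset.

    curated: accessions curated in ELM for the articles (the reference set)
    mapped:  accessions the data source associated with the same articles
    """
    correct = curated & mapped
    return {
        "Correctly mapped": len(correct),
        "Recall": len(correct) / len(curated) if curated else 0.0,
        "Mapped": len(mapped),
        "Precision": len(correct) / len(mapped) if mapped else 0.0,
    }
```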

Reannotation data source overlap

A table presents the overlap of the correctly identified UniProt accessions between each pair of article annotation resources. Both the proportion (in bold) and the total number of entries (in parentheses) are shown. The denominator of each proportion refers to the resource in the row.

Figure. Benchmark - ELM reannotation - data overlap
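A row-normalised overlap table of this kind can be sketched as follows, assuming each cell divides the number of shared correct accessions by the number of correct accessions of the row's resource. `overlap_table` is a hypothetical helper and the source names and sets passed to it are illustrative.

```python
def overlap_table(correct_by_source: dict[str, set[str]]) -> dict[tuple[str, str], tuple[float, int]]:
    """Map (row, column) to (proportion, shared count), normalised by the row."""
    table = {}
    for row, row_set in correct_by_source.items():
        for col, col_set in correct_by_source.items():
            shared = len(row_set & col_set)
            proportion = shared / len(row_set) if row_set else 0.0
            table[(row, col)] = (proportion, shared)
    return table
```

Because the denominator comes from the row, the table is generally not symmetric: a small source fully contained in a larger one scores 1.0 in its own row but less in the larger source's row.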

Data sources


Databases

Name | Description | PMID | URL
UniProt | Protein accessions, names, sequences, families, UniRef clusters and feature annotations. | 25348405 | http://www.uniprot.org
ELM | Manually curated linear motifs. | 26615199 | http://elm.eu.org
SciLite | Text-mined annotations mapping biological data with research articles. | 28948232 | http://europepmc.org/Annotations
MeSH | Manually annotated terms describing article content. | 13982385 | https://www.ncbi.nlm.nih.gov/mesh/
PDB | Experimentally resolved protein tertiary structures. | 10592235 | http://www.rcsb.org/pdb/home/home.do
HIPPIE | Validated human protein-protein interactions. | 27794551 | http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/


Programs

Name | Description | Publication | URL
Natural Language Toolkit | Natural language processing. Used to tokenize and tag the article text. | http://www.datascienceassn.org/sites/default/files/Natural%20Language%20Processing%20with%20Python.pdf | https://www.nltk.org/
scikit-learn | Supervised classification using several machine learning methods. | http://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf | https://scikit-learn.org