psygenet2r: An R package for querying PsyGeNET and performing comorbidity studies in psychiatric disorders



psygenet2r was developed to facilitate the statistical analysis of PsyGeNET data and to allow its integration with other R packages when building data analysis workflows. It is available through Bioconductor.



1. Introduction
The psygenet2r package retrieves the genes associated with psychiatric diseases and explores the association between a disease of interest and PsyGeNET diseases based on shared genes. In addition, psygenet2r annotates genes with psychiatric diseases based on expert-curated information. This functionality helps to interpret the results of GWAS or Whole Exome Sequencing studies, which yield a list of gene variants that must be prioritized by functional and clinical relevance. In that setting, it is useful to know whether the genes have already been implicated in psychiatric diseases.

Figure 1


2. Implementation

The new release of the psygenet2r package is available through Bioconductor. To install psygenet2r, type the following two commands in an R session:
> install.packages("BiocManager")
> BiocManager::install("psygenet2r")

3. Examples

A. Visualizing gene-disease associations (GDAs) using networks

psygenet2r provides several options to visualize the GDAs between the genes in PsyGeNET and the different disorders.

One of them is the GDA network (Figure 2A). In the GDA network, green nodes represent diseases and orange nodes represent genes.
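The structure behind such a network can be sketched in base R as a node table and an edge table. This is only an illustration under stated assumptions: the gene and disease names are made-up placeholders (not PsyGeNET data), and the object names `gda`, `nodes`, and `edges` are hypothetical, not part of psygenet2r.

```r
# Toy gene-disease associations (placeholder names, NOT PsyGeNET data).
gda <- data.frame(
  gene    = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  disease = c("disease_1", "disease_2", "disease_1", "disease_3"),
  stringsAsFactors = FALSE
)

# One node per distinct gene and one per distinct disease.
genes    <- unique(gda$gene)
diseases <- unique(gda$disease)
nodes <- data.frame(
  name = c(genes, diseases),
  type = c(rep("gene", length(genes)), rep("disease", length(diseases))),
  stringsAsFactors = FALSE
)

# Colour scheme described in the text: diseases in green, genes in orange.
nodes$colour <- ifelse(nodes$type == "disease", "green", "orange")

# Each distinct association becomes one edge between a gene and a disease.
edges <- unique(gda[, c("gene", "disease")])
```

Tables of this shape can be handed to a graph-drawing package of the reader's choice; the layout itself is outside the scope of this sketch.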

Alternatively, results can be visualized in a network organized by the eight psychiatric disorder classes available in PsyGeNET (alcohol UD, bipolar disorder, depression, schizophrenia, cocaine UD, SI-Depression, cannabis UD, SI-psychosis). The size of each psychiatric disorder node is proportional to the number of disease concepts that belong to that class, out of the total number of diseases associated with the gene (Figure 2B).
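The node-size rule can be illustrated with a small base R calculation. This is a sketch under assumptions: the disease concepts and their class assignments below are invented for a single hypothetical gene, and `node_size` is an illustrative name rather than a psygenet2r object.

```r
# Toy annotation for ONE gene: each associated disease concept and its class.
# Concepts and assignments are placeholders, NOT PsyGeNET data.
gene_diseases <- data.frame(
  concept = c("concept_1", "concept_2", "concept_3", "concept_4"),
  class   = c("depression", "depression", "schizophrenia", "alcohol UD"),
  stringsAsFactors = FALSE
)

# Number of disease concepts per disorder class for this gene.
class_counts <- table(gene_diseases$class)

# Node size: share of the gene's diseases that fall in each class.
node_size <- setNames(as.numeric(class_counts) / sum(class_counts),
                      names(class_counts))
```

With this toy input, the class holding two of the four concepts receives a node twice as large as each of the other two classes.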

Figure 2


B. Characterizing protein function from a list of genes

psygenet2r can also be used to analyze gene attributes, such as the function of the proteins that the genes encode (PANTHER Protein Class). The PANTHER Protein Class ontology includes commonly used classes of protein functions. The pantherGraphic function shows the PANTHER classes to which the proteins belong, according to their associated psychiatric disorder.

Given a list of genes and a database (ALL, psycur15, psycur16) as input, the function produces a bar plot (Figure 3) with the PANTHER classes on the Y-axis and the percentage of genes on the X-axis, grouped by PsyGeNET psychiatric disorder.
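The quantity shown in such a plot can be sketched in base R as follows. This is not the pantherGraphic implementation: the genes, disorders, and class labels are invented for illustration, and the percentages are assumed to be computed within each disorder.

```r
# Toy annotation table (placeholder genes and labels, NOT PsyGeNET data).
annot <- data.frame(
  gene          = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  disorder      = c("depression", "depression", "depression",
                    "schizophrenia", "schizophrenia"),
  panther_class = c("receptor", "receptor", "transporter",
                    "receptor", "enzyme"),
  stringsAsFactors = FALSE
)

# Genes per protein class (rows) and disorder (columns).
counts <- table(annot$panther_class, annot$disorder)

# Percentage of genes in each class, computed within each disorder.
pct <- 100 * prop.table(counts, margin = 2)

# Horizontal grouped bars: classes on the Y-axis, percentages on the X-axis,
# one bar per disorder inside each class.
barplot(t(pct), beside = TRUE, horiz = TRUE, legend.text = TRUE,
        xlab = "Percentage of genes", las = 1)
```

Normalizing within each disorder keeps disorders with many annotated genes from dominating the plot; whether the package normalizes in the same way is an assumption here.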

Figure 3


C. Number of publications supporting each gene-disease association

We can also inspect the number of publications that support a GDA. psygenet2r displays this information as a bar plot. Figure 4A shows an example, with the genes on the X-axis and the number of publications on the Y-axis.
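The count behind this kind of plot can be sketched in base R. The evidence table below is made up (placeholder genes, diseases, and publication identifiers), and counting distinct publications per gene is an assumption about how the bars are defined.

```r
# Toy evidence table: one row per (gene, disease, publication) record.
# All identifiers are placeholders, NOT PsyGeNET data.
evidence <- data.frame(
  gene    = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_C"),
  disease = c("disease_1", "disease_1", "disease_2",
              "disease_1", "disease_3", "disease_3"),
  pub_id  = c("pub_1", "pub_2", "pub_2", "pub_3", "pub_4", "pub_5"),
  stringsAsFactors = FALSE
)

# Count each publication once per gene, even if it supports several diseases.
gene_pubs <- unique(evidence[, c("gene", "pub_id")])
n_pubs <- table(gene_pubs$gene)

# Genes on the X-axis, number of supporting publications on the Y-axis.
barplot(n_pubs, xlab = "Gene", ylab = "Number of publications")
```

To count publications per gene-disease pair instead, the same approach applies with `unique(evidence)` followed by `table()` over both columns.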

D. Assessing disease similarity using the Jaccard Index

psygenet2r can also be used to identify which diseases are similar to a target disease based on shared genes. Because the PsyGeNET database contains information on genes associated with psychiatric diseases, it can be used to estimate disease similarity.

The Jaccard Index is a statistic for comparing the similarity of two sets: the size of their intersection divided by the size of their union. The result of the Jaccard Index estimation can be visualized in a bar plot (Figure 4B) that shows the p-value of each comparison between our genes and a PsyGeNET disease.
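A minimal base R sketch of the index and of one way to attach a p-value to it is given below. The gene universe and both gene sets are invented, and the resampling scheme (drawing random gene sets of the same size from the universe) is an assumption made for illustration, not necessarily the procedure psygenet2r uses.

```r
# Jaccard Index: |A intersect B| / |A union B| for two sets of identifiers.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Toy gene universe and gene sets (placeholders, NOT PsyGeNET data).
universe      <- paste0("GENE_", 1:200)
our_genes     <- universe[1:20]   # genes of the disease of interest
disease_genes <- universe[11:40]  # genes of one PsyGeNET disease

# The sets share 10 genes out of 40 distinct ones.
observed <- jaccard(our_genes, disease_genes)

# Empirical p-value: how often does a random gene set of the same size
# reach an index at least as large as the observed one?
set.seed(1)
n_boot <- 1000
null_ji <- replicate(
  n_boot,
  jaccard(sample(universe, length(our_genes)), disease_genes)
)
p_value <- (sum(null_ji >= observed) + 1) / (n_boot + 1)
```

Adding one to both numerator and denominator keeps the empirical p-value above zero when no random set reaches the observed index. Note that `jaccard()` returns `NaN` when both sets are empty.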

Figure 4