psygenet2r: An R package for querying PsyGeNET and performing comorbidity studies in psychiatric disorders
psygenet2r has been developed to facilitate statistical analysis of PsyGeNET data, allowing its integration with
other packages available in R to develop data analysis workflows. It is available in Bioconductor.
1. Introduction
The psygenet2r package retrieves the genes associated with psychiatric diseases and explores the association
between a disease of interest and PsyGeNET diseases based on shared genes. In addition, psygenet2r annotates
genes with psychiatric diseases based on expert-curated information. This functionality can help to interpret
the results of GWAS or whole-exome sequencing studies, which yield a list of gene variants that must be
prioritized according to their functional and clinical relevance. In this context, it is useful to know
whether there is information on the involvement of these genes in psychiatric diseases.
2. Implementation
The new release of the psygenet2r package is available through Bioconductor.
To install psygenet2r, type the following two commands in an R session:
> source("https://bioconductor.org/biocLite.R")
> biocLite("psygenet2r")
A. Visualizing gene-disease associations (GDAs) using networks
psygenet2r provides several options to visualize the GDAs between the genes
found in PsyGeNET and the different disorders.
One of them is the GDA network (Figure 2A). In the GDA network, green
nodes represent diseases and orange nodes represent genes.
Alternatively, results can be visualized in a network organized by the eight psychiatric disorder classes available
in PsyGeNET (alcohol use disorder (UD), bipolar disorder, depression, schizophrenia, cocaine UD,
substance-induced (SI) depression, cannabis UD, SI psychosis). The size of each disorder-class node is proportional
to the number of disease concepts that belong to that class, out of the total number of diseases
associated with the gene (Figure 2B).
B. Characterizing protein function from a list of genes
The psygenet2r package can be used to analyze gene attributes such as the function of the proteins
encoded by these genes (PANTHER Protein Class). The PANTHER Protein Class Ontology includes commonly
used classes of protein functions. The pantherGraphic function shows the PANTHER classes to which
the proteins belong, according to their associated psychiatric disorder.
Given a list of genes and a database (ALL, psycur15, or psycur16), the function produces
a bar plot (Figure 3) with the PANTHER classes on the Y-axis and the percentage of genes on
the X-axis, grouped by PsyGeNET psychiatric disorder.
C. Number of publications supporting each gene-disease association
We can also inspect the number of publications that support a GDA.
The psygenet2r package displays this information in
a bar plot. Figure 4A shows an example, with the genes
on the X-axis and the number of publications on the Y-axis.
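The counting behind such a plot can be sketched in base R. This is only an illustration and not the psygenet2r implementation: the data frame gda_records, its column names, and all gene and publication identifiers are hypothetical placeholders, and the counts are aggregated per gene across diseases for simplicity.

```r
# Hypothetical records: one row per (gene, disease, publication) triple.
# Gene names and publication IDs are placeholders, not PsyGeNET data.
gda_records <- data.frame(
  gene        = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2", "GENE3"),
  disease     = c("depression", "depression", "schizophrenia",
                  "depression", "depression", "schizophrenia"),
  publication = c("pub1", "pub2", "pub2", "pub3", "pub3", "pub4"),
  stringsAsFactors = FALSE
)

# Keep each (gene, publication) pair once, so a paper reported for
# several diseases of the same gene is not counted twice.
unique_pairs <- unique(gda_records[, c("gene", "publication")])
pubs_per_gene <- table(unique_pairs$gene)

# Genes on the X-axis, number of publications on the Y-axis.
barplot(pubs_per_gene, xlab = "Gene", ylab = "Number of publications")
```

Deduplicating before counting is a design choice of this sketch; counting per gene-disease pair instead would only require adding the disease column to the deduplication and tabulation steps.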
D. Assessing disease similarity using the Jaccard Index
psygenet2r can also be used to identify which diseases are similar to a target
disease based on shared genes. Since the PsyGeNET database contains information
on genes associated with psychiatric diseases, we can use it to estimate disease
similarity.
The Jaccard Index is a statistic used for comparing the similarity
of two sets, defined as the size of their intersection divided by the size of their union.
The result of the Jaccard Index estimation can be visualized in a
bar plot (Figure 4B) that shows the p-value of each comparison between our genes and each PsyGeNET disease.
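As a minimal sketch of the idea, the Jaccard Index between two gene sets can be computed in base R as shown below. The helper names jaccard_index and empirical_p_value, the gene lists, and the gene universe are all hypothetical. The random-resampling p-value is an assumption added for illustration of how a p-value for such a comparison could be obtained; it is not necessarily the procedure used by psygenet2r.

```r
# Jaccard Index of two gene sets: |A intersect B| / |A union B|.
jaccard_index <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  if (union_size == 0) return(NA_real_)  # undefined for two empty sets
  length(intersect(a, b)) / union_size
}

# Empirical p-value (illustrative assumption): how often does a random
# gene set of the same size reach an index at least as high as observed?
empirical_p_value <- function(a, b, universe, n_perm = 1000) {
  observed <- jaccard_index(a, b)
  null <- replicate(
    n_perm,
    jaccard_index(sample(universe, length(unique(a))), b)
  )
  (sum(null >= observed) + 1) / (n_perm + 1)
}

# Hypothetical placeholder genes, not PsyGeNET data.
genes_of_interest <- c("GENE1", "GENE2", "GENE3", "GENE4")
disease_genes     <- c("GENE3", "GENE4", "GENE5")
gene_universe     <- paste0("GENE", 1:50)

ji <- jaccard_index(genes_of_interest, disease_genes)  # 2 shared / 5 total

set.seed(1)  # make the resampling reproducible
p_val <- empirical_p_value(genes_of_interest, disease_genes, gene_universe)
```

Here the two lists share 2 genes out of 5 distinct genes, so the index is 0.4. The add-one correction in the p-value keeps the estimate strictly above zero for a finite number of resamples.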