PsyGeNET database information



The PsyGeNET database integrates information on psychiatric disorders and their associated genes (Gutiérrez-Sacristán et al., Bioinformatics 2015). This second release of PsyGeNET contains updated information on depression, bipolar disorder, alcohol use disorders and cocaine use disorders, and has been expanded to cover other psychiatric diseases of interest. The database was developed by automatically extracting information from the literature with the text-mining tool BeFree (http://ibi.imim.es/befree/), followed by curation by domain experts.



1. PsyGeNET diseases


PsyGeNET contains information on eight psychiatric disorder classes, namely:

Long name | Short name | Acronym
Alcohol use disorders | Alcohol UD | AUD
Bipolar disorders and related disorders | Bipolar disorder | BD
Depressive disorders | Depression | DEP
Schizophrenia spectrum and other psychotic disorders | Schizophrenia | SCHZ
Cocaine use disorders | Cocaine UD | CUD
Substance-induced depressive disorder | SI-Depression | SI-DEP
Cannabis use disorders | Cannabis UD | CanUD
Substance-induced psychosis | SI-Psychosis | SI-PSY
Table 1. Names of the psychiatric disorder classes



Each psychiatric disorder class is defined by a set of concepts from the UMLS Metathesaurus. Each gene-disease association refers to one of these specific disease concepts.
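
The Python sketch below makes this two-level structure concrete. It assumes a simple in-memory representation. The dictionaries, the `genes_by_class` helper and the placeholder CUI `CUI_DEP_X` are hypothetical and not part of PsyGeNET or its download format. Each GDA is stored against a disease CUI and then rolled up to the disorder classes that contain that CUI:

```python
from collections import defaultdict

# Toy excerpt (hypothetical): each disorder class is a set of UMLS CUIs.
# C0036341 (Schizophrenia) and C0036344 (Catatonic schizophrenia) appear on
# this page; "CUI_DEP_X" is a placeholder, not a real UMLS identifier.
CLASS_CUIS = {
    "SCHZ": {"C0036341", "C0036344"},
    "DEP": {"CUI_DEP_X"},
}

# Each gene-disease association (GDA) points at one specific disease CUI.
GDAS = [
    ("DRD2", "C0036341"),
    ("DRD2", "CUI_DEP_X"),
    ("ADRB1", "CUI_DEP_X"),
]


def genes_by_class(gdas, class_cuis):
    """Roll CUI-level associations up to every class that contains the CUI."""
    cui_to_classes = defaultdict(set)
    for cls, cuis in class_cuis.items():
        for cui in cuis:
            cui_to_classes[cui].add(cls)
    result = defaultdict(set)
    for gene, cui in gdas:
        # .get() does not create entries, so unknown CUIs are simply skipped
        for cls in cui_to_classes.get(cui, ()):
            result[cls].add(gene)
    return dict(result)


print(genes_by_class(GDAS, CLASS_CUIS))
```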

2. Data sources


PsyGeNET data come from two sources: Psycur15 (from the first release of PsyGeNET) and Psycur16 (the current 2.0 release). All the information in the database has been curated by experts.

Psycur15: genes associated with alcohol use disorders, bipolar disorders and related disorders, depressive disorders and cocaine use disorders. It contains 1,537 associations between 579 genes and 32 psychiatric disease concepts. The information was extracted by text mining from MEDLINE abstracts published between 1980 and 2013 and then curated by experts, as described in Gutiérrez-Sacristán et al., Bioinformatics 2015.

Psycur16: genes associated with the 8 psychiatric disease classes (see PsyGeNET diseases). The information was extracted from MEDLINE (1980 to 2015) using BeFree and curated by domain experts (see PsyGeNET curation process for details).

3. Database statistics


The current version of PsyGeNET (v2.0) contains 3,771 associations between 1,549 genes and 117 diseases. These diseases are UMLS CUIs describing alcohol use disorders, bipolar disorders and related disorders, depressive disorders, schizophrenia spectrum and other psychotic disorders, cocaine use disorders, substance-induced depressive disorder, cannabis use disorders and substance-induced psychosis.

Source | Genes | Diseases | Associations
psycur15 | 579 | 32 | 1537
psycur16 | 1267 | 110 | 2234
ALL | 1549 | 117 | 3771
Table 2. Number of genes, diseases and unique associations provided by each source
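
For genes and diseases, the ALL row is not the sum of the source rows: 579 + 1,267 = 1,846 genes versus 1,549 overall, and 32 + 110 = 142 diseases versus 117. The associations, by contrast, do add up (1,537 + 2,234 = 3,771). This pattern is what one would expect if the ALL row counts distinct items across both sources, so that genes and diseases present in both are counted once. The Python sketch below illustrates that interpretation with toy data. The function name and the identifiers `GENE_Y` and `CUI_DEP_X` are hypothetical:

```python
def source_stats(associations):
    """Count distinct genes, diseases and (gene, disease) pairs in a source."""
    pairs = set(associations)
    genes = {gene for gene, _ in pairs}
    diseases = {disease for _, disease in pairs}
    return len(genes), len(diseases), len(pairs)


# Toy data (hypothetical): DRD2 and CUI_DEP_X occur in both sources,
# but no (gene, disease) pair is repeated between them.
psycur15 = [("DRD2", "C0036341"), ("ADRB1", "CUI_DEP_X")]
psycur16 = [("DRD2", "CUI_DEP_X"), ("GENE_Y", "C0036344")]

for name, data in [("psycur15", psycur15), ("psycur16", psycur16), ("ALL", psycur15 + psycur16)]:
    print(name, source_stats(data))
# psycur15 (2, 2, 2)
# psycur16 (2, 2, 2)
# ALL (3, 3, 4)
```

Shared genes and diseases shrink the ALL counts, while pairs found in only one source add up exactly.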


Genes associated with each psychiatric disease class.

Genes shared between pairs of psychiatric disease classes.

Bar plot showing the number of genes per disease class (blue) and the number of genes unique to each disease class (orange).

Gene-disease associations according to disease class.



(*) All the graphics above were generated with the psygenet2r package.



4. Association Qualifier


During the curation of gene-disease associations (GDAs), we found publications that supported the association between a gene and a disease. Other studies found just the opposite: that the gene is not associated with the disease. The literature generally calls the latter a negative finding. In PsyGeNET we consider it important to keep track of both “positive” and “negative” findings, and to let users make their own judgements based on the available evidence. Thus, for each GDA and each supporting publication, we record the Association type. Depending on the evidence, there are two types: “Association” and “No Association” (i.e. the “negative findings”). This information is available in the “All associations evidences” tab.

5. Evidence index


In addition to the association type, the “Evidence index” (EI) summarizes how far the publications reviewed for a gene-disease association agree. The index works like a traffic light:
- Green (Association, EI = 1): all the evidence reviewed by the experts supports an association between the gene and the disease.
- Yellow (0 < EI < 1): the evidence for the GDA is contradictory, with some publications supporting the association and others not.
- Red (No Association, EI = 0): all the evidence reviewed by the experts reports that there is no association between the gene and the disease.

Note that the experts validated at most 5 publications for each GDA, choosing the 5 most recent ones.
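
The Python sketch below shows one way to compute the index. It assumes that the EI is the fraction of the reviewed publications (at most the five most recent) that report an “Association”. This is consistent with the three traffic-light cases described above, but the exact computation used by PsyGeNET is not given here. The class and function names and the publication labels are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum


class AssociationType(Enum):
    ASSOCIATION = "Association"
    NO_ASSOCIATION = "No Association"


@dataclass
class Evidence:
    publication: str  # hypothetical publication label (e.g. a PubMed ID)
    year: int
    association_type: AssociationType


MAX_REVIEWED = 5  # experts validated at most 5 publications per GDA


def evidence_index(evidences):
    """Assumed EI: share of the reviewed (most recent) publications reporting an association."""
    reviewed = sorted(evidences, key=lambda e: e.year, reverse=True)[:MAX_REVIEWED]
    if not reviewed:
        raise ValueError("a GDA needs at least one reviewed publication")
    supporting = sum(e.association_type is AssociationType.ASSOCIATION for e in reviewed)
    return supporting / len(reviewed)


def traffic_light(ei):
    """Map an EI value to its traffic-light color."""
    if ei == 1:
        return "green"
    if ei == 0:
        return "red"
    return "yellow"


A, N = AssociationType.ASSOCIATION, AssociationType.NO_ASSOCIATION
gda_evidence = [
    Evidence("pub1", 2015, A),
    Evidence("pub2", 2013, N),
    Evidence("pub3", 2011, A),
    Evidence("pub4", 2009, A),
]
ei = evidence_index(gda_evidence)
print(ei, traffic_light(ei))  # 0.75 yellow
```

Restricting the calculation to the most recent publications mirrors the curation rule above. Older evidence beyond the fifth publication does not affect the index.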



The EI is available in numeric format in the “Summary of All Associations” tab. Given a set of genes of interest, the psygenet2r package can also display the EI as a heatmap. Genes are placed on the X axis and disorders on the Y axis, and each cell is colored red, yellow or green according to its EI value (more details in R package: psygenet2r).
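
The layout of such a heatmap can be sketched without psygenet2r as a plain-text grid, with one row per disorder and one column per gene. This Python example only illustrates that layout. The EI values and the gene `GENE_Z` are made up, and the code does not reproduce the psygenet2r API or its output:

```python
# Hypothetical EI values keyed by (gene, disorder class); missing keys = no GDA.
EI_VALUES = {
    ("DRD2", "SCHZ"): 0.8,
    ("DRD2", "DEP"): 1.0,
    ("ADRB1", "DEP"): 1.0,
    ("GENE_Z", "SCHZ"): 0.0,
}


def cell(ei):
    """G = green (EI = 1), Y = yellow (0 < EI < 1), R = red (EI = 0), . = no GDA."""
    if ei is None:
        return "."
    if ei == 1:
        return "G"
    if ei == 0:
        return "R"
    return "Y"


def ei_grid(genes, disorders, ei_values):
    """One row per disorder (Y axis), one column per gene (X axis)."""
    return [[cell(ei_values.get((gene, disorder))) for gene in genes] for disorder in disorders]


genes = ["DRD2", "ADRB1", "GENE_Z"]
disorders = ["SCHZ", "DEP"]
for disorder, row in zip(disorders, ei_grid(genes, disorders, EI_VALUES)):
    print(disorder.ljust(5), " ".join(row))
```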

6. Gene Pleiotropy

In PsyGeNET, some genes are associated with all the disease classes (i.e. they are highly pleiotropic), while others are specific to a single disease class. The Gene Pleiotropy is the percentage of the eight disease classes that a gene is associated with. It ranges from 12.5 (1 of 8 classes) to 100 (8 of 8 classes). Thus, a gene associated with diseases of diverse classes has a high Gene Pleiotropy. For example, DRD2 is associated with 6 disease classes (alcohol UD, bipolar disorder, depression, schizophrenia, cocaine UD and cannabis UD), which gives a value of 75. Conversely, ADRB1 is associated with only one disease class (depression) and therefore has the minimum Gene Pleiotropy value (12.5).
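
Because the range runs from 12.5 (1 of 8 classes) to 100 (8 of 8), the measure can be read as 100 × (number of classes) / 8. The minimal Python sketch below uses that reading and reproduces the DRD2 and ADRB1 examples. The function name is hypothetical; the class acronyms come from Table 1:

```python
# The eight PsyGeNET disorder classes (acronyms from Table 1).
DISEASE_CLASSES = frozenset({"AUD", "BD", "DEP", "SCHZ", "CUD", "SI-DEP", "CanUD", "SI-PSY"})


def gene_pleiotropy(gene_classes):
    """Percentage of the eight disorder classes a gene is associated with."""
    classes = set(gene_classes)
    unknown = classes - DISEASE_CLASSES
    if unknown:
        raise ValueError(f"unknown disease classes: {sorted(unknown)}")
    if not classes:
        raise ValueError("a PsyGeNET gene is associated with at least one class")
    return 100 * len(classes) / len(DISEASE_CLASSES)


print(gene_pleiotropy({"AUD", "BD", "DEP", "SCHZ", "CUD", "CanUD"}))  # DRD2: 75.0
print(gene_pleiotropy({"DEP"}))  # ADRB1: 12.5
```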

7. Disease Load


Each disorder class in PsyGeNET, such as Schizophrenia or Depression, is defined as a set of diseases identified by UMLS CUIs. Some of these diseases (UMLS CUIs) are associated with many of the genes in their disease class, while others are associated with only a few. The Disease Load measures this property: it is the number of genes associated with a disease divided by the total number of genes associated with its disease class. For example, the Schizophrenia class is defined by 24 UMLS concepts. One of these concepts, Schizophrenia (umls:C0036341), has the largest Disease Load in its class (0.95), because it is annotated with 861 of the 903 genes associated with the class. In contrast, Catatonic schizophrenia (umls:C0036344) is associated with only 4 genes, a much smaller share (4/903 ≈ 0.004).

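The Disease Load of a disease $d$ belonging to a disorder class $C$ is:

$$\mathrm{DL}(d) = \frac{N_{\mathrm{genes}}(d)}{N_{\mathrm{genes}}(C)}$$

where $N_{\mathrm{genes}}(d)$ is the number of genes associated with $d$ and $N_{\mathrm{genes}}(C)$ is the total number of genes associated with class $C$.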
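
The minimal Python sketch below computes the Disease Load. It assumes that a class's gene set is the union of the genes annotated to its CUIs. The helper name and the toy gene sets (`GENE_A`, `GENE_B`) are hypothetical. The sketch ends with an arithmetic check of the Schizophrenia figures quoted above:

```python
def disease_loads(genes_by_cui):
    """Disease Load of each CUI in one class: its genes / all genes of the class."""
    class_genes = set().union(*genes_by_cui.values())
    if not class_genes:
        raise ValueError("the class has no associated genes")
    return {cui: len(genes) / len(class_genes) for cui, genes in genes_by_cui.items()}


# Toy class (hypothetical gene sets); the class's genes are the union over its CUIs.
toy_class = {
    "C0036341": {"DRD2", "GENE_A", "GENE_B"},
    "C0036344": {"DRD2"},
}
print(disease_loads(toy_class))

# Arithmetic check with the Schizophrenia class figures (903 genes in total).
print(round(861 / 903, 2))  # 0.95  (Schizophrenia, umls:C0036341)
print(round(4 / 903, 4))    # 0.0044 (Catatonic schizophrenia, umls:C0036344)
```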


8. Standards


Diseases
Diseases in the current release of PsyGeNET are standardized with Concept Unique Identifiers (CUIs) from the Unified Medical Language System® (UMLS®) Metathesaurus, so each disease is identified by a unique CUI. Each disorder class in PsyGeNET is defined in terms of a set of CUIs. The classes are alcohol use disorders, bipolar disorders and related disorders, depressive disorders, schizophrenia spectrum and other psychotic disorders, cocaine use disorders, substance-induced depressive disorder, cannabis use disorders and substance-induced psychosis.
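
Because every disease is keyed by a CUI, it can help to normalize identifiers before joining PsyGeNET data with other resources. The small Python sketch below assumes identifiers arrive either bare or with the `umls:` prefix used on this page; the helper is hypothetical. It checks only the CUI shape (the letter C followed by seven digits), not whether the concept exists in the Metathesaurus:

```python
import re

# A UMLS CUI is the letter "C" followed by seven digits, e.g. C0036341.
CUI_PATTERN = re.compile(r"C[0-9]{7}")


def normalize_cui(identifier):
    """Strip an optional 'umls:' prefix and check that the rest looks like a CUI."""
    cui = identifier.strip()
    if cui.lower().startswith("umls:"):
        cui = cui[len("umls:"):]
    if not CUI_PATTERN.fullmatch(cui):
        raise ValueError(f"not a UMLS CUI: {identifier!r}")
    return cui


print(normalize_cui("umls:C0036341"))  # C0036341
print(normalize_cui("C0036344"))       # C0036344
```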

Genes
Genes in the current release of PsyGeNET are described by their NCBI Official Full Name, their NCBI Entrez Gene identifier and their UniProt accession. In addition, genes are classified according to the PANTHER protein ontology.



9. PsyGeNET curation process



Updating the PsyGeNET database involved: i) recruiting a team of experts to curate the information extracted by text mining; ii) extracting gene-disease associations (GDAs) from the literature with the text-mining system BeFree (Bravo et al., 2015); iii) developing a curation workflow; iv) developing a web-based annotation tool to facilitate the curation task; and v) defining detailed guidelines to assist the curation task.

We put in place a curation workflow consisting of a pilot phase and two curation and analysis phases (see Figure 1). During the pilot phase, the curators received initial training, including how to use the curation tool. Both the curation tool and the annotation guidelines were then improved.

The first curation phase (Curation Phase I) was then launched to evaluate 2,507 GDCAs identified by text mining and supported by 4,065 publications (from 1980 to 2015). The curation results were analyzed to estimate the inter-annotator agreement at the abstract level.

During Curation Phase II, a third expert reviewed the validations for which no agreement was reached in Curation Phase I (results not reported here); four experts participated in this phase. Only the validations on which at least 2 experts agreed were included in the database.

For more details on the process, see: Gutiérrez-Sacristán et al. Text mining and expert curation to develop a database on psychiatric diseases and their genes. Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine. Potsdam, Germany, August 4-5, 2016.
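
The Python sketch below illustrates the inclusion rule: at least two agreeing experts, with a third expert resolving Phase I disagreements. For illustration only, it assumes that each validation received two Phase I labels and at most one Phase II label. It also uses the association types from the Association Qualifier section as labels, although the actual annotation categories may differ. The function names are hypothetical:

```python
from collections import Counter


def consensus_label(labels, min_agreement=2):
    """Return the single label chosen by at least `min_agreement` experts, else None."""
    counts = Counter(labels)
    agreed = [label for label, n in counts.items() if n >= min_agreement]
    return agreed[0] if len(agreed) == 1 else None


def curate(phase1_labels, phase2_label=None):
    """Phase I: two curators; Phase II: a third expert resolves disagreements."""
    label = consensus_label(phase1_labels)
    if label is None and phase2_label is not None:
        label = consensus_label(list(phase1_labels) + [phase2_label])
    return label  # None: the validation is not included in the database


print(curate(["Association", "Association"]))                        # Association
print(curate(["Association", "No Association"], "No Association"))  # No Association
print(curate(["Association", "No Association"]))                    # None
```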

10. PsyGeNET team


A team of 22 experts from different domains (such as psychiatry, neuroscience, medicine, psychology and biology) was recruited to participate in the curation process. They came from the Spanish Network of Addiction and from other collaborators of the coordination team (Research Group on Integrative Biomedical Informatics, GRIB).

Name | Affiliation
Marta Portero Tresserra | Universitat Pompeu Fabra
Olga Valverde | Universitat Pompeu Fabra, Grup de Recerca en Neurobiologia del Comportament (GReNeC), Departament de Ciències Experimentals i de la Salut
Antonio Armario | Universitat Autònoma de Barcelona
Mª Carmen Blanco Gandía | Universitat de Valencia, Department of Psychobiology, Facultad de Psicología
Adriana Farré | Hospital del Mar, Medical Research Institute (IMIM), Institut de Neuropsiquiatria i Addiccions
Lierni Fernández-Ibarrondo | Hospital del Mar, Medical Research Institute (IMIM)
Francina Fonseca | Hospital del Mar, Medical Research Institute (IMIM), Institut de Neuropsiquiatria i Addiccions; Departament de Psiquiatria, Universitat Autònoma de Barcelona
Jesús Giraldo | Universitat Autònoma de Barcelona, Institut de Neurociències and Unitat de Bioestadística; Network Biomedical Research Center on Mental Health (CIBERSAM)
Angela Leis Machín | Universitat Pompeu Fabra (UPF), Department of Experimental and Health Sciences; Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)
Anna Mané Santacana | Hospital del Mar, Medical Research Institute (IMIM), Department of Neuroscience and Psychiatry; Centro de Investigación en Red de Salud Mental (CIBERSAM)
Miguel A. Mayer | Universitat Pompeu Fabra (UPF), Department of Experimental and Health Sciences; Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)
Sandra Montagud Romero | Universitat de Valencia, Department of Psychobiology, Facultad de Psicología
Roser Nadal | Universitat Autònoma de Barcelona, Institut de Neurociències and Psychobiology Unit
Jordi Ortiz | Universitat Autònoma de Barcelona, Neuroscience Institute and Department of Biochemistry and Molecular Biology
Francisco Javier Pavon-Moron | Hospital Regional Universitario de Málaga-Universidad de Málaga, Unidad Gestión Clínica de Salud Mental; Instituto de Investigación Biomédica de Málaga (IBIMA)
Ezequiel Jesús Pérez Sánchez | Parc de Salut Mar, Barcelona; Institut de Neuropsiquiatria i Addiccions (INAD)
Marta Rodriguez-Arias | Universitat de Valencia, Department of Psychobiology, Facultad de Psicología
Antonia Mª Serrano Criado | Hospital Regional Universitario de Málaga-Universidad de Málaga, Unidad Gestión Clínica de Salud Mental; Instituto de Investigación Biomédica de Málaga (IBIMA)
Marta Torrens | Hospital del Mar, Medical Research Institute (IMIM), Institut de Neuropsiquiatria i Addiccions; Departament de Psiquiatria, Universitat Autònoma de Barcelona
Vincent Warnault | Universitat Pompeu Fabra, Department of Experimental and Health Sciences
Alba Gutiérrez-Sacristán | Universitat Pompeu Fabra (UPF), Department of Experimental and Health Sciences; Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)
Laura I. Furlong | Universitat Pompeu Fabra (UPF), Department of Experimental and Health Sciences; Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)
Table 3. PsyGeNET team: curators



11. PsyGeNET publications and presentations


  • Alba Gutiérrez-Sacristán, Solène Grosdidier, Olga Valverde, Marta Torrens, Àlex Bravo, Janet Piñero, Ferran Sanz, Laura I. Furlong. PsyGeNET: a knowledge platform on psychiatric disorders and their genes. Bioinformatics 2015; doi: 10.1093/bioinformatics/btv301

  • Alba Gutiérrez-Sacristán, Àlex Bravo, Marta Portero, Olga Valverde, Antonio Armario, M.C. Blanco-Gandía, Adriana Farré, Lierni Fernández-Ibarrondo, Francina Fonseca, Jesús Giraldo, Angela Leis, Anna Mané, M.A. Mayer, Sandra Montagud-Romero, Roser Nadal, Jordi Ortiz, Francisco Javier Pavon, Ezequiel Perez, Marta Rodríguez-Arias, Antonia Serrano, Marta Torrens, Vincent Warnault, Ferran Sanz, Laura I. Furlong. Text mining and expert curation to develop a database on psychiatric diseases and their genes. Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine. Potsdam, Germany, August 4-5, 2016

  • Alba Gutiérrez-Sacristán, Àlex Bravo, Olga Valverde, Marta Torrens, Ferran Sanz, Laura I. Furlong. Leveraging text mining, expert curation and data integration to develop a database on psychiatric diseases and their genes. Oral presentation at the XIII Symposium on Bioinformatics (JBI2016). Valencia, Spain; 10-13 May, 2016

  • Alba Gutiérrez-Sacristán, Àlex Bravo, Olga Valverde, Marta Torrens, Ferran Sanz, Laura I. Furlong. Leveraging text mining, expert curation and data integration to develop a database on psychiatric diseases and their genes. Oral presentation at the Ninth International Biocuration Conference. Geneva, Switzerland; 10-14 April, 2016

  • Alba Gutiérrez-Sacristán, Solène Grosdidier, Olga Valverde, Marta Torrens, Àlex Bravo, Janet Piñero, Ferran Sanz, Laura I. Furlong. PsyGeNET: a curated resource on associations between genes and psychiatric disorders. Poster presented at the XII Symposium on Bioinformatics. Sevilla, Spain; 21-24 September 2014