Representation Learning of Biomedical Ontologies using Poincaré Embedding and Application to Genetic Risk Model

Author(s)
김재식
Advisor
손경아
Department
Department of Computer Engineering, General Graduate School
Publisher
The Graduate School, Ajou University
Publication Year
2021-08
Language
eng
Keyword
Poincaré ball; Polygenic risk score; Representation learning; Transformer
Alternative Abstract
Knowledge in the Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be manipulated primarily through vector representations of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space, using embedding methods such as Word2Vec-based approaches to encode entities as numeric vectors and measure their semantic similarity. However, embedding large graph-structured data in Euclidean space cannot avoid losing information about latent hierarchies, so the semantics of GO and GOA are not captured optimally. Hyperbolic spaces such as the Poincaré ball, by contrast, are better suited to modeling hierarchies: because of their negative curvature, distances grow exponentially as points approach the boundary. In this thesis, we propose hierarchical representations of GO and genes (HiG2Vec), which apply Poincaré embedding, a method specialized for representing hierarchies, in a two-step procedure: GO embedding followed by gene embedding. Our experiments show that the model represents the hierarchical structure better than other approaches and predicts interactions of genes or gene products as well as or better than previous studies. These results indicate that HiG2Vec outperforms other methods both in capturing GO and gene semantics and in data utilization. As a downstream application of the gene embeddings, we propose TransformerPRS, a deep learning model built on the transformer module from language modeling, and compare it with the conventional polygenic risk score (PRS), a widely used approach that derives each individual's genetic risk as the sum of risk variants weighted by effect sizes from genome-wide association studies (GWASs).
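The claim that distances grow rapidly toward the boundary of the Poincaré ball can be checked with the standard Poincaré-ball distance used in Poincaré embeddings, d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The sketch below is illustrative only: the helper names are hypothetical, and it does not reproduce HiG2Vec's training setup (dimension, loss, negative sampling), which this record does not describe. It compares two pairs of points with the same Euclidean gap, one pair near the origin and one near the boundary.

```python
import math


def sq_norm(x):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(xi * xi for xi in x)


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# Two pairs with the same Euclidean gap (0.1): one near the origin, one near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
print(round(near_origin, 3), round(near_boundary, 3))
```

Because the same Euclidean step costs far more hyperbolic distance near the boundary, general terms (near the origin) and many specific descendants (near the boundary) can be placed so that tree-like distances are preserved in few dimensions.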
In the experiments, TransformerPRS initialized with HiG2Vec achieved better prediction performance than both TransformerPRS trained from scratch and the conventional PRS. In addition, the self-attention module in the transformer block identified important features and their interactions. Our models can improve genetic risk prediction by indicating which genes, and which interactions between genes, have an important impact on the prediction, information that the conventional PRS does not capture.
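The contrast between the conventional PRS and attention-based scoring can be sketched in a few lines. The PRS part follows the definition above (risk-allele dosages weighted by GWAS effect sizes). The attention part is a generic single-head scaled dot-product self-attention with identity query/key/value projections, a deliberate simplification of a real transformer's learned projections. It is not TransformerPRS itself, whose tokenization, HiG2Vec initialization, and layer configuration are not described in this record. All function names and toy inputs are hypothetical.

```python
import math


def polygenic_risk_score(dosages, effect_sizes):
    """Conventional PRS: risk-allele dosages (0, 1 or 2) weighted by GWAS effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size per variant is required")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    features: one vector per gene (e.g. a gene embedding for one individual).
    Returns (weights, outputs): row i of weights sums to 1 and shows how strongly
    gene i attends to every gene; outputs are the attention-weighted sums of features.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, features)) for j in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Hypothetical toy inputs: three variants/genes for a single individual.
dosages = [0, 1, 2]
effect_sizes = [0.12, -0.05, 0.30]
prs = polygenic_risk_score(dosages, effect_sizes)  # purely additive, no interactions

gene_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, outputs = self_attention(gene_features)   # pairwise weights expose interactions
```

The additive PRS has no term that couples two variants, whereas each attention weight is computed from a pair of genes. Inspecting the weight matrix is therefore one way to read off which gene pairs a trained model relies on.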
URI
https://dspace.ajou.ac.kr/handle/2018.oak/20424

Appears in Collections:
Graduate School of Ajou University > Department of Computer Engineering > 3. Theses(Master)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
