Scarce biomedical sample exploitation approach for multimodal time series data integration
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 손경아 | - |
dc.contributor.author | 이가람 | - |
dc.date.accessioned | 2022-11-29T03:01:32Z | - |
dc.date.available | 2022-11-29T03:01:32Z | - |
dc.date.issued | 2020-02 | - |
dc.identifier.other | 29781 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/handle/2018.oak/21270 | - |
dc.description | Thesis (Ph.D.)--Ajou University Graduate School: Department of Computer Engineering, 2020. 2 | - |
dc.description.tableofcontents | 1. Introduction; 1.1 Overview; 1.2 Summary of contributions; 2. Background; 2.1 Sequential data analysis; 2.1.1 Recurrent neural network; 2.1.2 Gated recurrent units; 2.2 Meta-dimensional data integration; 2.2.1 Concatenation-based integration; 2.2.2 Transformation-based integration; 2.3 Interpretability of machine learning; 2.3.1 Intrinsic interpretable model; 2.3.2 Model-agnostic interpretation; 3. Exploiting samples based on prior knowledge integration; 3.1 Introduction; 3.2 L1-regularized linear regression; 3.3 Kernel-reweighting lasso; 3.4 Inferring subtype-specific network; 3.4.1 Dataset; 3.4.2 Predicting gene expression level based on DNA methylation; 3.4.3 Subtype-specific prediction performance; 3.4.4 Subtype-specific association network; 3.5 Discussion; 4. Multimodal time series data integration framework; 4.1 Introduction; 4.2 Multimodal longitudinal data integration framework; 4.3 Experiment: Simulation study; 4.4 Experiment: Predicting AD progression using ADNI data; 4.4.1 Study participants; 4.4.2 Experimental setting; 4.4.3 Comparison of prediction of MCI to AD conversion using cross-sectional data at baseline and longitudinal data; 4.4.4 Comparison of prediction of MCI to AD conversion using single modal and multimodal data; 4.5 Experiment: genomic variations in Alzheimer’s disease; 4.5.1 Methods for integrating and interpreting WGS data; 4.5.2 Performance improvement; 4.5.3 Model interpretation; 4.5.4 Functional interpretation of genetic variants; 4.6 Discussion; 5. Conclusion | - |
dc.language.iso | eng | - |
dc.publisher | The Graduate School, Ajou University | - |
dc.rights | Ajou University theses are protected by copyright. | - |
dc.title | Scarce biomedical sample exploitation approach for multimodal time series data integration | - |
dc.title.alternative | Garam Lee | - |
dc.type | Thesis | - |
dc.contributor.affiliation | Ajou University Graduate School | - |
dc.contributor.alternativeName | Garam Lee | - |
dc.contributor.department | Graduate School, Department of Computer Engineering | - |
dc.date.awarded | 2020. 2 | - |
dc.description.degree | Doctoral | - |
dc.identifier.localId | 1133984 | - |
dc.identifier.uci | I804:41038-000000029781 | - |
dc.identifier.url | http://dcoll.ajou.ac.kr:9080/dcollection/common/orgView/000000029781 | - |
dc.description.alternativeAbstract | Recent technological advances make it possible to collect a wide variety of knowledge and heterogeneous data from multiple domains. As more types of data are generated, including prior knowledge and multimodal data, numerous methods have been developed to integrate such datasets and extract complementary knowledge from multiple domains. However, integrating prior knowledge and multimodal data is challenging in four respects: the small sample size problem (P1), sequential data processing (P2), irregularity of heterogeneous data (P3), and model interpretability (P4). In this thesis, we propose two sample exploitation methods for incorporating multimodal data that address these four issues of knowledge and data integration. In the first study, we focus on the small sample size problem (P1) for multimodal data integration in bioinformatics, where the available sample size is extremely small. The proposed model can intrinsically integrate irregular multimodal data (P3) while identifying subtype-sensitive genes (P4). We then extend the study to multimodal time series data (P2, P3), using a sample exploitation approach (P1) while preserving model interpretability (P4). In the two studies, sample exploitation is performed via kernel reweighting and a separate learning phase, respectively. The proposed methods are validated in four experiments. For the first study, an L1-regularized kernel-reweighting regression model is used to infer subtype-specific patterns between gene expression and DNA methylation. The subsequent experiments comprise a simulation study, prediction of Alzheimer’s disease (AD) progression in patients with mild cognitive impairment, and analysis of genomic variation affecting AD progression. | - |
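
The abstract names an L1-regularized kernel-reweighting regression but does not give its formulation. The following is a minimal sketch of one common reading of that idea, not the thesis's implementation: every sample is kept, but each is weighted by a kernel on its distance to a target subtype, and a lasso is then fit to the weighted samples. The Gaussian kernel, the scalar "subtype coordinate", the bandwidth, the toy data, and all function names are assumptions made for illustration only.

```python
import math
import random

def gaussian_kernel_weights(coords, center, bandwidth):
    """Weight each sample by how close its (hypothetical) subtype coordinate is to `center`."""
    return [math.exp(-((c - center) ** 2) / (2.0 * bandwidth ** 2)) for c in coords]

def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; this is what produces sparsity."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def weighted_lasso(X, y, weights, lam, n_iter=100):
    """Coordinate descent for 0.5 * sum_i w_i * (y_i - x_i . beta)^2 + lam * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # weighted correlation of feature j with the partial residual
            z = 0.0    # weighted squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += weights[i] * X[i][j] * (y[i] - pred_without_j)
                z += weights[i] * X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta

# Toy data (fabricated for illustration): two subtypes whose response
# depends on different features.
rng = random.Random(0)
X, y, coords = [], [], []
for subtype_coord in (0.0, 1.0):
    for _ in range(40):
        row = [rng.gauss(0, 1) for _ in range(3)]
        target = 2.0 * row[0] if subtype_coord == 0.0 else -3.0 * row[1]
        X.append(row)
        y.append(target)
        coords.append(subtype_coord)

# Fit one sparse model per subtype; samples from the other subtype are
# down-weighted by the kernel rather than discarded.
beta_a = weighted_lasso(X, y, gaussian_kernel_weights(coords, 0.0, 0.2), lam=0.1)
beta_b = weighted_lasso(X, y, gaussian_kernel_weights(coords, 1.0, 0.2), lam=0.1)
print("subtype A coefficients:", beta_a)
print("subtype B coefficients:", beta_b)
```

In this sketch a wider bandwidth lets each subtype-specific fit borrow more strength from neighbouring samples, which is the sense in which reweighting can help when samples are scarce; the choice of kernel and bandwidth used in the thesis is not stated in this record.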