Representation-learning based propensity score model for causal inference in high dimension

Alternative Title
Development of a representation-learning-based propensity score prediction model for causal inference using high-dimensional data
Author(s)
유승찬
Advisor
박래웅
Department
Department of Medicine, The Graduate School
Publisher
The Graduate School, Ajou University
Publication Year
2021-02
Language
eng
Keyword
deep learning; observational study; propensity score; sample size
Alternative Abstract
There has been a surge in medical research attempting causal inference, driven by the growing adoption of electronic health records (EHRs) and the secondary use of large claims databases. Unlike in randomized clinical trials, treatment assignment in observational data is not independent of baseline characteristics. Hence, two key assumptions must be satisfied to draw causal inferences from an observational study: unconfoundedness and overlap. In most studies, unconfoundedness, rather than overlap, is the major challenge. Intuitively, unconfoundedness is more plausible when more covariates are included in the analysis. In this regard, the large-scale propensity score model (LSPS), which balances virtually all observed confounders, is favored over a propensity score model that adjusts for tens of expert-selected variables. However, LSPS often fails to balance the available covariates in high-dimensional, low-sample-size (HDLSS) data, i.e., p >> n. This weakness hinders its wide adoption across distributed research networks based on standardized clinical data. This study therefore aims to develop a more robust propensity-score-based framework for causal inference in HDLSS settings: the database-wide representation-learning-based propensity score model (RLPS). RLPS has two components: (1) a task-agnostic, database-wide asymmetrically stacked autoencoder (DASA) that abstracts high-dimensional features; and (2) a downstream Bayesian lasso that estimates the propensity score. DASA is trained in an unsupervised manner on a database-wide feature matrix to distill a condensed, meaningful representation. Once DASA is pretrained, its deep encoder maps the covariates into the condensed space, and the Bayesian lasso then estimates the propensity score as a downstream task. Finally, propensity score matching is conducted to estimate the average treatment effect.
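The three-stage RLPS pipeline described above (pretrain an encoder on the whole database, fit a sparse propensity model on the encoded covariates, then match) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the pretrained DASA is replaced by a linear encoder (truncated SVD, since a linear autoencoder with a k-unit bottleneck spans the same subspace as the top-k principal components), the Bayesian lasso is replaced by its posterior-mode point estimate (L1-penalized logistic regression fitted by proximal gradient), and matching is assumed to be greedy 1:1 nearest-neighbour on the logit of the propensity score with a caliper of 0.2 standard deviations. All function names, hyperparameters (`k`, `lam`, `lr`, `caliper_sd`) and the simulated data are hypothetical. Note that the matched-pair mean difference strictly estimates the effect in the matched treated population.

```python
import numpy as np


def fit_linear_encoder(X_db, k):
    """Hypothetical stand-in for the pretrained DASA encoder.

    A linear autoencoder with a k-unit bottleneck learns the same subspace as
    the top-k principal components, so truncated SVD of the centred
    database-wide matrix is used here instead of a deep nonlinear encoder.
    """
    mu = X_db.mean(axis=0)
    _, _, vt = np.linalg.svd(X_db - mu, full_matrices=False)
    W = vt[:k].T                              # p x k projection
    sd = ((X_db - mu) @ W).std(axis=0)
    sd[sd == 0] = 1.0                         # guard against constant codes
    return lambda X: (X - mu) @ W / sd        # standardized k-dim codes


def fit_l1_logistic(Z, t, lam=0.01, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient (ISTA).

    Its solution is the posterior mode under a Laplace prior, used here as a
    point-estimate stand-in for full Bayesian lasso posterior sampling.
    """
    n, k = Z.shape
    beta, b0 = np.zeros(k), 0.0
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(b0 + Z @ beta))) - t
        b0 -= lr * resid.mean()               # intercept is not penalized
        beta -= lr * (Z.T @ resid) / n        # gradient step on the log-loss
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)  # soft-threshold
    return lambda Z_new: 1.0 / (1.0 + np.exp(-(b0 + Z_new @ beta)))


def match_on_logit(ps, t, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement on the
    logit of the propensity score, caliper = caliper_sd * SD(logit)."""
    ps = np.clip(ps, 1e-6, 1 - 1e-6)
    logit = np.log(ps / (1 - ps))
    caliper = caliper_sd * logit.std()
    controls = list(np.flatnonzero(t == 0))
    pairs = []
    for i in np.flatnonzero(t == 1):
        if not controls:
            break
        dist = np.abs(logit[controls] - logit[i])
        j = int(np.argmin(dist))
        if dist[j] <= caliper:
            pairs.append((int(i), int(controls.pop(j))))
    return pairs


# Synthetic HDLSS demo (all data simulated; not from the thesis).
rng = np.random.default_rng(0)
n_db, p, n = 1000, 1500, 200                  # study cohort has p >> n
latent_db = rng.normal(size=n_db)             # unobserved health state
loadings = rng.normal(size=p)
X_db = (rng.random((n_db, p)) <
        1.0 / (1.0 + np.exp(2.0 - latent_db[:, None] * loadings))).astype(float)

encode = fit_linear_encoder(X_db, k=10)       # task-agnostic pretraining, no labels
idx = rng.choice(n_db, size=n, replace=False)
X, latent = X_db[idx], latent_db[idx]         # study cohort drawn from the database
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-latent))).astype(int)
y = 1.0 * t + latent + rng.normal(size=n)     # simulated true effect = 1.0

ps = fit_l1_logistic(encode(X), t)(encode(X))
pairs = match_on_logit(ps, t)
effect = np.mean([y[i] - y[j] for i, j in pairs])
print(f"{len(pairs)} matched pairs, estimated effect {effect:.2f}")
```

Swapping in a deep, asymmetric encoder would only change `fit_linear_encoder`; the design point the sketch tries to preserve is that the encoder is fitted once on the whole database without treatment labels and then reused for every study cohort.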
The performance of RLPS was evaluated in two clinical cases: (1) a comparative cohort study of new users of angiotensin receptor blockers versus calcium channel blockers for hypertension; and (2) a comparative cohort study of new users of ranitidine versus other H2-receptor antagonists. In each case, samples of 1,000 and 500 patients were randomly drawn 100 times from a single standardized EHR database of a tertiary hospital. Unconfoundedness, accuracy of risk estimates, and residual bias were compared between RLPS and LSPS. Compared with LSPS, RLPS identified more overlap and achieved better balance across a large set of covariates between the target and comparator cohorts. RLPS generally performed better when empirical equipoise was present. RLPS can be an attractive alternative to LSPS in studies where the number of covariates exceeds the number of observations. Furthermore, RLPS may facilitate population-level estimation studies that use single-institution EHRs across a distributed research network.
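The balance and overlap comparisons mentioned above are typically computed with two diagnostics, sketched below in plain Python. The abstract does not state the thesis's exact criteria, so this sketch assumes the common conventions: covariate balance as the standardized mean difference (SMD), with |SMD| < 0.1 treated as balanced, and empirical equipoise as the share of patients whose preference score (the propensity score rescaled to remove the effect of treatment prevalence) lies between 0.3 and 0.7. All function names and thresholds here are illustrative assumptions.

```python
import math
from statistics import mean, variance


def smd(x_t, x_c):
    """Standardized mean difference of one covariate between cohorts:
    (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = math.sqrt((variance(x_t) + variance(x_c)) / 2)
    return 0.0 if pooled_sd == 0 else (mean(x_t) - mean(x_c)) / pooled_sd


def fraction_balanced(cov_t, cov_c, threshold=0.1):
    """Share of covariates with |SMD| below an assumed 0.1 threshold.
    cov_t / cov_c hold one list of values per covariate for each cohort."""
    flags = [abs(smd(a, b)) < threshold for a, b in zip(cov_t, cov_c)]
    return sum(flags) / len(flags)


def logit(v):
    return math.log(v / (1 - v))


def preference_score(ps, prevalence):
    """Propensity score rescaled to remove treatment prevalence:
    logit(F) = logit(S) - logit(P)."""
    return 1 / (1 + math.exp(-(logit(ps) - logit(prevalence))))


def equipoise_fraction(ps_list, t, lo=0.3, hi=0.7):
    """Share of patients whose preference score lies in [lo, hi]."""
    prevalence = sum(t) / len(t)
    prefs = [preference_score(s, prevalence) for s in ps_list]
    return sum(lo <= f <= hi for f in prefs) / len(prefs)


print(equipoise_fraction([0.5, 0.5, 0.9, 0.1], [1, 0, 1, 0]))  # 0.5
```

Because the preference score cancels the overall treatment prevalence, a patient with a propensity score equal to the prevalence always receives a preference score of 0.5, which makes equipoise comparable across comparisons with very different treatment shares.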
URI
https://dspace.ajou.ac.kr/handle/2018.oak/20293

Appears in Collections:
Graduate School of Ajou University > Department of Medicine > 4. Theses(Ph.D)
Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
