AJOU Central Library Repository: Representation-learning based propensity score model for causal inference in high dimension

Alternative Title: Development of a representation-learning-based propensity score prediction model for causal inference using high-dimensional data

Author(s): 유승찬

Advisor: 박래웅

Department: Department of Medicine, The Graduate School

Publisher: The Graduate School, Ajou University

Publication Year: 2021-02

Language: eng

Keyword: deep learning; observational study; propensity score; sample size

Alternative Abstract: Medical research attempting causal inference has surged alongside the growing adoption of electronic health records (EHRs) and the secondary use of large claims databases. Unlike in randomized clinical trials, treatment assignment in observational data is not independent of baseline characteristics. Hence, two key assumptions must be satisfied for causal inference in an observational study: unconfoundedness and overlap. In most studies, unconfoundedness, rather than overlap, is the major challenge. Intuitively, unconfoundedness becomes more plausible as more covariates are included in the analysis. In this regard, the large-scale propensity score model (LSPS), which balances virtually all observed confounders, is preferable to propensity score models that adjust for tens of expert-selected variables. However, LSPS often fails to balance the available covariates in high-dimensional, low-sample-size (HDLSS) data, i.e., when p >> n (the number of covariates p far exceeds the number of observations n). This weakness hinders its wide adoption across distributed research networks based on standardized clinical data. This study therefore aims to develop a more robust propensity-score-based framework for causal inference in HDLSS settings: the database-wide representation-learning-based propensity score model (RLPS). RLPS has two components: 1. a task-agnostic, database-wide asymmetrically stacked autoencoder (DASA) that abstracts high-dimensional features; and 2. a downstream Bayesian lasso that estimates the propensity score. DASA is trained in an unsupervised manner on a database-wide feature matrix to distill a condensed, meaningful representation. Once DASA is pretrained, its deep encoder maps the covariates into this condensed space, and a Bayesian lasso then estimates the propensity score as a downstream task. Finally, propensity score matching is performed to estimate the average treatment effect.
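The RLPS pipeline described above (encode, estimate the propensity score, match) can be sketched as follows. This is a minimal, hedged illustration on simulated data, not the thesis's implementation, and it swaps in simpler techniques under stated assumptions: a linear encoder taken from an SVD of the database-wide matrix stands in for the pretrained deep DASA encoder (a linear autoencoder trained with squared error spans the same principal subspace); an L1-penalized logistic regression stands in for the Bayesian lasso (its solution is the posterior mode under independent Laplace priors); and greedy 1:1 caliper matching on the logit of the propensity score is assumed. The function names (`fit_encoder`, `fit_l1_logistic`, `match_1to1`) and all parameter values are hypothetical.

```python
import numpy as np


def fit_encoder(x_db, k):
    """Hypothetical stand-in for the pretrained DASA encoder: a linear
    autoencoder with squared-error loss spans the top-k principal subspace,
    so the projection is taken from an SVD of the database-wide matrix."""
    mu = x_db.mean(axis=0)
    _, _, vt = np.linalg.svd(x_db - mu, full_matrices=False)
    w = vt[:k].T  # (p, k) projection onto the condensed space
    return lambda x: (x - mu) @ w


def fit_l1_logistic(z, t, lam=0.01, lr=0.1, iters=2000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).
    Its solution is the posterior mode under independent Laplace priors,
    used here in place of sampling a full Bayesian lasso posterior."""
    mu, sd = z.mean(axis=0), z.std(axis=0) + 1e-12

    def design(zz):  # intercept column plus standardized representation
        return np.hstack([np.ones((len(zz), 1)), (zz - mu) / sd])

    x = design(z)
    beta = np.zeros(x.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-x @ beta))
        beta = beta - lr * x.T @ (prob - t) / len(t)  # gradient step on mean log-loss
        # soft-thresholding (proximal step); the intercept is not penalized
        beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - lr * lam, 0.0)
    return lambda zz: 1.0 / (1.0 + np.exp(-design(zz) @ beta))


def match_1to1(ps, t, caliper=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement on the logit
    of the propensity score, with a caliper in SD units of that logit."""
    ps = np.clip(ps, 1e-6, 1 - 1e-6)
    logit = np.log(ps / (1 - ps))
    cal = caliper * logit.std()
    controls = list(np.flatnonzero(t == 0))
    pairs = []
    for i in np.flatnonzero(t == 1):
        if not controls:
            break
        d = np.abs(logit[controls] - logit[i])
        j = int(np.argmin(d))
        if d[j] <= cal:
            pairs.append((int(i), int(controls.pop(j))))
    return pairs


# Simulated HDLSS example: p = 300 covariates, n = 150 study patients,
# plus a larger unlabeled "database-wide" matrix used only by the encoder.
rng = np.random.default_rng(0)
n_db, n, p, k = 2000, 150, 300, 5
load = rng.normal(size=(k, p))
x_db = rng.normal(size=(n_db, k)) @ load + rng.normal(scale=0.5, size=(n_db, p))
latent = rng.normal(size=(n, k))
x = latent @ load + rng.normal(scale=0.5, size=(n, p))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-latent[:, 0])))  # confounded assignment
y = 1.0 * t + latent[:, 0] + rng.normal(size=n)  # simulated true effect = 1.0

encode = fit_encoder(x_db, k)
z = encode(x)
ps = fit_l1_logistic(z, t)(z)
pairs = match_1to1(ps, t)
effect_in_treated = np.mean([y[i] - y[j] for i, j in pairs])
print(len(pairs), round(float(effect_in_treated), 2))
```

Because each treated patient is matched to at most one comparator, the matched difference in this sketch targets the effect in the treated; the abstract does not state the thesis's matching ratio or how it obtains the average treatment effect. Replacing `fit_encoder` with a trained deep encoder would leave the downstream steps unchanged.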
The performance of RLPS was evaluated in two clinical cases, each a comparative cohort study of new users: 1. angiotensin receptor blockers versus calcium channel blockers in hypertension; and 2. ranitidine versus other H2-receptor antagonists. In each case, samples of 1,000 and 500 patients were randomly drawn 100 times from a single standardized EHR database of a tertiary hospital. Unconfoundedness, accuracy of risk estimates, and residual bias were compared between RLPS and LSPS. Compared with LSPS, RLPS identified more overlap and achieved better balance across a large set of covariates between the target and comparator cohorts. The advantage of RLPS was most evident when empirical equipoise was present. RLPS can be an attractive alternative to LSPS in studies where the number of covariates exceeds the number of observations. Furthermore, RLPS may facilitate population-level estimation studies that use the EHRs of single institutions across distributed research networks.
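The balance and equipoise comparisons above rely on diagnostics that the abstract does not define. A common choice, assumed here, is the absolute standardized mean difference (SMD) of each covariate between cohorts, with |SMD| < 0.1 often read as balanced, and empirical equipoise judged from the preference score F, a rescaling of the propensity score PS that removes the effect of the overall treated share P: logit(F) = logit(PS) - logit(P). A frequently used rule of thumb treats a comparison as being in equipoise when at least half of the patients have preference scores between 0.3 and 0.7. These thresholds and the function names below are hypothetical illustrations, not the thesis's definitions.

```python
import numpy as np


def standardized_mean_diff(x, t):
    """Per-covariate SMD: difference in cohort means over the pooled SD."""
    xt, xc = x[t == 1], x[t == 0]
    pooled = np.sqrt((xt.var(axis=0, ddof=1) + xc.var(axis=0, ddof=1)) / 2.0)
    diff = xt.mean(axis=0) - xc.mean(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        # constant covariates: 0 if both cohorts agree, infinite imbalance otherwise
        return np.where(pooled > 0, diff / pooled, np.where(diff == 0, 0.0, np.inf))


def share_balanced(x, t, threshold=0.1):
    """Fraction of covariates whose |SMD| is below `threshold` (assumed 0.1)."""
    return float(np.mean(np.abs(standardized_mean_diff(x, t)) < threshold))


def preference_score(ps, t):
    """logit(F) = logit(PS) - logit(P), where P is the share of treated patients."""
    share = t.mean()
    logit_f = np.log(ps / (1 - ps)) - np.log(share / (1 - share))
    return 1.0 / (1.0 + np.exp(-logit_f))


def in_equipoise(ps, t, lo=0.3, hi=0.7, min_share=0.5):
    """True if at least `min_share` of patients have a preference score in [lo, hi]."""
    pref = preference_score(ps, t)
    return bool(np.mean((pref >= lo) & (pref <= hi)) >= min_share)


# Tiny example: covariate 0 differs between cohorts, covariate 1 is constant.
t = np.array([1, 1, 0, 0])
x = np.array([[2.0, 5.0], [4.0, 5.0], [0.0, 5.0], [2.0, 5.0]])
print(standardized_mean_diff(x, t), share_balanced(x, t))
print(in_equipoise(np.full(4, 0.5), t))
```

Computed on the matched rows of each of the resampled replicates, `share_balanced` yields one number per replicate, so a comparison like the one described above could be summarized by its distribution across replicates.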

URI: https://dspace.ajou.ac.kr/handle/2018.oak/20293

Appears in Collections: Graduate School of Ajou University > Department of Medicine > 4. Theses(Ph.D)

Files in This Item: There are no files associated with this item.
