Cancer is ultimately the result of cells that grow uncontrollably and do not die. Normal cells in the body follow an orderly path of growth, division, and death; when this process breaks down, cancer begins to form through massive abnormal cell growth. Studying gene expression in relation to multi-layered genomic features can help address the poor prognosis of cancer. Association analysis of gene expression traits with genomic features is crucial for identifying the molecular mechanisms underlying cancer. Simple correlation-based association tests, however, tend to report indirect genomic associations: a feature that is merely correlated with a true driver also appears associated with the expression trait. In this study, the sparse regression methods GFLasso, Lasso, SGL and SIOL were employed to discover genomic associations.
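To illustrate why sparse regression filters out indirect associations that a correlation test would report, here is a minimal sketch of plain Lasso only (the structured variants GFLasso, SGL and SIOL add further penalty terms on top of the L1 penalty and are not shown). It fits one Lasso per expression trait by coordinate descent on standardized features. The function names `lasso_cd` and `lasso_associations`, the penalty `lam=0.5` and the three-feature synthetic data are hypothetical, not the study's data or code.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Fit one Lasso model by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1, assuming the
    columns of X are standardized (mean 0, variance 1) and y is centered.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()  # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual (b_j added back).
            rho = X[:, j] @ r / n + b[j]
            new_bj = soft_threshold(rho, lam)
            if new_bj != b[j]:
                r -= X[:, j] * (new_bj - b[j])
                max_change = max(max_change, abs(new_bj - b[j]))
                b[j] = new_bj
        if max_change < tol:
            break
    return b


def lasso_associations(X, Y, lam):
    """Fit one Lasso per expression trait (column of Y).

    Returns the coefficient matrix B (features x traits) and the list of
    (feature, trait) pairs with a non-zero coefficient.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Yc = Y - Y.mean(axis=0)
    B = np.column_stack([lasso_cd(Xs, Yc[:, k], lam) for k in range(Y.shape[1])])
    pairs = [(int(j), int(k)) for j, k in zip(*np.nonzero(B))]
    return B, pairs


# Hypothetical toy data: three "methylation" features, two "mRNA" traits.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)  # correlated with x0 but has no direct effect
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
Y = np.column_stack([2.0 * x0 + 0.1 * rng.normal(size=n),   # driven by x0
                     -1.5 * x2 + 0.1 * rng.normal(size=n)])  # driven by x2

corr_x1_trait0 = np.corrcoef(X[:, 1], Y[:, 0])[0, 1]  # a correlation test flags x1
B, pairs = lasso_associations(X, Y, lam=0.5)
print("corr(x1, trait 0):", round(corr_x1_trait0, 2))
print("Lasso association pairs:", pairs)
```

In this toy setting the indirect feature `x1` is strongly correlated with trait 0, yet Lasso assigns it a zero coefficient because `x0` already explains the trait; only the direct pairs survive.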
The purpose of this study is to understand the strengths and weaknesses of sparse regression, structural information and grouping effects, and to identify significant cancer-related genomic associations, genomic features and expression traits. An extensive study was carried out to compare the results obtained by each regression method. The performance of each method was analyzed in terms of mean squared error, density of non-zero regression coefficients (betas), computational time and related measures. Associations between gene expression and one genomic feature (methylation) were then derived from the regression coefficients estimated by each method.
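The comparison measures named above (mean squared error, non-zero coefficient density and computational time) can be gathered by one small harness. The sketch below assumes each method is a callable returning a features x traits coefficient matrix; `benchmark`, `ols` and `thresholded_ols` are hypothetical names, and the thresholded least-squares fit is only a crude sparse stand-in used to contrast with a dense fit, not one of the study's methods.

```python
import time

import numpy as np


def benchmark(fit, X, Y, tol=0.0):
    """Time one regression method and summarise its fit.

    `fit` is any callable returning a (features x traits) coefficient matrix B.
    Reports training mean squared error, the share of entries of B with
    |beta| > tol (non-zero density) and wall-clock seconds for the fit.
    """
    start = time.perf_counter()
    B = fit(X, Y)
    elapsed = time.perf_counter() - start
    mse = float(np.mean((Y - X @ B) ** 2))
    density = float(np.mean(np.abs(B) > tol))
    return {"mse": mse, "nonzero_density": density, "seconds": elapsed}


def ols(X, Y):
    """Dense least-squares baseline: coefficients are essentially all non-zero."""
    return np.linalg.lstsq(X, Y, rcond=None)[0]


def thresholded_ols(X, Y, cutoff=0.5):
    """Hypothetical sparse stand-in: zero out small least-squares coefficients."""
    B = ols(X, Y)
    B[np.abs(B) < cutoff] = 0.0
    return B


# Hypothetical toy data: 5 features, 2 traits, two true non-zero coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
B_true = np.zeros((5, 2))
B_true[0, 0], B_true[3, 1] = 2.0, -1.0
Y = X @ B_true + 0.1 * rng.normal(size=(100, 2))

results = {name: benchmark(fit, X, Y)
           for name, fit in [("OLS", ols), ("thresholded OLS", thresholded_ols)]}
for name, res in results.items():
    print(name, res)
```

The sparse fit trades a slightly higher training error for a much lower coefficient density, which is the kind of trade-off the comparison across regression methods is meant to expose.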
The study was carried out by analyzing, for each method and across various cancer profiles, the association pairs, the strongly influencing predictors (methylation features) and the output variables (mRNA expression), and by combining the results of all regression types and fusing them with a similarity-based method, similarity network fusion (SNF).
The overall motivation is to suppress noise while still retaining weaker genomic associations that are true positives; identifying the stronger genomic associations remains equally important. SNF was chosen for the fusion step because the fused network captures both shared and complementary information from the different data sources by propagating similarities across the networks over multiple iterations.
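The iterative propagation behind SNF can be sketched as follows. The text does not say exactly which networks were fused, so this assumes each regression method yields a feature vector per mRNA trait (for example, its column of coefficients), turned into a trait-by-trait similarity network. The sketch follows the cross-diffusion update of SNF, where each view's kNN kernel S is applied to the average of the other views' full kernels P, but it simplifies the original: a Gaussian kernel with a fixed bandwidth `sigma` replaces SNF's locally scaled kernel, and the symmetrization step is our own choice. The toy matrices `F1` and `F2` are hypothetical.

```python
import numpy as np


def affinity(F, sigma=1.0):
    """Gaussian affinity between rows of F (assumption: fixed bandwidth)."""
    sq = np.sum(F ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * F @ F.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def full_kernel(W):
    """P: half of each row's mass on the diagonal, the rest over other nodes."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def knn_kernel(W, k):
    """S: keep each node's k largest affinities (includes the node itself
    for a Gaussian kernel) and normalise each row to sum to one."""
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nbrs = np.argsort(-W[i])[:k]
        S[i, nbrs] = W[i, nbrs] / W[i, nbrs].sum()
    return S


def snf(Ws, k=3, t=20):
    """Fuse m >= 2 affinity matrices by iterative cross-diffusion."""
    m = len(Ws)
    assert m >= 2, "SNF needs at least two networks"
    Ps = [full_kernel(W) for W in Ws]
    Ss = [knn_kernel(W, k) for W in Ws]
    for _ in range(t):
        new_Ps = []
        for v in range(m):
            # Diffuse the average of the *other* views through view v's kNN graph.
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            P = Ss[v] @ others @ Ss[v].T
            new_Ps.append((P + P.T) / 2.0)  # keep each network symmetric
        Ps = new_Ps
    return sum(Ps) / m


# Hypothetical per-trait profiles from two methods: traits 0-2 and 3-5 group.
F1 = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
F2 = np.array([[0], [0.2], [0.1], [4], [4.1], [4.3]])
fused = snf([affinity(F1), affinity(F2)], k=3, t=20)
print(np.round(fused, 3))
```

Similarities that both views support are reinforced at every iteration, while links present in only one noisy view are diluted, which is the noise-suppression behaviour described above.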