Real-world data often come in multiple modalities, referred to as multi-modal data, such as multi-view social media data or multi-omics data. Single-modal data may carry insufficient or noisy information for learning the underlying structure of the data, even when the number of samples is large. Multiple representations therefore contribute to a better understanding of data arising from complex systems, which in turn improves prediction performance. In particular, multi-omics studies have revealed distinctive and shared molecular features of cancers. These findings help elucidate the underlying complex biological mechanisms and support the discovery of novel biomarkers associated with cancer progression and prognosis. In this respect, aggregating heterogeneous information from multi-modal data has attracted much attention across many fields of machine learning research. Multi-modal data aggregation either gathers information shared across modalities or transforms the multi-modal data into a high-level feature matrix that serves as a new input. Such techniques provide better insight into heterogeneous data through an integrated view. The transformed data can also be fed into a prediction model, improving both predictive power and the interpretability of the multi-modal data. However, aggregation is challenging because of data heterogeneity, noise, missing values, and inconsistencies across modalities. Representing multi-modal data as a network is more informative, because both the intra- and inter-modality relationships can then be incorporated.
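One common way to encode several modalities in a single network is a multilayer (supra-adjacency) matrix. Each modality's intra-layer adjacency matrix sits on the block diagonal, and inter-layer links fill the off-diagonal blocks. The NumPy sketch below is a generic illustration under that assumption, not the construction used in this thesis. The function name `supra_adjacency` and the toy matrices are hypothetical, and the network is assumed to be undirected.

```python
import numpy as np

def supra_adjacency(intra, inter):
    """Assemble a supra-adjacency matrix for an undirected multilayer network.

    intra: list of square adjacency matrices, one per modality (layer).
    inter: dict mapping a layer pair (i, j) to an inter-layer matrix of shape
           (n_i, n_j); the (j, i) block is filled with its transpose so the
           result stays symmetric.
    """
    sizes = [a.shape[0] for a in intra]
    # offsets[k] is the row/column index where layer k starts.
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    total = offsets[-1]
    S = np.zeros((total, total))
    # Intra-modality relationships: diagonal blocks.
    for k, a in enumerate(intra):
        s, e = offsets[k], offsets[k + 1]
        S[s:e, s:e] = a
    # Inter-modality relationships: off-diagonal blocks (mirrored).
    for (i, j), b in inter.items():
        si, ei = offsets[i], offsets[i + 1]
        sj, ej = offsets[j], offsets[j + 1]
        S[si:ei, sj:ej] = b
        S[sj:ej, si:ei] = b.T
    return S

# Hypothetical toy example: layer 0 has 3 nodes, layer 1 has 2 nodes.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
B = np.array([[0, 1], [1, 0]], dtype=float)
C = np.array([[1, 0], [0, 0], [0, 1]], dtype=float)  # links between the layers
S = supra_adjacency([A, B], {(0, 1): C})
print(S.shape)  # (5, 5)
```

Because every modality lives in one matrix, a single graph algorithm (for example, clustering or propagation) can exploit within- and across-modality edges at the same time.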
In this thesis, we develop two network-based multi-modal data aggregation methods: a multi-view network clustering method and a multi-layered network-based pathway activity inference method. We then demonstrate each method in a range of experimental studies. Specifically, we apply the former to clustering social-tagged landmark images and the latter to transforming multiple genomic data into pathway-level data for clinical outcome prediction models in various cancer studies. The experimental results show that the presented approaches effectively aggregate heterogeneous information and are robust to noise in the data, because they exploit a network structure that accounts for interactions across modalities. In addition, because they represent multi-modal data as an integrated network before aggregating information, they facilitate integrated network analysis. Since the methods are applicable to any number and type of data across domains, they open up many directions for future integrated multi-modal data analysis.