Node embedding is a method to map each node in a graph into a low dimensional vector space. With the embeddings, many downstream machine learning methods can be applied, such as classification or clustering. Most of the node embedding method only considers the graph's structure which means linkage relationship among the nodes in a graph.
Nowadays, there are many graph data containing nodes' attributes in the real world. The node attributes contain information as much as the linkage relationships do. If the nodes’ own attributes as well as the graph’s structure can be incorporated into the embeddings, it can help to improve the quality of the embeddings. Some of these existing embedding methods rely on the sampling of random walks. The samples of random work are to be fed into a language model to get the embeddings of the nodes. Most of random walk based node embedding methods rely on the random walk sampling process. The sampling strategy, such as how many random walks will be sampled or how long a random walk is, affects the quality of the embeddings.
In this thesis, we propose a node embedding method based on both the nodes' attributes and its local structure. Our model can also output the probability that an edge exists between given nodes. The directionality of the edge can be considered if necessary. We can use these probabilities as similarity among the nodes. Our experimental results show that the proposed model outperforms several previous methods for node classification, link prediction, and graph reconstruction tasks on citation datasets and social network datasets.