Natural language processing (NLP) is an important research field in artificial intelligence, comprising linguistics, computer science, and mathematics. Text classification, the task of assigning a piece of text to one or more predefined classes, is fundamental to many NLP applications. Nowadays, the most widely used approaches to text classification are based on neural networks, chiefly convolutional neural networks (CNNs) and recurrent neural networks (RNNs). A CNN is a class of deep, feed-forward neural network that uses a variation of the multilayer perceptron designed to reduce computation. An RNN is designed to exploit sequential information: its current output is influenced not only by the current input but also by the preceding inputs.
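Concretely, in its simplest (Elman) form a recurrent unit computes a hidden state h_t = f(W x_t + U h_{t-1} + b), so the output at step t depends on the current input x_t and, through h_{t-1}, on everything that came before.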
This thesis explores both CNN-based and RNN-based methods for text classification. For the CNN-based model, a gating mechanism is introduced to better capture information from the input, and residual connections and batch normalization techniques are also used to gain further improvement. For the RNN-based method, we explore the different impacts of four kinds of advanced recurrent units combined with a hierarchical attention mechanism for text classification: unidirectional long short-term memory (LSTM), bidirectional LSTM, unidirectional gated recurrent units (GRU), and bidirectional GRU.
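To make these two ideas concrete, the following PyTorch sketches illustrate the general techniques; they are minimal examples written for this summary, not the exact architectures studied in the thesis, and all class and parameter names (GatedConvBlock, AttentiveBiGRU, channels, hidden) are illustrative assumptions. The first sketch shows a gated convolutional block in the style of a gated linear unit, with a residual connection and batch normalization; the second shows the word-level half of a hierarchical attention encoder built on a bidirectional GRU.

    import torch
    import torch.nn as nn

    class GatedConvBlock(nn.Module):
        # Gated 1-D convolution over a sequence of word embeddings.
        # The convolution emits twice the channels; one half is the content,
        # the other half (after a sigmoid) gates it element-wise (GLU-style).
        def __init__(self, channels, kernel_size=3):
            super().__init__()
            self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                                  padding=kernel_size // 2)  # length-preserving
            self.bn = nn.BatchNorm1d(channels)

        def forward(self, x):                      # x: (batch, channels, seq_len)
            content, gate = self.conv(x).chunk(2, dim=1)
            out = content * torch.sigmoid(gate)    # gate mechanism
            return self.bn(out) + x                # batch norm, then residual connection

    class AttentiveBiGRU(nn.Module):
        # Word-level encoder of a hierarchical attention network: a bidirectional
        # GRU reads the words, and a learned attention score weights the hidden
        # states before they are summed into a single representation.
        def __init__(self, emb_dim, hidden):
            super().__init__()
            self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.score = nn.Linear(2 * hidden, 1)

        def forward(self, x):                      # x: (batch, seq_len, emb_dim)
            h, _ = self.gru(x)                     # (batch, seq_len, 2*hidden)
            w = torch.softmax(self.score(h).squeeze(-1), dim=1)  # attention weights
            return (w.unsqueeze(-1) * h).sum(dim=1)  # (batch, 2*hidden)

For example, GatedConvBlock(128) maps an input of shape (8, 128, 50) to an output of the same shape, so blocks can be stacked. In the recurrent sketch, the four unit variants compared in the thesis correspond to swapping nn.GRU for nn.LSTM and toggling the bidirectional flag; a full hierarchical attention model would apply the same attention pattern a second time at the sentence level.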