Recently, with the explosive popularity of digital music, there has been tremendous interest in using technologies such as similarity measurement and filtering for retrieving and recommending music. In the music information retrieval community, many researchers have been investigating and developing efficient transcription and retrieval methods for query-by-humming systems, which are considered one of the most intuitive and effective query methods for music retrieval. For voice humming to be a reliable query source, elaborate signal processing and acoustic similarity measurement schemes are necessary.
In this dissertation, we develop a novel music retrieval system called MUSEMBLE (MUSic ensEMBLE) based on several distinct features: (i) A sung or hummed query is automatically transcribed into a sequence of pitch and duration pairs with improved accuracy for music representation. More specifically, we develop two new techniques, WAE (Windowed Average Energy) for more accurate offset detection and EFX (Energetic Feature eXtractor) for onset, peak, attack, and transient detection in the acoustic signal. The former improves energy-based approaches such as AE (Average Energy) by defining multiple windows, each with its own local threshold value, instead of a single global threshold. The latter improves the AF (Amplitude Function), which calculates the summation of the absolute values of signal differences, for clustering the energy contour. For accurate note onset detection, we define a dynamic threshold curve similar to the decay curve in previous onset detection models [56, 57]; (ii) For accurate acquisition of the fundamental frequency of each frame, we apply the CAMDF (Circular Average Magnitude Difference Function); (iii) For indexing, we propose a popularity-adaptive indexing structure called FAI (Frequently Accessed Index) built on frequently queried tunes. This scheme is based on the observation that users tend to memorize and query a small number of melody segments, and indexing such segments enables fast retrieval; (iv) A user query is reformulated using user relevance feedback with a genetic algorithm to improve retrieval performance. Although we focus primarily on humming queries in this dissertation, MUSEMBLE provides versatile query and browsing interfaces for various kinds of users.
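To make the windowed-thresholding idea behind WAE concrete, the following is a minimal sketch, assuming the hummed signal has already been reduced to a short-time energy contour. The frame and window sizes, the threshold ratio, and the function names are illustrative assumptions rather than the dissertation's exact formulation; the point is only that each window of the energy contour is compared against its own local average-energy threshold instead of one global value.

```python
import numpy as np

def frame_energies(signal, frame_size=256, hop_size=128):
    """Short-time energy of each analysis frame (illustrative sizes)."""
    energies = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = np.asarray(signal[start:start + frame_size], dtype=np.float64)
        energies.append(np.sum(frame ** 2))
    return np.array(energies)

def wae_segment(energies, window_frames=20, ratio=0.3):
    """Mark frames as voiced/unvoiced using one local threshold per window.

    Unlike a plain Average Energy scheme with a single global threshold,
    each window of `window_frames` frames gets its own threshold:
    `ratio` times that window's average energy.
    """
    voiced = np.zeros(len(energies), dtype=bool)
    for start in range(0, len(energies), window_frames):
        window = energies[start:start + window_frames]
        local_threshold = ratio * window.mean()
        voiced[start:start + len(window)] = window > local_threshold
    return voiced
```

Transitions of the resulting voiced mask from True to False give candidate note offsets; this local adaptation is what a single global threshold tends to miss on hummed queries whose loudness varies over time.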
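CAMDF itself has a standard formulation, D(τ) = Σ_n |x((n + τ) mod N) − x(n)|, so frame-level fundamental frequency estimation can be sketched directly. The pitch search range and the simple argmin decision rule below are assumptions for illustration, not necessarily the configuration used in MUSEMBLE.

```python
import numpy as np

def camdf(frame):
    """Circular Average Magnitude Difference Function of one frame:
    D(tau) = sum_n |x((n + tau) mod N) - x(n)|, tau = 0 .. N-1."""
    x = np.asarray(frame, dtype=np.float64)
    n = len(x)
    return np.array([np.sum(np.abs(np.roll(x, -tau) - x)) for tau in range(n)])

def estimate_f0(frame, sample_rate, f_min=80.0, f_max=800.0):
    """Pick the lag that minimizes the CAMDF within a plausible pitch range."""
    d = camdf(frame)
    lag_min = max(1, int(sample_rate / f_max))          # smallest lag = highest pitch
    lag_max = min(int(sample_rate / f_min), len(d) - 1)  # largest lag = lowest pitch
    best_lag = lag_min + int(np.argmin(d[lag_min:lag_max + 1]))
    return sample_rate / best_lag
```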
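The GA-based relevance feedback step can likewise be sketched only generically. The query representation (a tuple of pitch/duration pairs), the mutation and crossover operators, the truncation selection, and the externally supplied fitness function below are all hypothetical placeholders; the dissertation's actual operators and parameters may differ.

```python
import random

def ga_reformulate(query, relevant, fitness, generations=30, pop_size=20,
                   mutation_rate=0.1):
    """Reformulate a (pitch, duration) query with a simple genetic algorithm.

    query    : initial note sequence transcribed from the hummed query
    relevant : note sequences of results the user marked as relevant
    fitness  : callable scoring a candidate against `relevant`
               (e.g., average melodic similarity); supplied by the caller
    """
    def mutate(candidate):
        notes = list(candidate)
        for i, (pitch, dur) in enumerate(notes):
            if random.random() < mutation_rate:
                notes[i] = (pitch + random.choice([-1, 1]), dur)  # semitone shift
        return tuple(notes)

    def crossover(a, b):
        cut = random.randint(1, max(1, min(len(a), len(b)) - 1))
        return a[:cut] + b[cut:]

    population = [mutate(tuple(query)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, relevant), reverse=True)
        parents = population[:pop_size // 2]               # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda c: fitness(c, relevant))
```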
To evaluate the proposed schemes, we carried out extensive experiments on a prototype system, focusing on our voice query transcription and GA (Genetic Algorithm)-based RF (Relevance Feedback) schemes. For an extensive and accurate evaluation, we used the QBSH (Query by Singing/Humming) corpus, which was adopted as a MIREX 2006 contest data set. Experimental results show that our proposed schemes reduce note segmentation errors such as note drop, note add, pitch error, and duration error, thus improving transcription accuracy. We demonstrate that our proposed RF method improves retrieval accuracy by 20 to 40% compared with other popular RF methods. We also show that the WAE and EFX methods improve transcription accuracy to as high as 95%.