Research Areas and Projects
After ten years of work in the IT industry and close to three years of PhD study, I decided to move into academia and start doing research. Chance led me into the field of natural language processing (NLP), a territory that was new to me but not unfamiliar. Speech synthesis in particular is an interdisciplinary area built mostly on linguistics, computer science and audio signal processing. I had done no specific work in this area before, but I did have more than a passing knowledge of linguistics and acoustics, not to mention document and text processing. Because the field was new to me, some learning overhead was unavoidable, but with my related background I was able to get to the essentials in a short time. Along the way I have become very interested in the area.
Language is one of the defining features of human beings and certainly one pillar of human intelligence. NLP has therefore been an essential part of Artificial Intelligence (AI) throughout its several decades of history. Compared with other areas of NLP, speech synthesis, or more precisely text-to-speech (TTS) synthesis, is a relatively easy task. On the one hand, text is a more regular and more easily managed source than speech. In other words, text has fewer degrees of freedom than speech, so speech synthesis is easier than speech recognition. On the other hand, a basic text-to-speech mapping is easy to achieve, since it can be done at the word level. Speech synthesis is therefore also easier than translation, which cannot be done at the word level. That is why intelligibility was largely achieved in TTS synthesis long ago, while other areas are still striving for accuracy. However, TTS synthesis cannot rest on this advantage. With the advantage comes a higher expectation: it has to aim beyond basic intelligibility. Progress can still be made in the same two respects. One direction is to handle more diverse text. More importantly, we have to make the synthesized speech more expressive. Expressiveness includes prosody (duration, pitch, strength and rhythm), personalization (personal characteristics), emotion (e.g. happiness, sadness, anger), etc. This is our current research focus.
Many techniques in speech signal processing can be transferred to music signal processing (MSP). Speech and music are both audio signals and therefore share some common characteristics. General audio features in both kinds of signals can be processed with the same algorithms. Music nevertheless has features of its own that speech lacks, and handling them calls for knowledge of music theory rather than linguistics. With growing demand in the internet age, MSP has become an integral part of our research portfolio. For my personal development, it gives me a chance to deepen my knowledge of music theory and to try something new in another interesting area.
At home I try to extend my PhD study in the philosophy of technology and to develop further the key ideas of my dissertation. The two current projects are the philosophical characterization of technology and the Chinese philosophy of technology. For a long time people spoke of science and technology in the same breath. Under the once standard conception of "technology as applied science," technology eluded close scrutiny until very recently. It turns out to be far more complicated than that conception suggests. Technology certainly has a close relationship with science, especially in its modern phase. However, design makes it comparable to art, and through its function it has significant ethical and political implications. A more comprehensive view of technology therefore requires a triple characterization, covering its relations to science, to art, and to ethics and politics. In my dissertation I criticized both the dystopian and the utopian views of modern technology and advocated a balanced stance. On the one hand, we should not demonize modern technology and blame it for all modern malaises. On the other hand, we should keep modern technology in its appropriate role, which entails some form of control. One of my theses is that technology in traditional Chinese culture once embodied this balanced stance. Technology was well developed in traditional China, but it was always kept in a subordinate role. This balanced stance was lost step by step in the process of modernization. Stronger support for this thesis requires a comprehensive survey of Chinese thought about technology, from the pre-Qin to the modern period. This survey is the main content of the second project.