Research Areas and Projects

Introduction
Speech Synthesis
Philosophy of Technology

Introduction

After ten years of work in the IT industry and close to three years of PhD study, I decided to move into academia and start doing research. Chance brought me into the field of Human Language Processing (HLP), a territory that was new to me but not unfamiliar. Speech synthesis in particular is an interdisciplinary area that draws mainly on linguistics, computer science and audio signal processing. I had done no specific work in this area before, but I had more than preliminary knowledge of linguistics and acoustics, not to mention document and text processing. Because the area was new to me, some learning overhead was unavoidable, but my related background allowed me to grasp its essentials in a short time. Along the way, I have developed a strong interest in the field.

Language is one of the defining features of human beings and clearly one pillar of human intelligence. HLP has therefore been an essential part of Artificial Intelligence (AI) since the field's beginnings several decades ago. Compared with other areas of HLP, speech synthesis, more precisely text-to-speech (TTS) synthesis, is a relatively easy task. On the one hand, text is a more regular and more manageable source than speech; in other words, text has fewer degrees of freedom than speech. This makes speech synthesis easier than speech recognition. On the other hand, a basic text-to-speech mapping is easy to achieve, since it can be done at the word level. This makes speech synthesis easier than translation, which cannot be done at the word level. That is why TTS synthesis achieved general intelligibility long ago, while other areas are still striving for accuracy. However, TTS synthesis cannot rest on this advantage. With the advantage come higher expectations, and it has to aim beyond basic intelligibility. Progress can again be made along the two lines above. One direction is to handle more diverse text. More importantly, we have to make the synthesized speech more expressive. Expressiveness includes prosody (duration, pitch, intensity and rhythm), personalization (the characteristics of an individual speaker), emotion (e.g. happiness, sadness, anger), and so on. This is our current research focus.

Many techniques from speech signal processing can be transferred to music signal processing (MSP). Speech and music are both audio signals and therefore share some common characteristics, so general audio features of both kinds of signal can be processed with the same algorithms. Music, however, also has its own distinctive features, and handling them requires knowledge of music theory rather than linguistics. With growing demand in the Internet age, MSP has become an integral part of our research portfolio. Personally, it gives me a chance to deepen my knowledge of music theory and to try something new in another interesting area.

Outside of work, I try to extend my PhD study in Philosophy of Technology and Modernity Theory and to develop the key ideas of my dissertation further. My current projects are the philosophical characterization of technology, Chinese philosophy of technology, and Chinese modernity construction. For a long time, science and technology were discussed in the same breath. Under the once standard conception of "technology as applied science," technology escaped close scrutiny until quite recently. It has turned out to be far more complicated. Technology certainly has a close relationship with science, especially in its modern phase. However, design makes it comparable to art, and its functions give it significant ethical and political implications. A more comprehensive view of technology therefore requires a triple characterization: in relation to science, to art, and to ethics and politics.

In my dissertation I criticized both the dystopian and the utopian views of modern technology and advocated a balanced stance. On the one hand, we should not demonize modern technology and blame it for all modern malaises. On the other hand, we should keep modern technology in its appropriate role, which entails some kind of control. One of my theses is that technology in traditional Chinese culture once embodied this balanced stance: technology was well developed in traditional China, but it was always kept in a subordinate role. This balanced stance was lost step by step in the process of modernization. Supporting this thesis more firmly requires a comprehensive survey of Chinese thought about technology, from the Pre-Qin period to modern times. This is the main content of the second project.

Chinese modernity construction has a much larger scope, in both breadth and depth. In general, Chinese modernity should be a synthesis of Chinese culture and modernity. This requires a deconstruction of both traditional Chinese culture and Western modernity. The task of the first part is to identify the essential characteristics of Chinese culture, whereas the task of the second part is to extract the modern elements from Western modernity while filtering out the specifically Western ones. In this project, we have to consider the major aspects of Chinese and Western culture and dig into their foundations. From the perspective of Chinese culture, Chinese modernity construction is its second fundamental revival, "second" with reference to the first. The first revival of Chinese culture took place in the Song Dynasty, about a millennium ago, as the reaction of Chinese culture to the fundamental impact of Buddhism. In parallel, the second revival is the reaction of Chinese culture to the fundamental impact of Christianity. Both revivals are understood against the background of the formation of Chinese culture during the Pre-Qin period.