Directional Audio Rendering Using a Neural Network Based Personalized HRTF
Abstract
Multi-channel speech/audio separation and enhancement methods are widely used in speech and audio applications. However, these methods may lose spatial cues, including the interaural time difference (ITD) and interaural level difference (ILD), when their monaural output signals are processed further. As a result, listeners may have difficulty perceiving the direction of the source signal. We present a directional audio renderer that uses a personalized head-related transfer function (HRTF), estimated by a neural network that combines a deep neural network (DNN) and a convolutional neural network (CNN) and takes the listener's anthropometric parameters and ear images as input. This demonstration of the directional audio renderer concept aims to foster research on audio processing for virtual reality/augmented reality and thereby improve the quality of service of such devices.
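As a rough illustration of what directional rendering of a monaural signal involves, the sketch below convolves a mono signal with a left/right pair of head-related impulse responses (the time-domain counterpart of an HRTF). It is a minimal sketch under stated assumptions, not the method presented here: the function names `toy_hrir_pair` and `render_binaural` are hypothetical, and the impulse responses are a toy placeholder that models only an interaural time difference and an interaural level difference, standing in for the personalized HRTF that a neural network would estimate.

```python
import numpy as np


def toy_hrir_pair(itd_samples, ild_db, length=64):
    """Hypothetical placeholder for a personalized HRIR pair.

    Models a source on the listener's left: the left (near) ear gets an
    undelayed unit impulse, the right (far) ear gets an impulse that is
    delayed by the ITD and attenuated by the ILD.
    """
    left = np.zeros(length)
    right = np.zeros(length)
    left[0] = 1.0
    right[itd_samples] = 10.0 ** (-ild_db / 20.0)
    return left, right


def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with each ear's impulse response."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)  # shape: (2, n_samples)


# Assumed example values: 16 kHz sampling, 100 ms of a 440 Hz tone,
# an ITD of 8 samples (0.5 ms) and an ILD of 6 dB.
fs = 16000
t = np.arange(fs // 10) / fs
mono = np.sin(2 * np.pi * 440.0 * t)

hrir_l, hrir_r = toy_hrir_pair(itd_samples=8, ild_db=6.0)
binaural = render_binaural(mono, hrir_l, hrir_r)
```

In an actual renderer, the toy pair would be replaced by measured or estimated impulse responses for the desired direction, which also carry the spectral cues that a pure delay and gain cannot represent.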