Zazen Audio
Publications
Below are some of the publications I have been involved with.
If you have any questions about these works, feel free to ask.
Evaluation of Binaural Renderers: A Methodology
ABSTRACT
Recent developments in immersive audio technology have motivated a proliferation of binaural renderers used for creating spatial audio content. Binaural renderers leverage psychoacoustic features of human hearing to reproduce a 3D sound image over headphones. In this paper, a methodology for the comparative evaluation of different binaural renderers is presented. The methodological approach is threefold: a subjective evaluation of 1) quantitative characteristics (such as front/back and up/down discrimination and localization); 2) qualitative characteristics (such as naturalness and spaciousness); and 3) overall preference. The main objective of the methodology is to help to elucidate the most meaningful factors for the performance of binaural renderers and to provide insight on possible improvements in the rendering process.
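At its core, binaural rendering filters a source signal with a pair of head-related impulse responses (HRIRs), one per ear, so that interaural time and level differences place the sound in space. A minimal sketch of this static case is shown below; the HRIR pair here is a synthetic placeholder (a simple delay and attenuation on the contralateral ear), not measured data, and real renderers interpolate measured HRIRs and update them dynamically with head tracking.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with an HRIR pair to produce a stereo signal."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

fs = 48000
mono = np.random.randn(fs)  # 1 s of noise as a test source

# Toy HRIRs: source on the left, so the right ear is delayed and attenuated.
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[30] = 0.5

stereo = render_binaural(mono, hrir_l, hrir_r)
print(stereo.shape)  # (48063, 2): signal length + HRIR length - 1, two ears
```

The interaural delay (30 samples here, about 0.6 ms at 48 kHz) and level difference are the two cues a listener's auditory system uses to lateralize the source.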
Evaluation of Binaural Renderers: Localization
ABSTRACT
Binaural renderers can be used to reproduce spatial audio over headphones. A number of different renderers have recently become commercially available for use in creating immersive audio content. High-quality spatial audio can significantly enhance experiences in a number of different media applications, such as virtual, mixed, and augmented reality, computer games, music, and movies. A large multi-phase experiment evaluating six commercial binaural renderers was performed. This paper presents the methodology, evaluation criteria, and main findings of the horizontal-plane source localization experiment carried out with these renderers. Significant differences between the renderers’ regional localization accuracy were found. Consistent with previous research, subjects tended to localize better in the front and back of the head than at the sides. Differences between renderer performance at the side regions heavily contributed to their overall regional localization accuracy.
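Horizontal-plane localization accuracy is typically summarized from signed azimuth errors, which must be computed with wraparound so that, for example, a response of 350° to a target at 10° counts as an error of -20°, not -340°. A small sketch (an illustration, not the paper's actual analysis code):

```python
def azimuth_error(target_deg, response_deg):
    """Signed azimuth error in degrees, wrapped to the range (-180, 180]."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0

print(azimuth_error(10, 350))   # -20.0 (response slightly to the left of target)
print(azimuth_error(270, 300))  # 30.0
```

Averaging the absolute values of these wrapped errors per region (front, sides, back) gives the kind of regional accuracy comparison described above.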
Evaluation of Binaural Renderers: Externalization, Front/Back and Up/Down Confusions
ABSTRACT
Binaural renderers can be used to reproduce dynamic spatial audio over headphones and deliver immersive audio content. Six commercially available binaural renderers with different rendering methodologies were evaluated in a multi-phase subjective study. This paper presents and discusses the testing methodology, evaluation criteria, and main findings of the externalization, front/back discrimination and up/down discrimination tasks which are part of the first phase. A statistical analysis over a large number of subjects revealed that the choice of renderer has a significant effect on all three dependent measures. Further, ratings of perceived externalization for the renderers were found to be content-specific, while renderer reversal rates were much more robust to different stimuli.
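A common way to count front/back confusions (used here as an illustrative assumption, not necessarily this paper's exact criterion) is to mirror the target azimuth across the interaural axis (azimuth → 180° - azimuth) and flag a response as a reversal when it lies closer to that mirror image than to the target itself:

```python
def wrapped_diff(a_deg, b_deg):
    """Absolute angular difference in degrees, accounting for wraparound."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

def is_front_back_reversal(target_az, response_az):
    """True if the response is nearer the target's front/back mirror image."""
    mirror_az = (180.0 - target_az) % 360.0
    return wrapped_diff(response_az, mirror_az) < wrapped_diff(response_az, target_az)

print(is_front_back_reversal(30, 150))  # True: 150° is the mirror of 30°
print(is_front_back_reversal(30, 40))   # False: response is near the target
```

The reversal rate for a renderer is then simply the fraction of trials flagged by such a rule.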
Evaluation of Binaural Renderers: Sound Quality Assessment
ABSTRACT
Binaural renderers can be used to generate spatial audio content over headphones for use in a number of different media applications. These renderers take individual audio tracks with associated metadata and transform this representation into a binaural signal. A large multi-phase experiment evaluating six commercially available renderers was carried out. This paper presents the methodology, evaluation criteria, and main findings of the tests which assessed perceived sound quality of each of the renderers. In these tests, subjects appraised a number of specific sound quality attributes - naturalness, spaciousness, timbral balance, clarity, and dialogue intelligibility - as well as overall preference. Results indicated that binaural renderer performance is highly content-dependent, making it difficult to determine an “optimal” renderer for all settings.
Localization of Elevated Virtual Sources Using Four HRTF Datasets
ABSTRACT
At the core of spatial audio renderers are the HRTF filters that are used to virtually place sounds in space. These filters can be obtained in different ways, from acoustical measurements to numerical calculations based on images. In this paper we evaluate the localization of elevated sources using four different HRTF datasets. The datasets used are SADIE (University of York), KEMAR (MIT), CIPIC (UC Davis), and a personalized dataset produced by an image-capturing technique in which features are extracted from the pinnae. Twenty subjects were asked to determine the location of randomly placed sounds by selecting the azimuth and elevation from which they felt the sound was coming. It was found that elevation accuracy is better for HRTFs located near 0° elevation. There was a tendency to under-aim and over-aim towards the area between 0° and 20° in elevation. A strong effect of elevation on azimuth localization was observed for sounds placed above 60°.