Event Title

Does Phase Matter for Monaural Source Separation?

Presenter Information

Mohit Dubey, Oberlin College

Location

Science Center A254

Start Date

10-27-2017 3:00 PM

End Date

10-27-2017 4:20 PM

Research Program

New Mexico Consortium Machine Learning Summer Program

Abstract

The "cocktail party" problem of fully separating multiple sources from a single channel audio waveform remains unsolved. Current understanding of neural encoding suggests that phase information is preserved and utilized at every stage of the auditory pathway. However, current computational approaches primarily discard phase information in order to mask for amplitude spectrograms of sound. In this paper, we seek to address whether preserving phase information in spectral representations of sound provides better results in monaural separation of vocals from a musical track by using a neurally plausible sparse generative model. Our results show that preserving phase information introduces less artifacts in the separated tracks, as quantified by the signal to artifact ratio (GSAR). Furthermore, our proposed method achieves state-of-the-art performance for source separation, as quantified by a mean signal to interference ratio (GSIR) of 19.46.

Notes

Session I, Panel 4 - Sound | Science
Moderator: Joseph Lubben, Associate Professor of Music Theory

Major

Physics; Classical Guitar

Project Mentor(s)

Garrett Kenyon, Neuromorphic Computing, New Mexico Consortium

Document Type

Presentation
