
Deep Learning cleans podcast episodes from ‘ahem’ sounds


Published at KDnuggets News, November 2016 (16:n40).


“3.5 mm audio jack… Ahem!!” Where did you hear that? 😉 Well, this post is not about Google Pixel vs iPhone 7, but about how to remove the ugly “Ahem” sounds from speech using a deep convolutional neural network. I must say, a very interesting read.




By Francesco Gadaleta, @worldofpiggy.

Do you know why you can’t hear the ugly ahem sounds on the podcast Data Science at Home? Because we remove them. Actually, not us: a neural network does.

The ahem detector is a deep convolutional neural network trained on transformed audio signals to recognize ahem sounds. The network has been trained to detect such signals on the episodes of Data Science at Home, the podcast about data science at worldofpiggy.com/podcast.
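The post doesn’t spell out the transformation here (the slides do), but a log-magnitude spectrogram is the typical way to turn audio into an image a convolutional network can train on. A minimal numpy sketch, where the frame length, hop size, and the synthetic test tone are illustrative assumptions, not the project’s actual parameters:

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, apply a Hann window,
    and return the log-magnitude FFT of each frame (one row per frame)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag)  # log compression evens out the dynamic range

# A 1-second synthetic 440 Hz tone at 16 kHz stands in for real podcast audio.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

Each spectrogram slice can then be treated as a small grayscale image and fed to the convolutional layers.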

Slides and technical details are provided here.

But before proceeding, some concepts should be clarified.

Two sets of audio files are required, very much like a cohort study:

  • a positive set, containing only the “ahem” sounds to be detected;
  • a negative set of clean speech, without any “ahem” sounds.

While the detector works for the aforementioned audio files, it can be generalized to any other audio input, provided enough data are available. The minimum required is ~10 seconds for the positive samples and ~3 minutes for the negative cohort. The network will adapt to the training data and can perform detection on a different spoken voice. A GPU is recommended for the training as – under the conditions specific to this example – at least 5 epochs are required to obtain ~81% accuracy.
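As a rough illustration of how the two cohorts could be sliced into fixed-length training windows and labeled, here is a sketch where the window and hop sizes are arbitrary assumptions and random noise stands in for real recordings:

```python
import numpy as np

def make_dataset(positive, negative, win=4000, hop=2000):
    """Slice both cohorts into overlapping fixed-length windows and label
    them: 1 for 'ahem' clips, 0 for clean speech."""
    def windows(sig):
        return [sig[i:i + win] for i in range(0, len(sig) - win + 1, hop)]
    pos_w, neg_w = windows(positive), windows(negative)
    X = np.stack(pos_w + neg_w)
    y = np.array([1] * len(pos_w) + [0] * len(neg_w))
    return X, y

sr = 16000
pos = np.random.randn(10 * sr)    # ~10 s of positive samples
neg = np.random.randn(180 * sr)   # ~3 min of negative (clean) speech
X, y = make_dataset(pos, neg)
print(X.shape, int(y.sum()))      # (1518, 4000) 79
```

Note the heavy class imbalance between the ~10 seconds of positives and the ~3 minutes of negatives; in practice the positive windows would be oversampled or weighted during training.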

Once the artificial brain has learned what is good and what is not, a new audio file must be transformed in the same way as the training samples. This can easily be done with a utility provided together with the rest of the code.
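Cleaning a new episode then amounts to classifying successive windows and keeping only the ones the detector marks as clean. A toy sketch of that final splicing step, where the `flags` list stands in for the network’s per-window predictions:

```python
import numpy as np

def remove_flagged(signal, flags, win):
    """Given one detector decision per window (True = 'ahem'), keep only
    the windows classified as clean and concatenate them back together."""
    kept = [signal[i * win:(i + 1) * win]
            for i, bad in enumerate(flags) if not bad]
    return np.concatenate(kept) if kept else np.array([])

audio = np.arange(10 * 100)                 # 10 windows of 100 samples each
flags = [False] * 3 + [True] * 2 + [False] * 5  # windows 3-4 flagged as 'ahem'
clean = remove_flagged(audio, flags, 100)
print(len(clean))  # 800
```

A real implementation would crossfade at the cut points to avoid audible clicks, but the principle is the same: detect, drop, rejoin.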

The entire project is published on GitHub.

The full article is published at worldofpiggy.com

Enjoy!

Original post. Reposted with permission.

Bio: Francesco Gadaleta is a Data Scientist at Janssen Pharmaceutical Companies of Johnson & Johnson and a science writer. He is committed to the “A World Without Disease” paradigm shift in healthcare, leveraging Artificial Intelligence and Data Science to predict risk and intercept diseases. He is focused on putting machine learning at the service of human beings.
