Research on the Fundamentals of Speech Analysis

YouTube lectures

  • https://www.kaggle.com/kcs93023/keras-sequential-conv1d-model-classification
  • https://www.youtube.com/watch?v=HzgCnlre4EE
  • https://www.youtube.com/watch?v=XhjPqGKF9Zs
  • https://www.youtube.com/watch?v=mAjvfIh2iXw

Blog posts summarizing speech processing

https://brunch.co.kr/@kakao-it/180
https://wikidocs.net/30651 -> very well organized
https://medium.com/@jongdae.lim/%EA%B8%B0%EA%B3%84-%ED%95%99%EC%8A%B5-machine-learning-%EC%9D%80-%EC%A6%90%EA%B2%81%EB%8B%A4-part-6-eb0ed6b0ed1d
https://engineering.linecorp.com/ko/blog/voice-waveform-arbitrary-signal-to-noise-ratio-python/

https://heartbeat.fritz.ai/a-2019-guide-to-speech-synthesis-with-deep-learning-630afcafb9dd -> comprehensive 2019 overview (speech synthesis with deep learning)
https://medium.com/@saxenauts/speech-synthesis-techniques-using-deep-neural-networks-38699e943861

Speech/Music Signals and Machine Learning: A Guide for Beginners

  • Part 1

http://keunwoochoi.blogspot.com/2016/01/blog-post.html

  • Part 2

http://keunwoochoi.blogspot.com/2016/03/2.html

  • Part 3

http://keunwoochoi.blogspot.com/2016/12/3.html

  • Part 4

http://keunwoochoi.blogspot.com/2017/06/4.html

Augmentations

link

https://towardsdatascience.com/state-of-the-art-audio-data-augmentation-with-google-brains-specaugment-and-pytorch-d3d1a3ce291e

paper (Google AI Blog announcement)

https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html

code

https://github.com/zcaceres/spec_augment

Kaggle

Freesound Audio Tagging 2019 Solutions

  1. 1st place solution (with code)
    https://www.kaggle.com/c/freesound-audio-tagging-2019/discussion/95924#latest-586969
  2. 2nd place solution (with code)
    https://www.kaggle.com/c/freesound-audio-tagging-2019/discussion/97815#latest-582300
  3. 3rd place solution
    https://www.kaggle.com/c/freesound-audio-tagging-2019/discussion/97926#latest-583269
  4. 4th place solution (with code)
    https://www.kaggle.com/c/freesound-audio-tagging-2019/discussion/96440#latest-561393
  5. 6th place solution (with code)
    https://www.kaggle.com/c/freesound-audio-tagging-2019/discussion/96680#latest-623999
  6. 7th place solution (with code)
    https://www.kaggle.com/c/freesound-audio-tagging-2019/discussion/97812#latest-564533

Beginner guide to Audio data

  • https://www.kaggle.com/maxwell110/beginner-s-guide-to-audio-data-2

Audio representation - what it’s all about

  • https://www.kaggle.com/davids1992/audio-representation-what-it-s-all-about

In-depth introduction to audio for beginners

  • https://www.kaggle.com/deepaksinghrawat/in-depth-introduction-to-audio-for-beginners

Beginner’s Visualization and Removing Uniformative Part

  • https://www.kaggle.com/dude431/beginner-s-visualization-and-removing-uniformative

Papers

  • A Study on Speech Recognition Technology

https://www.researchgate.net/publication/278811438_A_Study_on_Speech_Recognition_Technology

  • SpecAugment

https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html?m=1

  • Unsupervised speech representation learning using WaveNet autoencoders

https://arxiv.org/abs/1901.08810v2

  • wav2vec: Unsupervised Pre-training for Speech Recognition

https://arxiv.org/abs/1904.05862v4

  • Learning Discriminative features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

https://arxiv.org/abs/1906.08873v2

  • Two-Pass End-to-End Speech Recognition

https://arxiv.org/abs/1908.10992v1

  • Advancing Speech Recognition With No Speech Or With Noisy Speech

https://arxiv.org/abs/1906.08871v5

  • Coarse-to-fine Optimization for Speech Enhancement

https://arxiv.org/abs/1908.08044v1

Other background knowledge

Spectrogram explained

https://ko.wikipedia.org/wiki/스펙트로그램

Spectrum explained

https://ko.wikipedia.org/wiki/스펙트럼

Ok Google: How to do Speech Recognition?

https://towardsdatascience.com/ok-google-how-to-do-speech-recognition-f77b5d7cbe0b

Voice representation

  • paper
    https://www.frontiersin.org/articles/10.3389/fpsyg.2017.01180/full

A well-organized speech recognition project - Project DeepSpeech

  • https://github.com/mozilla/DeepSpeech/blob/master/README.rst

Browsing ACL papers

  • ACL 2019 paper

https://www.aclweb.org/anthology/P19-1039.pdf