Visual Transformers for Primates Classification and Covid Detection

Steffen Illium; Robert Müller; Andreas Sedlmeier; Claudia Linnhoff-Popien

INTERSPEECH 2021

Abstract

We apply the vision transformer, a deep learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. By adding mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both ComParE21 tasks we address, the Primates (PRS) and COVID-19 Cough (CCS) sub-challenges, outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of different parameter configurations.
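The abstract names its preprocessing steps without defining them. The NumPy sketch below illustrates one plausible reading and is not the authors' implementation. It assumes that vertical patching cuts the mel-spectrogram into patches that span all mel bins over a window of consecutive time frames, and that "overlapping" means the hop between patch starts is shorter than the patch width. As stand-ins for the unspecified mel-based augmentation and sample weighting, it uses SpecAugment-style frequency/time masking and inverse-class-frequency weights. The function names `vertical_patches`, `mask_spectrogram` and `inverse_frequency_weights`, and all parameter values, are hypothetical.

```python
import numpy as np

def vertical_patches(mel: np.ndarray, width: int, hop: int) -> np.ndarray:
    """Cut a (n_mels, n_frames) mel-spectrogram into vertical patches.

    Assumption: each patch spans every mel bin and `width` consecutive
    frames; patch starts are `hop` frames apart, so patches overlap
    whenever hop < width. Trailing frames that cannot fill a whole
    patch are dropped.
    """
    n_mels, n_frames = mel.shape
    if width > n_frames:
        raise ValueError("patch width exceeds number of frames")
    starts = range(0, n_frames - width + 1, hop)
    # Flatten each (n_mels, width) slice into one token vector.
    return np.stack([mel[:, s:s + width].reshape(-1) for s in starts])

def mask_spectrogram(mel: np.ndarray, max_f: int, max_t: int,
                     rng: np.random.Generator) -> np.ndarray:
    """SpecAugment-style stand-in: zero one random band of mel bins
    (up to max_f wide) and one random run of frames (up to max_t long)."""
    out = mel.copy()
    f = rng.integers(0, max_f + 1)                 # band height in [0, max_f]
    f0 = rng.integers(0, mel.shape[0] - f + 1)     # band start
    out[f0:f0 + f, :] = 0.0
    t = rng.integers(0, max_t + 1)                 # mask length in [0, max_t]
    t0 = rng.integers(0, mel.shape[1] - t + 1)     # mask start
    out[:, t0:t0 + t] = 0.0
    return out

def inverse_frequency_weights(labels: np.ndarray) -> np.ndarray:
    """Per-sample weights N / (K * n_c) for class c; their mean is 1."""
    classes, counts = np.unique(labels, return_counts=True)
    class_weight = dict(zip(classes, len(labels) / (len(classes) * counts)))
    return np.array([class_weight[y] for y in labels])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.standard_normal((64, 100))  # hypothetical 64 mel bins x 100 frames
    tokens = vertical_patches(mask_spectrogram(mel, 8, 10, rng), width=8, hop=4)
    print(tokens.shape)  # (24, 512)
    print(inverse_frequency_weights(np.array([0, 0, 0, 1])))
```

With `width=8` and `hop=4`, every frame except those at the edges appears in two consecutive tokens, which is the overlap; setting `hop=width` recovers non-overlapping patches. In a standard vision transformer, each flattened patch would then be linearly projected to the model's embedding dimension before positional embeddings are added.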

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Data Augmentation
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Pretraining

Keywords

vision transformer, attention mechanism, data augmentation, audio classification
