Abstract
I. INTRODUCTION
II. RELATED WORK
A. 2D Visual Representations for Audio
B. CNN-based Classification and Fusion
III. METHODOLOGY
A. Dataset and Preprocessing
B. Model Architecture
IV. EXPERIMENTAL RESULTS
V. CONCLUSION
ACKNOWLEDGMENT
REFERENCES