SECRNN: Convolutional Recurrent Network for Spatial Audio Event Recognition
by Max Xiao
Category: Computer Science
Abstract – The SECRNN model for gunshot audio event detection combines convolutional neural networks (CNNs), which extract spatial features, with recurrent neural networks (RNNs), which extract temporal features. It outperforms the state-of-the-art models it is compared against. It uses short-time Fourier transform (STFT) spectrograms without Mel scaling to retain high-signal acoustic features and reduce spectral leakage. The proposed inverse frequency weighting approach addresses class imbalance, reducing both false positives and false negatives. With its high recall and precision, SECRNN is well suited to real-world gunshot detection: it responds rapidly to short-duration, high-intensity audio events while minimizing false alarms.
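The abstract does not give the exact weighting formula. The sketch below is a minimal illustration that assumes the common "balanced" inverse-frequency scheme, w_c = N / (K · n_c), where N is the number of training clips, K the number of classes, and n_c the number of clips in class c. Each example's cross-entropy term is then scaled by the weight of its class. The function names and the two-class gunshot/background split are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def inverse_frequency_weights(labels):
    """Return one weight per class, w_c = N / (K * n_c).

    Assumption: 'balanced' normalization, so a perfectly balanced
    dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)  # n_c for each class c
    n_total = len(labels)     # N
    n_classes = len(counts)   # K
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, target, weights):
    """Cross-entropy for a single example, scaled by its class weight.

    probs: dict mapping class -> predicted probability (assumed to sum to 1).
    target: the true class label of the example.
    """
    return -weights[target] * math.log(probs[target])


if __name__ == "__main__":
    # Hypothetical imbalanced dataset: 90 background clips, 10 gunshot clips.
    labels = ["background"] * 90 + ["gunshot"] * 10
    weights = inverse_frequency_weights(labels)
    print(weights["gunshot"])  # 5.0, i.e. 100 / (2 * 10)

    # Misclassifying the rare class is now penalized far more heavily.
    probs = {"gunshot": 0.5, "background": 0.5}
    print(weighted_cross_entropy(probs, "gunshot", weights))
```

Under this normalization, each class contributes the same total weight (N / K) to the loss regardless of its size. This counteracts the model's tendency to favor the majority background class, the mechanism by which inverse frequency weighting can cut missed gunshots without inflating false alarms.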