TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms


Abstract

We present a neural audio super-resolution model for speech and music signals. The model is trained with loss functions defined in both the time domain and the frequency domain. We compare reconstruction strategies that combine a range of perceptual and adversarial losses. The approach enhances both the low- and high-frequency content of the signal to produce high-quality output. Results show improved speech and music quality across several tasks.
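
To make the idea of combining time-domain and frequency-domain losses concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an L1 waveform term plus an L1 term on STFT magnitudes. The function names (`stft_magnitude`, `combined_loss`), the frame length, the hop size, and the `freq_weight` parameter are all hypothetical choices made for illustration; the abstract does not specify any of them, and the perceptual and adversarial terms it mentions are omitted here.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (illustrative settings)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real FFT per frame; keep only the magnitude.
    return np.abs(np.fft.rfft(frames, axis=1))

def combined_loss(pred, target, freq_weight=1.0):
    """Assumed form: L1 on the waveform plus weighted L1 on STFT magnitudes."""
    time_loss = np.mean(np.abs(pred - target))
    freq_loss = np.mean(np.abs(stft_magnitude(pred) - stft_magnitude(target)))
    return time_loss + freq_weight * freq_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(1024)                 # stand-in for reference audio
    pred = target + 0.1 * rng.standard_normal(1024)    # stand-in for model output
    print(combined_loss(pred, target))
```

In a training setup, the same two terms would typically be written in an automatic-differentiation framework so that gradients flow through the STFT; the weighting between them is a tunable hyperparameter.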