Modern AI systems focus heavily on words and text. But in speech, singing, and music, the pitch of a sound is a vital quality that carries meaningful information about emotion and beauty. Merely decoding the words leaves us without a meaningful understanding of what we are hearing.
References
Lloyd Watts (2018), "DeepPitch: Wide-Range Monophonic Pitch Estimation using Deep Convolutional Neural Networks".