Can AI Spot a Fake Voice?

A fine-tuned Whisper model both transcribes speech to text and flags synthetically generated words, using the special tokens [latex]\langle TOF \rangle[/latex] and [latex]\langle EOF \rangle[/latex] to mark the boundaries of each artificial word in the transcript.

Researchers are adapting speech recognition technology to pinpoint which words in an audio recording were synthetically generated.
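Because the model marks fake words inline, a downstream tool only has to read the transcript to see which words were flagged. The sketch below shows one way to do that. It assumes the decoded output contains the markers as the literal strings `<TOF>` and `<EOF>`, with `<TOF>` opening and `<EOF>` closing each synthetic span. Both spellings and the function name `parse_transcript` are hypothetical and are not taken from the research itself.

```python
from typing import List, Tuple

# Hypothetical literal spellings of the special tokens; the strings actually
# emitted by the fine-tuned model may differ.
OPEN_TOKEN = "<TOF>"
CLOSE_TOKEN = "<EOF>"


def parse_transcript(decoded: str) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Split a marked-up transcript into per-word labels and flagged spans.

    Assumption: OPEN_TOKEN precedes and CLOSE_TOKEN follows each synthetic span.
    Returns (words, spans), where words is a list of (word, is_synthetic) pairs
    and spans holds the text of each completed synthetic span.
    """
    words: List[Tuple[str, bool]] = []
    spans: List[str] = []
    current: List[str] = []
    inside = False

    # Pad the markers with spaces so they split out as standalone tokens,
    # even when the model emits them glued to a word (e.g. "<TOF>friday<EOF>").
    padded = decoded.replace(OPEN_TOKEN, f" {OPEN_TOKEN} ")
    padded = padded.replace(CLOSE_TOKEN, f" {CLOSE_TOKEN} ")

    for tok in padded.split():
        if tok == OPEN_TOKEN:
            inside = True
            current = []
        elif tok == CLOSE_TOKEN:
            # Record the span only if it was opened and is non-empty.
            if inside and current:
                spans.append(" ".join(current))
            inside = False
        else:
            words.append((tok, inside))
            if inside:
                current.append(tok)

    # A span left unclosed at the end still has its words labeled as synthetic,
    # but it is not added to `spans`.
    return words, spans


if __name__ == "__main__":
    words, spans = parse_transcript("the meeting is moved to <TOF>friday<EOF> at noon")
    print(spans)
    print(" ".join(w for w, _ in words))
```

Keeping per-word labels alongside the grouped spans lets a caller either highlight individual words in a UI or report whole fabricated phrases, without re-parsing the transcript.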