SethFalco
Strollin' around
Status: New idea

It would be great if Firefox could offer automatic transcriptions of audio playback for when subtitles or closed captions aren't available.

This would make the web a lot more accessible. While this is not intended to replace subtitles or closed captions that websites provide themselves, it is a good fallback to have when browsing the web. It is particularly helpful for live content, or for content from small creators who may not have the means to provide subtitles or closed captions.

I'd expect this to use the dataset from commonvoice.mozilla.org and operate locally. If the system requirements are high, this could be an optional feature that individuals enable, with the components downloaded separately from the browser.

This would be easier to implement after idea 2177, as that already requires transcribing from an input device. This idea has the same requirement, but transcribes from a playback device instead.

See: https://connect.mozilla.org/t5/ideas/add-native-search-by-voice-to-firefox/idi-p/2177

If and when this is implemented, a great next step would be to combine it with automatic translation, so users can read a translated transcription.

2 Comments
Status changed to: New idea
Jon
Community Manager

Thanks for submitting an idea to the Mozilla Connect community! Your idea is now open to votes (aka kudos) and comments.

SamZenof
New member

@SethFalco can you confirm if this idea is the same as:

Firefox currently offers the right-click option 'Text from image'; given the prevalence of GIFs with audio on the internet, it would be great if a 'Generate subtitles for video' option were available when right-clicking a video or GIF element. It would use an open-source model to do the transcription; if a model download is required, there would be a pop-up the first time. Importantly, the target GIF or video would gain an artificial ~1s lag from the user's point of view while transcribing, since subtitles *must* appear ~0.5s before the matching audio for the best user experience. Other apps and platforms that do on-demand ML subtitles don't introduce video lag, so the generated subtitles display 0.5-1s after the audio, and it's a terrible experience.