Audio locale mismatch detection
Abstract
Systems, methods, and computer-readable media are disclosed for detecting a mismatch between the spoken language in an audio file and the audio language that is tagged as the spoken language in the audio file metadata. Example methods may include receiving a media file including spoken language metadata. Certain methods include generating an audio sample from the media file. Certain methods include generating a text translation of the audio sample based on the spoken language metadata. Certain methods include determining that the spoken language metadata does not match a spoken language in the audio sample based on the text translation. Certain methods include sending an indication that the spoken language metadata does not match the spoken language.
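The abstract does not say how the text translation is judged against the tagged language. As a minimal sketch of one possible check, the Python below scores a transcript by the fraction of its words found in a small common-word list for the tagged language and reports a mismatch when that fraction falls below a threshold. The word lists, the `0.2` threshold, and the function names are all illustrative assumptions, not taken from the patent.

```python
from typing import Dict, Set

# Hypothetical, deliberately tiny vocabularies (assumption for illustration).
# A real system would use much larger lexicons or recognizer confidence scores.
COMMON_WORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "you", "a"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es", "un", "por"},
}


def vocabulary_hit_rate(transcript: str, language: str) -> float:
    """Fraction of transcript tokens found in the language's common-word set."""
    tokens = [t.strip(".,;:!?¿¡\"'").lower() for t in transcript.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in COMMON_WORDS[language])
    return hits / len(tokens)


def metadata_mismatch(transcript: str, tagged_language: str,
                      threshold: float = 0.2) -> bool:
    """Return True when the transcript looks unlike the tagged language.

    The boolean plays the role of the "indication" that the spoken language
    metadata does not match; the threshold is an assumed tuning parameter.
    """
    return vocabulary_hit_rate(transcript, tagged_language) < threshold
```

For example, a transcript such as "el gato es de la casa" checked against the tag `"en"` contains none of the listed English words, so `metadata_mismatch` returns `True`.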
20 Claims
1. A method comprising:
receiving, by one or more computer processors coupled to at least one memory, a media file comprising audio data and spoken language metadata, the spoken language metadata comprising an indication of an English language;
extracting, by the one or more computer processors, an audio sample from the audio data of the media file;
generating, by the one or more computer processors, a first text translation of the audio sample using a speech recognition engine based on the English language;
determining, by the one or more computer processors, that the English language does not match a spoken language of the media file based on the first text translation of the audio sample;
generating, by the one or more computer processors, a second text translation of the audio sample using the speech recognition engine based on a Spanish language;
determining, by the one or more computer processors, that the Spanish language does match the spoken language of the media file based on the second text translation; and
replacing, by the one or more computer processors, the indication of the English language in the spoken language metadata of the media file with a second indication of the Spanish language.
Dependent claims: 2, 3, 4
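The extracting step of claim 1 can be illustrated with a short sketch. It assumes the audio data is an uncompressed PCM WAV stream, which the claim does not specify, and uses only Python's standard `wave` module. The function name `extract_audio_sample` and its parameters are hypothetical.

```python
import io
import wave


def extract_audio_sample(wav_bytes: bytes, start_s: float,
                         duration_s: float) -> bytes:
    """Return a WAV-encoded excerpt of wav_bytes, clamped to the file length.

    Assumes PCM WAV input (an assumption; the claim names no format).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp the requested window so it never runs past the last frame.
        start = min(int(start_s * rate), total)
        count = min(int(duration_s * rate), total - start)
        src.setpos(start)
        frames = src.readframes(count)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)      # same channels, sample width, and rate
        dst.writeframes(frames)    # header frame count is patched on close
    return out.getvalue()
```

A short excerpt like this keeps the recognition passes cheap, since each candidate language requires its own pass over the sample.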
5. A method comprising:
receiving, by one or more computer processors coupled to at least one memory, a media file comprising audio data and spoken language metadata, the spoken language metadata comprising an indication of a first language;
generating, by the one or more computer processors, an audio sample from the audio data of the media file;
generating, by the one or more computer processors, a text translation of the audio sample based on the first language;
determining, by the one or more computer processors, that the first language does not match a spoken language of the media file based on the text translation of the audio sample;
determining, by the one or more computer processors, that a second language matches the spoken language of the media file; and
replacing, by one or more computer processors, the indication of the first language in the spoken language metadata of the media file with a second indication of the second language.
Dependent claims: 6, 7, 8, 9, 10, 11, 12
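The overall flow of claim 5 might be sketched as follows. The speech recognition engine and the match test are passed in as callables because the claim does not define them. `MediaFile`, `correct_spoken_language`, and the byte-slice sampling are hypothetical stand-ins. Claim 5 also does not state how the second language is found to match, so the sketch borrows the second recognition pass described in claim 1.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical interfaces (assumptions, not defined by the claims):
Transcriber = Callable[[bytes, str], str]  # (audio sample, language) -> text
Matcher = Callable[[str, str], bool]       # (text, language) -> plausible?


@dataclass
class MediaFile:
    audio_data: bytes
    spoken_language: str  # the spoken language metadata tag


def correct_spoken_language(media_file: MediaFile,
                            transcribe: Transcriber,
                            matches: Matcher,
                            candidates: Iterable[str],
                            sample_bytes: int = 4096) -> Optional[str]:
    """Replace a wrong language tag in place.

    Returns the new tag, or None when the tag was left unchanged.
    """
    first_language = media_file.spoken_language
    # Simplistic stand-in for generating an audio sample from the audio data.
    sample = media_file.audio_data[:sample_bytes]

    # Recognition pass based on the tagged (first) language.
    if matches(transcribe(sample, first_language), first_language):
        return None  # tag agrees with the speech; nothing to replace

    # Mismatch found: look for a second language that does match.
    for second_language in candidates:
        if second_language == first_language:
            continue
        if matches(transcribe(sample, second_language), second_language):
            media_file.spoken_language = second_language
            return second_language
    return None  # no candidate matched; the original tag is left untouched
```

Injecting the recognizer and matcher keeps the control flow testable without a real speech engine. In this sketch the order of `candidates` decides which language is tried first.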
13. A device comprising:
at least one memory that stores computer-executable instructions; and
at least one processor configured to access the memory and execute the computer-executable instructions to:
receive a media file comprising audio data and spoken language metadata, the spoken language metadata comprising an indication of a first language;
generate an audio sample from the audio data of the media file;
generate a text translation of the audio sample based on the first language;
determine that the first language does not match a spoken language of the media file based on the text translation of the audio sample;
determine that a second language matches the spoken language of the media file; and
replace the indication of the first language in the spoken language metadata of the media file with a second indication of the second language.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification