Captioning Using Socially Derived Acoustic Profiles
First Claim
1. A method, in a data processing system, for performing dynamic automatic speech recognition on a portion of multimedia content, comprising:
segmenting the multimedia content into at least one segment, wherein each segment is a homogeneous region of content with regard to speakers and background sounds in the region of content;
identifying, for the at least one segment, a speaker providing speech in an audio track of the at least one segment, using information retrieved from a social network service source;
generating a speech profile for the speaker using information retrieved from the social network service source;
generating an acoustic profile for the segment based on the generated speech profile;
dynamically configuring an automatic speech recognition engine of the data processing system for operation on the at least one segment based on the acoustic profile; and
performing automatic speech recognition operations on the audio track of the at least one segment to generate a textual representation of speech content in the audio track corresponding to the speaker.
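The segmentation step can be illustrated with a minimal sketch. It assumes, hypothetically, that an upstream diarization and audio-classification stage has already labeled each fixed-length frame with a speaker and a background-sound class; the claim does not say how those labels are produced. Consecutive frames with identical labels are merged, so each resulting segment is homogeneous with regard to both speaker and background sound. The names `Segment`, `segment_content`, and `frame_len` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    start: float       # segment start time, in seconds
    end: float         # segment end time, in seconds
    speaker: str       # speaker label shared by every frame in the segment
    background: str    # background-sound class shared by every frame


def segment_content(frames: List[Tuple[str, str]], frame_len: float = 0.5) -> List[Segment]:
    """Merge consecutive frames that share the same (speaker, background) label.

    `frames` is a hypothetical per-frame labeling produced by an upstream stage;
    each entry is (speaker, background) for one frame of `frame_len` seconds.
    """
    segments: List[Segment] = []
    for i, (speaker, background) in enumerate(frames):
        t0, t1 = i * frame_len, (i + 1) * frame_len
        last = segments[-1] if segments else None
        if last is not None and last.speaker == speaker and last.background == background:
            # Same speaker and background: extend the current homogeneous segment.
            segments[-1] = Segment(last.start, t1, speaker, background)
        else:
            # Label changed: start a new segment.
            segments.append(Segment(t0, t1, speaker, background))
    return segments


frames = [("alice", "music"), ("alice", "music"), ("bob", "music"), ("bob", "quiet")]
for seg in segment_content(frames):
    print(seg)
```

A change in either the speaker or the background class starts a new segment, which matches the claim's requirement that each segment be homogeneous with regard to both.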
Abstract
Mechanisms for performing dynamic automatic speech recognition on a portion of multimedia content are provided. Multimedia content is segmented into at least one segment, each of which is a homogeneous region of content with regard to speakers and background sounds. For the at least one segment, a speaker providing speech in an audio track of the segment is identified using information retrieved from a social network service source. A speech profile for the speaker is generated using information retrieved from the social network service source, an acoustic profile for the segment is generated based on the generated speech profile, and an automatic speech recognition engine is dynamically configured for operation on the segment based on the acoustic profile. Automatic speech recognition operations are then performed on the audio track of the segment to generate a textual representation of the speaker's speech content in the audio track.
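The profile-generation steps described above can be sketched as follows, assuming the speaker has already been identified as a known social network user. The record fields (`language`, `hometown`), the hometown-to-accent table, and the rule mapping a background class to a noise condition are all hypothetical placeholders; the abstract does not specify which social network information is used or how it is interpreted.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-in for profile data retrieved from a social network
# service (e.g. users tagged in a posted video). Field names are assumptions.
SOCIAL_PROFILES: Dict[str, Dict[str, str]] = {
    "alice": {"language": "en-US", "hometown": "Boston"},
    "bob": {"language": "en-GB", "hometown": "Glasgow"},
}

# Hypothetical lookup from a hometown to an accent label.
REGION_ACCENTS: Dict[str, str] = {"Boston": "new-england", "Glasgow": "scottish"}


@dataclass(frozen=True)
class SpeechProfile:
    speaker: str
    language: str
    accent: str


@dataclass(frozen=True)
class AcousticProfile:
    speech: SpeechProfile
    background: str
    noise_condition: str  # "clean" or "noisy"


def build_speech_profile(user_id: str) -> Optional[SpeechProfile]:
    """Derive a speech profile from the (hypothetical) social network record."""
    info = SOCIAL_PROFILES.get(user_id)
    if info is None:
        return None  # speaker not found; a caller would fall back to a generic profile
    accent = REGION_ACCENTS.get(info.get("hometown", ""), "generic")
    return SpeechProfile(user_id, info.get("language", "en-US"), accent)


def build_acoustic_profile(speech: SpeechProfile, background: str) -> AcousticProfile:
    """Combine the speaker's speech profile with the segment's background class."""
    noise = "clean" if background == "quiet" else "noisy"
    return AcousticProfile(speech, background, noise)


bob = build_speech_profile("bob")
print(build_acoustic_profile(bob, "music"))
```

Keeping the speech profile (per speaker) separate from the acoustic profile (per segment) lets the same speaker's profile be reused across segments with different background conditions.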
25 Claims
13. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
segment a multimedia content into at least one segment, wherein each segment is a homogeneous region of content with regard to speakers and background sounds in the region of content;
identify, for the at least one segment, a speaker providing speech in an audio track of the at least one segment, using information retrieved from a social network service source;
generate a speech profile for the speaker using information retrieved from the social network service source;
generate an acoustic profile for the segment based on the generated speech profile;
dynamically configure an automatic speech recognition engine of the data processing system for operation on the at least one segment based on the acoustic profile; and
perform automatic speech recognition operations on the audio track of the at least one segment to generate a textual representation of speech content in the audio track corresponding to the speaker.
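The "dynamically configure" step, recited in the same terms in claims 1, 13 and 25, amounts to choosing recognizer settings per segment rather than once for the whole file. The sketch below assumes, hypothetically, that the engine is configured by naming an acoustic model and a language model, and that a registry maps (language, accent, noise condition) to available acoustic models. The model names and the fallback order are placeholders, not part of the claims.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EngineConfig:
    acoustic_model: str
    language_model: str


# Hypothetical model registry; the model names are placeholders.
MODEL_REGISTRY: Dict[Tuple[str, str, str], str] = {
    ("en-GB", "scottish", "clean"): "am-en-gb-scottish-clean",
    ("en-GB", "scottish", "noisy"): "am-en-gb-scottish-noisy",
    ("en-US", "new-england", "clean"): "am-en-us-ne-clean",
}


def configure_engine(language: str, accent: str, noise: str) -> EngineConfig:
    """Pick the most specific acoustic model available for one segment.

    Fallback order (an assumption): exact match, then the same accent under
    clean conditions, then a generic model for the language.
    """
    lm = f"lm-{language.lower()}"
    for key in ((language, accent, noise), (language, accent, "clean")):
        if key in MODEL_REGISTRY:
            return EngineConfig(MODEL_REGISTRY[key], lm)
    return EngineConfig(f"am-{language.lower()}-generic", lm)


# A noisy New England segment has no exact match, so the clean model is used.
print(configure_engine("en-US", "new-england", "noisy"))
```

Because the configuration is recomputed for every segment, a change of speaker or background mid-programme switches the recognizer settings at the segment boundary.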
25. An apparatus, comprising:
a processor; and
a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
segment a multimedia content into at least one segment, wherein each segment is a homogeneous region of content with regard to speakers and background sounds in the region of content;
identify, for the at least one segment, a speaker providing speech in an audio track of the at least one segment, using information retrieved from a social network service source;
generate a speech profile for the speaker using information retrieved from the social network service source;
generate an acoustic profile for the segment based on the generated speech profile;
dynamically configure an automatic speech recognition engine of the data processing system for operation on the at least one segment based on the acoustic profile; and
perform automatic speech recognition operations on the audio track of the at least one segment to generate a textual representation of speech content in the audio track corresponding to the speaker.
Specification