Multimodal speech recognition for real-time video audio-based display indicia application

US 9,959,872 B2
Filed: 12/14/2015
Issued: 05/01/2018
Est. Priority Date: 12/14/2015
Status: Active Grant

First Claim


1. A computer implemented method to automatically generate audio-based display indicia of media content, the method comprising:

defining, by a processor, a plurality of media content categories for media content by at least applying a non-supervised clustering algorithm based at least in part on a set of features extracted from the media content, the set of features comprising metadata features that are extracted from one or more video descriptions of the media content, image features extracted from one or more images of the media content, drawing and reality features extracted from one or more images of the media content, and recognized characteristics of characters of the media content;

receiving, by the processor, a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories;

determining a media content category of a current media content from the plurality of media content categories based at least in part on a set of current features extracted from one or more video descriptions of the current media content, image features extracted from the current media content, drawings and reality features extracted from the current media content, and recognized characteristics of characters of the current media content;

selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content; and

applying the selected speech recognition algorithm to the current media content.
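The steps of claim 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation. Each media item is reduced to a hypothetical four-number feature vector, with one value each standing in for the metadata, image, drawing-versus-reality, and character features the claim lists. The claim only requires "a non-supervised clustering algorithm", so k-means with deterministic farthest-point initialization is used here as one such algorithm. The "speech recognition algorithms" are stub functions. The names `make_recognizer` and `select_recognizer`, and the sample data, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Recognizer = Callable[[str], str]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], p: Vector) -> int:
    """Index of the closest centroid, i.e. the media content category of p."""
    return min(range(len(centroids)), key=lambda i: dist2(centroids[i], p))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Non-supervised clustering: each returned centroid defines one category."""
    # Farthest-point initialization keeps this sketch deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(c, p) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        # Assignment step: group every item with its nearest centroid.
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Update step: move each centroid to the mean of its group.
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def make_recognizer(name: str) -> Recognizer:
    """Hypothetical stand-in for a real speech recognition engine."""
    def recognize(audio_id: str) -> str:
        return f"[{name}] transcript of {audio_id}"
    return recognize


def select_recognizer(
    centroids: List[List[float]], recognizers: Dict[int, Recognizer], features: Vector
) -> Recognizer:
    """Determine the category of the current content, then pick its recognizer."""
    return recognizers[nearest(centroids, features)]


# Hypothetical feature vectors: three cartoon-like items, three news-like items.
library: List[Vector] = [
    [0.9, 0.8, 0.9, 0.7], [0.85, 0.9, 0.95, 0.8], [0.95, 0.85, 0.9, 0.75],
    [0.1, 0.2, 0.05, 0.1], [0.15, 0.1, 0.1, 0.2], [0.05, 0.15, 0.0, 0.15],
]
centroids = kmeans(library, k=2)

# Associate one categorized recognizer with each cluster via a known example item.
recognizers: Dict[int, Recognizer] = {
    nearest(centroids, library[0]): make_recognizer("cartoon-asr"),
    nearest(centroids, library[3]): make_recognizer("news-asr"),
}

asr = select_recognizer(centroids, recognizers, [0.88, 0.82, 0.9, 0.72])
print(asr("episode-42"))  # [cartoon-asr] transcript of episode-42
```

Cluster indices produced by k-means carry no meaning on their own, which is why the sketch attaches recognizers to clusters through known example items rather than hard-coding index 0 or 1.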


1 Assignment

0 Petitions

Abstract

Aspects relate to computer implemented methods, systems, and processes to automatically generate audio-based display indicia of media content including receiving, by a processor, a plurality of media content categories including at least one feature, receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories, determining a media content category of a current media content based on at least one feature of the current media content, selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content, and applying the selected speech recognition algorithm to the current media content.

42 Citations

20 Claims

1. A computer implemented method to automatically generate audio-based display indicia of media content, the method comprising:
- defining, by a processor, a plurality of media content categories for media content by at least applying a non-supervised clustering algorithm based at least in part on a set of features extracted from the media content, the set of features comprising metadata features that are extracted from one or more video descriptions of the media content, image features extracted from one or more images of the media content, drawing and reality features extracted from one or more images of the media content, and recognized characteristics of characters of the media content;
  
  receiving, by the processor, a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories;
  
  determining a media content category of a current media content from the plurality of media content categories based at least in part on a set of current features extracted from one or more video descriptions of the current media content, image features extracted from the current media content, drawings and reality features extracted from the current media content, and recognized characteristics of characters of the current media content;
  
  selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content; and
  
  applying the selected speech recognition algorithm to the current media content.
- 2. The computer implemented method of claim 1, wherein the audio-based display indicia is at least one of subtitles or captions related to the media content.
- 3. The computer implemented method of claim 1, further comprising, playing back the current media content, wherein the determination, selection, and application of the speech recognition algorithm is done during playback.
- 4. The computer implemented method of claim 3, further comprising continuously performing the determination, selection, and application of the speech recognition algorithm during playback.
- 5. The computer implemented method of claim 1, further comprising saving the selected speech recognition algorithm and associating the saved selected speech recognition algorithm with the current media content.
- 6. The computer implemented method of claim 1, further comprising receiving prior audio-based display indicia associated with the current media content.
- 7. The computer implemented method of claim 6, further comprising comparing an output of application of the speech recognition algorithm to the current media content with the prior audio-based display indicia associated with the current media content.
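Claim 7 compares the recognizer's output with prior audio-based display indicia (for example, subtitles that already exist for the content) but does not say how the comparison is made. One common measure, assumed here rather than taken from the patent, is word error rate (WER): the word-level edit distance between the prior subtitles and the new transcript, divided by the number of words in the prior subtitles. The function names and the 0.3 acceptance threshold are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        # Convention for an empty reference: perfect only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0
    # prev[j]: edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # reference word deleted
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1] / len(ref)


def agrees_with_prior(prior: str, output: str, max_wer: float = 0.3) -> bool:
    """Hypothetical acceptance rule: the new transcript stays close to the prior subtitles."""
    return word_error_rate(prior, output) <= max_wer


print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
print(agrees_with_prior("hello world", "hello there world"))  # False (WER is 0.5)
```

A low WER against trusted prior subtitles suggests the selected recognizer suits the content; a high WER could, for instance, prompt trying the recognizer of the next-closest category.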

8. A system to automatically generate audio-based display indicia of media content comprising:
- a memory having computer readable instructions; and
  
  a processor configured to execute the computer readable instructions, the computer readable instructions comprising;
  
  defining, by the processor, a plurality of media content categories for media content by at least applying a non-supervised clustering algorithm based at least in part on a set of features extracted from the media content, the set of features comprising metadata features that are extracted from one or more video descriptions of the media content, image features extracted from one or more images of the media content, drawing and reality features extracted from one or more images of the media content, and recognized characteristics of characters of the media content;
  
  receiving, by the processor, a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories;
  
  determining, by the processor, a media content category of a current media content from the plurality of media content categories based at least in part on a set of current features extracted from the current media content, the set of current features comprising metadata features extracted from one or more video descriptions of the current media content, image features extracted from the current media content, drawing and reality features extracted from the current media content, and recognized characteristics of characters of the current media content;
  
  selecting, by the processor, one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content; and
  
  applying, by the processor the selected speech recognition algorithm to the current media content.
- 9. The system of claim 8, wherein the audio-based display indicia is at least one of subtitles or captions related to the media content.
- 10. The system of claim 8, further comprising, playing back the current media content, wherein the determination, selection, and application of the speech recognition algorithm is done during playback.
- 11. The system of claim 10, further comprising continuously performing the determination, selection, and application of the speech recognition algorithm during playback.
- 12. The system of claim 8, further comprising saving the selected speech recognition algorithm and associating the saved selected speech recognition algorithm with the current media content.
- 13. The system of claim 8, further comprising receiving prior audio-based display indicia associated with the current media content.
- 14. The system of claim 13, further comprising comparing an output of application of the speech recognition algorithm to the current media content with the prior audio-based display indicia associated with the current media content.
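Claims 10 and 11 (like claims 3 and 4) move the determination, selection, and application steps into playback and repeat them continuously. A minimal sketch, assuming the content arrives as segments that each carry a hypothetical feature vector and that each category is represented by a precomputed centroid: every segment is re-classified, and the recognizer is swapped only when the category changes. `Segment`, `caption_stream`, and the sample data are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional, Sequence, Tuple

Recognizer = Callable[[str], str]


@dataclass
class Segment:
    start_s: float             # playback position of the segment, in seconds
    features: Sequence[float]  # hypothetical features extracted from this segment
    audio_id: str              # stand-in for the segment's audio samples


def nearest(centroids: Dict[str, Sequence[float]], x: Sequence[float]) -> str:
    """Category whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def caption_stream(
    segments: Iterable[Segment],
    centroids: Dict[str, Sequence[float]],
    recognizers: Dict[str, Recognizer],
) -> Iterator[Tuple[float, str, str]]:
    """Repeat determine -> select -> apply for every segment during playback."""
    current: Optional[str] = None
    asr: Optional[Recognizer] = None
    for seg in segments:
        category = nearest(centroids, seg.features)
        if category != current:
            # Swap recognizers only on a category change, since loading one may be costly.
            current, asr = category, recognizers[category]
        assert asr is not None  # always set on the first segment
        yield seg.start_s, category, asr(seg.audio_id)


centroids = {"cartoon": [1.0, 1.0], "news": [0.0, 0.0]}
recognizers: Dict[str, Recognizer] = {
    "cartoon": lambda a: f"cartoon-asr:{a}",
    "news": lambda a: f"news-asr:{a}",
}
segments = [
    Segment(0.0, [0.9, 0.8], "s0"),
    Segment(5.0, [0.1, 0.2], "s1"),  # e.g. a live-action insert in an animated show
    Segment(10.0, [0.95, 0.9], "s2"),
]
for start, category, text in caption_stream(segments, centroids, recognizers):
    print(f"{start:5.1f}s  {category:<7}  {text}")
```

Because `caption_stream` is a generator, captions are produced segment by segment as playback advances rather than after the whole file has been processed.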

15. A computer program product to automatically generate audio-based display indicia of media content, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to:
- define, by the processor, a plurality of media content categories for media content by at least applying a non-supervised clustering algorithm based at least in part on a set of features extracted from the media content, the set of features comprising metadata features that are extracted from one or more video descriptions of the media content, image features extracted from one or more images of the media content, drawing and reality features extracted from one or more images of the media content, and recognized characteristics of characters of the media content;
  
  receive, by the processor, a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories;
  
  determine, by the processor, a media content category of a current media content from the plurality of media content categories based at least in part on a set of current features extracted from the current media content, the set of current features comprising metadata features extracted from one or more video descriptions of the current media content, image features extracted from the current media content, drawing and reality features extracted from the current media content, and recognized characteristics of characters of the current media content;
  
  select, by the processor, one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content; and
  
  apply, by the processor, the selected speech recognition algorithm to the current media content.
- 16. The computer program product of claim 15, wherein the audio-based display indicia is at least one of subtitles or captions related to the media content.
- 17. The computer program product of claim 15, the program instructions executable by a processor further configured to cause the processor to:
  - continuously perform the determination, selection, and application of the speech recognition algorithm during playback.
- 18. The computer program product of claim 15, the program instructions executable by a processor further configured to cause the processor to:
  - save the selected speech recognition algorithm and associating the saved selected speech recognition algorithm with the current media content.
- 19. The computer program product of claim 15, the program instructions executable by a processor further configured to cause the processor to:
  - receive prior audio-based display indicia associated with the current media content.
- 20. The computer program product of claim 19, the program instructions executable by a processor further configured to cause the processor to:
  - compare an output of application of the speech recognition algorithm to the current media content with the prior audio-based display indicia associated with the current media content.
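Claim 16 (like claims 2 and 9) identifies subtitles or captions as the audio-based display indicia. The patent does not name a subtitle format; the sketch below assumes SubRip (SRT), where each block holds a sequence number, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the caption text, and a blank separator line. The timed cues stand in for the output of the selected recognizer, and `srt_timestamp` and `to_srt` are hypothetical helper names.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a playback position in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical recognizer output with start and end times in seconds.
cues = [(0.0, 2.5, "Hello."), (2.5, 5.0, "How are you?")]
srt_text = to_srt(cues)
print(srt_text)
print(srt_timestamp(3723.5))  # 01:02:03,500
```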

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Barreira Avegliano, Priscilla; Cardonha, Carlos Henrique; Mazon, Stefany; Nogima, Julio
Primary Examiner(s)
ADESANYA, OLUJIMI A

Application Number

US14/967,726
Publication Number

US 20170169827A1
Time in Patent Office

869 Days
Field of Search

704/231, 704/235, 704/246, 704/251
CPC Class Codes

G10L 15/26   Speech to text systems G10L...

G10L 15/32   Multiple recognisers used i...

G10L 2021/065   Aids for the handicapped in...

G10L 21/10   Transforming into visible i...

G10L 21/18   Details of the transformati...

H04N 21/4394   involving operations for an...

H04N 21/44008   involving operations for an...

H04N 21/4884   for displaying subtitles

H04N 21/84   Generation or processing of...

H04N 21/8456   by decomposing the content ...