Speech recognition for internet video search and navigation

US 10,565,988 B2
Filed: 03/08/2016
Issued: 02/18/2020
Est. Priority Date: 10/31/2006
Status: Active Grant

Abstract

Speech representing a desired video site or video subject is detected and digitized at a TV remote, and then sent to a TV. The TV, or in some embodiments an Internet server communicating with the TV, uses speech recognition principles to recognize the speech, enters a database using the recognized speech as the entering argument, and returns a link to an Internet site hosting the desired video. The link can be displayed on the TV so that a user can select it to retrieve the video.
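The flow in the abstract (recognize the speech, use it as the entering argument to a database, return a link) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `VIDEO_INDEX` contents, the example URLs, and the `recognize` stand-in (which simply decodes bytes as text instead of running a real recognizer) are all assumptions made for the sake of a runnable example.

```python
from typing import Optional

# Hypothetical index: normalized phrase -> link to the site hosting the video.
# The entries and URLs are illustrative placeholders, not taken from the patent.
VIDEO_INDEX = {
    "cooking pasta": "https://videos.example.com/cooking-pasta",
    "space launch": "https://videos.example.com/space-launch",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lookups tolerate minor variation."""
    return " ".join(text.lower().split())


def recognize(digitized_speech: bytes) -> str:
    """Stand-in for a speech recognizer: the 'speech' here is already UTF-8 text."""
    return digitized_speech.decode("utf-8")


def find_video_link(digitized_speech: bytes) -> Optional[str]:
    """Recognize the speech, then use it as the entering argument to the index."""
    recognized = normalize(recognize(digitized_speech))
    return VIDEO_INDEX.get(recognized)
```

In this sketch a miss returns `None`, which a caller could treat as "no link to display".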

32 Citations

26 Claims

1. A television comprising:
- an audio video receiver;
- a display;
- circuitry configured to:
  - receive speech signals representing a video site or video subject;
  - implement speech recognition on received speech signals to generate recognized speech data representing a video site or video subject;
  - using the recognized speech data representing the video site or video subject, access at least one database including indices derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video, the indices being associated with the at least one database; and
  - at least one index in the indices being correlated with the recognized speech and identified as at least one matching index element from the at least one database, the matching index element being useful for providing video to the display.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The television of claim 1, wherein the circuitry is configured to access at least one index derived from text input, for at least a data amount.
  - 3. The television of claim 1, wherein the circuitry is configured to access information received for a most recent time period.
  - 4. The television of claim 1, wherein the circuitry is configured to access a most recent “X” amount of information received, wherein “X” is a data amount.
  - 5. The television of claim 1, wherein the circuitry is configured to access information representing items that are initial, manufacturer-defined grammar.
  - 6. The television of claim 1, wherein the television is configured to maintain a limited grammar database so that memory and processing requirements to process the limited grammar database are manageable within the confines of the processor and memory, the limited grammar database including indices derived at least from the closed captioned text received by the television for a past “X” bytes, the limited grammar database not including indices derived from the closed captioned text received by the television in excess of the past “X” bytes, such that a match to the recognized speech is identified responsive to the recognized speech containing content that has occurred in the broadcast in the past “X” bytes.
  - 7. The television of claim 1, wherein the television comprises a remote control device including the circuitry.
  - 8. The television of claim 1, wherein the indices are derived from at least digitized voice soundtracks that accompany video.
  - 9. The television of claim 1, wherein the indices are derived from at least descriptive text that is associated with video.
  - 10. The television of claim 1, wherein the indices are derived from both digitized voice soundtracks that accompany video and descriptive text that is associated with video.
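The limited grammar database of claim 6, which keeps only indices derived from the past “X” bytes of closed captioned text, might be approximated with a byte-bounded sliding window. The sketch below is one possible reading under stated assumptions: the `LimitedGrammar` class, its method names, chunk-level eviction, and the all-words-present matching rule are hypothetical choices, not details given in the claims.

```python
from collections import deque


class LimitedGrammar:
    """Keeps words from only the most recent `max_bytes` of caption text."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._chunks = deque()  # (caption text, size in bytes), oldest first
        self._size = 0

    def add_caption(self, text: str) -> None:
        """Append a caption chunk, then evict the oldest chunks over budget."""
        size = len(text.encode("utf-8"))
        self._chunks.append((text, size))
        self._size += size
        # Eviction is per whole chunk, so the window holds at most max_bytes;
        # a single chunk larger than the budget is dropped entirely.
        while self._size > self.max_bytes and self._chunks:
            _, old_size = self._chunks.popleft()
            self._size -= old_size

    def vocabulary(self) -> set:
        """Words currently inside the window, lowercased."""
        return {w for text, _ in self._chunks for w in text.lower().split()}

    def matches(self, recognized_speech: str) -> bool:
        """True if every recognized word occurred within the window."""
        tokens = recognized_speech.lower().split()
        vocab = self.vocabulary()
        return bool(tokens) and all(t in vocab for t in tokens)
```

Bounding the window by bytes rather than by entry count keeps memory use predictable regardless of how long individual caption chunks are, which is the motivation the claim gives for limiting the grammar.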

11. A television comprising:
- an audio video device (AVD);
- a remote control;
- wherein the remote control comprises circuitry to digitize received speech and send the digitized speech to the AVD;
- wherein the AVD comprises circuitry to:
  - generate wireless commands to an audio video device (AVD);
  - receive digitized speech and generate recognized speech from the digitized speech, the recognized speech being associated with a video;
  - using the recognized speech as entering argument, access a data structure correlating speech associated with video to computer storage locations of stored video, the data structure comprising at least one index derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video; and
  - retrieving, from the data structure, at least an identification of at least one video correlated to a match of the recognized speech.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21):
  - 12. The television of claim 11, wherein remote control device is configured to access at least in part metadata received in video adapted to be presented on the AVD.
  - 13. The television of claim 11, wherein remote control device is configured to access at least in part closed caption text received in video adapted to be presented on the AVD.
  - 14. The television of claim 11, wherein the remote control device is configured to access only information received for a most recent time period.
  - 15. The television of claim 11, wherein the remote control device is configured to access only a most recent “X” amount of information received, wherein “X” is a data amount.
  - 16. The television of claim 11, wherein the remote control device is adapted to remotely control a television receiver.
  - 17. The television of claim 11, wherein the data structure is obtained at least in part using metadata received in video presented on an audio video device (AVD).
  - 18. The television of claim 11, wherein the data structure is obtained at least in part using closed caption text received in video presented on the AVD.
  - 19. The television of claim 11, wherein the index is derived from at least digitized voice soundtracks that accompany video.
  - 20. The television of claim 11, wherein the index is derived from at least descriptive text that is associated with video.
  - 21. The television of claim 11, wherein the index is derived from at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video.
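Claim 11 describes a data structure that correlates speech associated with video to storage locations of stored video, with an index derived from soundtracks and/or descriptive text. One plausible shape for such a structure is an inverted index from words to storage locations. In the sketch below, the function names, the example paths, and the overlap-count ranking are assumptions for illustration; the soundtrack is represented by an already-transcribed string.

```python
from collections import defaultdict


def build_index(videos: dict) -> dict:
    """Map each word to the storage locations of videos whose text contains it.

    `videos` maps a storage location to a list of text sources for that video,
    for example a soundtrack transcript and a descriptive text.
    """
    index = defaultdict(set)
    for location, sources in videos.items():
        for text in sources:
            for word in text.lower().split():
                index[word].add(location)
    return index


def lookup(index: dict, recognized_speech: str) -> list:
    """Return storage locations ranked by how many recognized words they match."""
    scores = defaultdict(int)
    for word in set(recognized_speech.lower().split()):
        for location in index.get(word, ()):
            scores[location] += 1
    # Highest overlap first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda loc: (-scores[loc], loc))
```

A caller would build the index once from the available transcripts and descriptions, then pass each recognized utterance to `lookup` and treat the first result as the best-matching video identification.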

22. A machine-executed method comprising:
- receiving speech signals representing a video site or video subject;
- implementing speech recognition on received speech signals representing a video site or video subject to generate recognized speech;
- using the recognized speech representing the video site or video subject, access at least one database including at least one index derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video; and
- correlating the recognized speech with at least one element of the index identified by the accessing to identify at least one matching index element from the at least one database, the matching index element being useful for providing video to the AVD.
- Dependent claims (23, 24, 25):
  - 23. The method of claim 22, wherein the index is derived from at least digitized voice soundtracks that accompany video.
  - 24. The method of claim 22, wherein the index is derived from at least descriptive text that is associated with video.
  - 25. The method of claim 22, wherein the index is derived from at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video.
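The correlating step of claim 22 does not say how recognized speech is matched against index elements. As one illustration only, approximate string matching can tolerate small recognition errors. The sketch below swaps in Python's standard `difflib.get_close_matches` for whatever correlation the patent contemplates; the index entries, the `correlate` name, and the 0.6 cutoff are assumptions.

```python
import difflib
from typing import Optional

# Hypothetical index elements (phrases derived from transcripts or descriptions).
INDEX_ELEMENTS = ["cooking pasta", "space launch"]


def correlate(recognized_speech: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the index element most similar to the recognized speech, if any.

    Similarity is difflib's ratio in [0, 1]; elements below `cutoff` are ignored.
    """
    query = " ".join(recognized_speech.lower().split())
    matches = difflib.get_close_matches(query, INDEX_ELEMENTS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this approach a slightly misrecognized phrase such as "space lunch" can still resolve to the "space launch" element, while unrelated input yields no match.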

26. A computer-implemented method comprising:
- recognizing digitized speech representing a video and generating recognized speech in response;
- using the recognized speech representing a video as entering argument, access a data structure correlating speech associated with video to computer storage locations of stored video, the data structure comprising at least one index derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video;
- retrieving, from the data structure, at least an identification of at one video correlated to a match of the recognized speech.

Current Assignee: Saturn Licensing LLC (Sony Group Corp.)
Original Assignee: Saturn Licensing LLC (Sony Group Corp.)
Inventors: Dacosta, Behram Mario
Primary Examiner(s): Baker, Matthew H
Application Number: US15/064,035
Publication Number: US 20160189711A1
Time in Patent Office: 1,442 Days
CPC Class Codes

- G06F 16/73   Querying
- G06F 16/7328   Query by example, e.g. a co...
- G06F 16/7834   using audio features
- G10L 15/10   using distance or distortio...
- G10L 15/22   Procedures used during a sp...
- G10L 15/26   Speech to text systems G10L...
- G10L 2015/025   Phonemes, fenemes or fenone...
- G10L 2015/223   Execution procedure of a sp...
- H04N 21/23109   by placing content in organ...
- H04N 21/42203   sound input device, e.g. mi...
- H04N 21/42222   Additional components integ...
- H04N 21/4332   by placing content in organ...
- H04N 21/4381   Recovering the multiplex st...
- H04N 21/4415   using biometric characteris...
- H04N 21/4884   for displaying subtitles
- H04N 21/6125   involving transmission via ...
- H04N 21/64322   IP
