Automatic generation of a database for speech recognition from video captions
Abstract
A system and method for automatic generation of a database for speech recognition, comprising: a source of text signals; a source of audio signals comprising an audio representation of said text signals; a text words separation module configured to separate said text into a string of text words; an audio words separation module configured to separate said audio signal into a string of audio words; and a matching module configured to receive said string of text words and said string of audio words and store each pair of matching text word and audio word in a database.
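The module decomposition described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the whitespace word splitting, the equal-length audio segmentation, and the in-memory dictionary standing in for the database are all simplifying assumptions.

```python
# Illustrative sketch of the abstract's pipeline: a text signal and a
# synchronized audio signal are each split into word strings, then the
# matching module pairs them up and stores the pairs in a database.

def separate_text_words(text):
    """Text words separation module: split caption text into words."""
    return text.split()

def separate_audio_words(audio, n_words):
    """Audio words separation module (stub): cut the audio signal into
    equal-length segments, one per word. A real system would segment on
    silence or energy boundaries instead."""
    seg = max(1, len(audio) // n_words)
    return [audio[i * seg:(i + 1) * seg] for i in range(n_words)]

def match_and_store(text, audio, database):
    """Matching module: pair each text word with its audio word and
    store the matched pair in the database."""
    text_words = separate_text_words(text)
    audio_words = separate_audio_words(audio, len(text_words))
    for word, samples in zip(text_words, audio_words):
        database.setdefault(word, []).append(samples)
    return database

db = {}
match_and_store("hello world", [0.1] * 10, db)
# db now maps each caption word to one buffered audio segment
```

Keying the database by text word, as above, lets later recognition lookups retrieve every stored audio rendition of a given word.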
9 Citations
3 Claims
1. A system for automatic generation of a database for speech recognition, comprising:
a text subsystem;
an audio subsystem configured to operate in synchronization with said text subsystem;
a matching module; and
a database of matching audio signals and text words;
wherein said text subsystem comprises:
a source of video frames comprising text;
a text detection module configured to receive a first video frame, detect the text therein by looking for text patterns and generate a first timestamp if the detected text in said first video frame is different than text detected in a previous video frame, said text detection module further configured to receive a second video frame, detect the text therein by looking for text patterns and generate a second timestamp if the detected text in said second video frame is different than text detected in said first video frame; and
an Optical Character Recognition module configured to produce a string of text words representing said detected text;
wherein said audio subsystem comprises:
a source of audio signals comprising an audio representation of said detected text;
an audio buffering module configured to receive and store said audio signal between said first and second timestamps; and
an audio words separation module configured to separate said stored audio signal into a string of audio words;
said matching module configured to receive said string of text words and said string of audio words and store each pair of matching text word and audio word in said database.
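The text detection behavior claimed above, emitting a timestamp only when the caption text changes between frames, might look like the following sketch. OCR is stubbed out: representing each frame as a (time, detected_text) pair is an assumption for illustration.

```python
# Sketch of the claimed text detection module: a timestamp is generated
# only when the caption detected in the current frame differs from the
# caption detected in the previous frame.

class TextDetectionModule:
    def __init__(self):
        self.previous_text = None  # caption seen in the previous frame

    def process_frame(self, frame_time, detected_text):
        """Return frame_time as a new timestamp if the detected text
        changed since the previous frame, else None."""
        if detected_text != self.previous_text:
            self.previous_text = detected_text
            return frame_time
        return None

detector = TextDetectionModule()
frames = [(0.0, "hello"), (0.04, "hello"), (0.08, "world")]
timestamps = [t for t in (detector.process_frame(ft, txt) for ft, txt in frames)
              if t is not None]
# timestamps == [0.0, 0.08]: the caption changed at the first and third frames
```

Consecutive timestamps produced this way delimit the audio interval that the audio buffering module stores for each caption.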
2. A method of automatic generation of a database for speech recognition, comprising:
a. producing in synchronization a string of text words and a corresponding string of audio words;
b. matching pairs of text word and audio word in said respective strings; and
c. storing said matched pairs in a database;
wherein said producing in synchronization a string of text words and a corresponding string of audio words comprises:
(i) receiving a first video frame comprising text;
(ii) detecting the text in said first video frame by looking for text patterns;
(iii) generating a first timestamp if the text detected in said first video frame is different than text detected in a previous video frame and storing said generated first timestamp in an audio signals buffer;
(iv) producing a string of text words representing said detected text;
(v) receiving a second video frame comprising text;
(vi) detecting the text in said second video frame by looking for text patterns;
(vii) generating a second timestamp if the text detected in said second video frame is different than text detected in said first video frame;
(viii) receiving audio signals comprising an audio representation of said detected text between said first and second timestamps;
(ix) storing said received audio signals and said second timestamp in said buffer; and
(x) separating said audio signal stored in said buffer between said first and second timestamps into a string of audio words.
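The method steps above can be sketched end to end as follows. This is an illustration under simplifying assumptions, not the patented implementation: OCR and audio capture are simulated with plain Python data, and the audio-word separation of step (x) is an even split per word rather than a real silence-based segmentation.

```python
# End-to-end sketch of the claimed method: walk video frames, note when
# the detected caption changes (the timestamps), buffer the audio that
# arrives between consecutive timestamps, then split each buffered
# segment into audio words matched to the caption's text words.

def build_database(frames, database):
    """frames: list of (caption_text, audio_samples_for_frame)."""
    prev_caption = None
    buffer = []                                   # audio signals buffer, steps (iii)/(ix)
    for caption, audio in frames + [(None, [])]:  # sentinel flushes the last caption
        if caption != prev_caption:               # caption changed: steps (ii), (vi)-(vii)
            if prev_caption is not None and buffer:
                words = prev_caption.split()      # step (iv): string of text words
                seg = max(1, len(buffer) // len(words))
                for i, w in enumerate(words):     # step (x) plus matching and storing
                    database.setdefault(w, []).append(buffer[i * seg:(i + 1) * seg])
            buffer = []
            prev_caption = caption
        buffer.extend(audio)                      # step (viii): audio between timestamps
    return database

db = build_database([("hi there", [1, 2]),
                     ("hi there", [3, 4]),
                     ("bye", [5, 6])], {})
# db == {"hi": [[1, 2]], "there": [[3, 4]], "bye": [[5, 6]]}
```

Note that frames repeating the same caption simply extend the buffer, so each caption accumulates all audio shown while it was on screen before being split into words.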
3. A non-transitory computer-readable medium encoding instructions that, when executed by data processing apparatus, cause the data processing apparatus to perform operations comprising:
a. producing in synchronization a string of text words and a corresponding string of audio words;
b. matching pairs of text word and audio word in said respective strings; and
c. storing said matched pairs in a database;
wherein said producing in synchronization a string of text words and a corresponding string of audio words comprises:
(i) receiving a first video frame comprising text;
(ii) detecting the text in said first video frame by looking for text patterns;
(iii) generating a first timestamp if the text detected in said first video frame is different than text detected in a previous video frame and storing said generated first timestamp in an audio signals buffer;
(iv) producing a string of text words representing said detected text;
(v) receiving a second video frame comprising text;
(vi) detecting the text in said second video frame by looking for text patterns;
(vii) generating a second timestamp if the text detected in said second video frame is different than text detected in said first video frame;
(viii) receiving audio signals comprising an audio representation of said detected text between said first and second timestamps;
(ix) storing said received audio signals and said second timestamp in said buffer; and
(x) separating said audio signal stored in said buffer between said first and second timestamps into a string of audio words.
Specification