Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof

US 9,384,196 B2
Filed: 02/11/2015
Issued: 07/05/2016
Est. Priority Date: 10/26/2005
Status: Active Grant

Abstract

A method and system for generating a large-scale database of heterogeneous speech are provided. The method includes transcribing a plurality of multimedia signals retrieved from a large text database and a speech database; randomly selecting a plurality of speech segments from the plurality of multimedia signals, wherein each speech segment of the plurality of speech segments is of a random length; generating a plurality of signatures based on the plurality of speech segments; and populating the large-scale database with the plurality of signatures respective of the plurality of multimedia signals.
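The selection, signature, and population steps in the abstract can be illustrated with a minimal sketch. It makes several assumptions: the transcription step is skipped, each multimedia signal is taken to be an already retrieved list of audio samples, and `toy_signature` is a hypothetical placeholder (one bit per frame, based on frame energy) rather than the signature generator of the patent, which this text does not describe. The names `random_segments`, `toy_signature`, `populate`, and all parameters are illustrative only.

```python
import random
from statistics import median


def random_segments(signal, count, min_len, max_len, rng):
    """Pick `count` segments at random offsets, each of a random length.

    Assumes len(signal) >= min_len.
    """
    segments = []
    for _ in range(count):
        length = rng.randint(min_len, min(max_len, len(signal)))
        start = rng.randint(0, len(signal) - length)
        segments.append(signal[start:start + length])
    return segments


def toy_signature(segment, bits=8):
    """Hypothetical placeholder signature, NOT the patented generator.

    Splits the segment into `bits` frames and sets a bit when the frame's
    mean absolute amplitude exceeds the median over all frames.
    """
    frame = max(1, len(segment) // bits)
    energies = [
        sum(abs(x) for x in segment[i * frame:(i + 1) * frame]) / frame
        for i in range(bits)
    ]
    cutoff = median(energies)
    return tuple(1 if e > cutoff else 0 for e in energies)


def populate(database, signals, rng, count=4, min_len=16, max_len=64):
    """Store the signatures of each signal under that signal's identifier."""
    for signal_id, samples in signals.items():
        segments = random_segments(samples, count, min_len, max_len, rng)
        database[signal_id] = [toy_signature(s) for s in segments]
    return database


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    signals = {"clip-a": [rng.uniform(-1, 1) for _ in range(200)]}
    db = populate({}, signals, rng)
    print(len(db["clip-a"]), "signatures stored for clip-a")
```

Passing the random generator in explicitly keeps the segment selection reproducible, which is convenient when comparing signature schemes on the same segments.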

158 Citations

11 Claims

1. A method for generating a large-scale database of heterogeneous speech, comprising:

transcribing a plurality of multimedia signals retrieved from a large text database and a speech database;

randomly selecting a plurality of speech segments from the plurality of multimedia signals, wherein each speech segment of the plurality of speech segments is of a random length;

generating a plurality of signatures based on the plurality of speech segments; and

populating the large-scale database with the plurality of signatures respective of the plurality of multimedia signals.

2. The method of claim 1, wherein the speech database further comprises speech that is pronounced according to any one of: a plurality of speakers, a plurality of intonations, and a plurality of accents.

3. The method of claim 1, wherein each signature of the plurality of signatures is robust to any of: noise, and distortion.

4. The method of claim 1, further comprising:

determining, for each multimedia signal of the plurality of multimedia signals, if the multimedia signal matches at least one class of multimedia signals based on the plurality of signatures and a set of representative signatures of the class of multimedia signals; and

upon determining that at least one multimedia signal of the plurality of multimedia signals does not match at least one class of multimedia signals, creating a new class of multimedia signals, wherein the new class of multimedia signals comprises the plurality of signatures as new representative signatures of the new class of multimedia signals.

5. The method of claim 1, wherein each multimedia signal of the plurality of multimedia signals is at least any of: an audio stream, and an audio clip.

6. A non-transitory computer readable medium having stored thereon instructions for conducting the method according to claim 1.

7. A system for generating a large-scale database of heterogeneous speech, comprising:

a processor;

a memory, the memory containing instructions that, when executed by the processor, configure the system to:

transcribe a plurality of multimedia signals retrieved from a large text database and a speech database;

randomly select a plurality of speech segments from the plurality of multimedia signals, wherein each speech segment of the plurality of speech segments is of a random length;

generate a plurality of signatures based on the plurality of speech segments; and

populate the large-scale database with the plurality of signatures respective of the plurality of multimedia signals.

8. The system of claim 7, wherein the speech database further comprises speech that is pronounced according to any one of: a plurality of speakers, a plurality of intonations, and a plurality of accents.

9. The system of claim 7, wherein each signature of the plurality of signatures is robust to any of: noise, and distortion.

10. The system of claim 7, wherein the system is further configured to:

determine, for each multimedia signal of the plurality of multimedia signals, if the multimedia signal matches at least one class of multimedia signals based on the plurality of signatures and a set of representative signatures of the class of multimedia signals; and

upon determining that at least one multimedia signal of the plurality of multimedia signals does not match at least one class of multimedia signals, create a new class of multimedia signals, wherein the new class of multimedia signals comprises the plurality of signatures as new representative signatures of the new class of multimedia signals.

11. The system of claim 7, wherein each multimedia signal of the plurality of multimedia signals is at least any of: an audio stream, and an audio clip.
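The class-matching step recited in claims 4 and 10 can be sketched as follows. The claims do not state how a match is decided, so this sketch assumes binary signatures of equal length and treats a signal as matching a class when any of its signatures lies within a small Hamming distance of any representative signature. The functions `hamming`, `matches_class`, and `classify_or_create`, the `max_distance` threshold, and the `class-N` naming are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def matches_class(signatures, representatives, max_distance=1):
    """True if any signature is within `max_distance` bits of any representative.

    The distance threshold is an assumed matching rule, not one from the claims.
    """
    return any(
        hamming(s, r) <= max_distance
        for s in signatures
        for r in representatives
    )


def classify_or_create(classes, signatures, max_distance=1):
    """Return the names of all matching classes.

    If no class matches, create a new class whose representative signatures
    are the signal's own signatures, and return that new class name.
    """
    matched = [
        name
        for name, representatives in classes.items()
        if matches_class(signatures, representatives, max_distance)
    ]
    if not matched:
        new_name = f"class-{len(classes)}"
        classes[new_name] = list(signatures)
        matched = [new_name]
    return matched


if __name__ == "__main__":
    classes = {}
    print(classify_or_create(classes, [(1, 0, 1, 0), (1, 1, 0, 0)]))  # no classes yet, so one is created
    print(classify_or_create(classes, [(1, 0, 1, 1)]))  # one bit away from an existing representative
    print(classify_or_create(classes, [(0, 1, 0, 1)]))  # too far from every representative
```

In this sketch an existing class is never updated on a match; whether representatives should be refreshed over time is a design choice the claims leave open.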

Current Assignee: Cortica Ltd.
Original Assignee: Cortica Ltd.
Inventors: Raichelgauz, Igal; Odinaev, Karina; Zeevi, Yehoshua Y.
Primary Examiner(s): Holmes, Michael B
Application Number: US14/619,767
Publication Number: US 20150154189A1
Time in Patent Office: 510 Days
Field of Search: 706/11
US Class Current: 1/1

CPC Class Codes

G06F 16/22   Indexing; Data structures t...
G06F 16/284   Relational databases
G06F 16/285   Clustering or classification
G06F 16/40   of multimedia data, e.g. sl...
G06F 16/41   Indexing; Data structures t...
G06F 16/45   Clustering; Classification
G06F 16/61   Indexing; Data structures t...
G06F 16/68   Retrieval characterised by ...
G06F 16/683   using metadata automaticall...
G06F 16/685   using automatically derived...
G06F 16/7834   using audio features
G06F 16/7844   using original textual cont...
G06F 16/7847   using low-level visual feat...
G06V 20/46   Extracting features or char...
G10L 13/06   Elementary speech units use...
G10L 15/063   Training
G10L 15/26   Speech to text systems G10L...
G10L 15/32   Multiple recognisers used i...
G10L 25/54   for retrieval
G10L 25/57   for processing of video sig...
