METHOD AND SYSTEM FOR PROCESSING MULTIMEDIA CONTENT TO DYNAMICALLY GENERATE TEXT TRANSCRIPT

US 20180108354A1
Filed: 10/18/2016
Published: 04/19/2018
Est. Priority Date: 10/18/2016
Status: Active Grant

Abstract

The disclosed embodiments illustrate a method and system for processing multimedia content to generate a text transcript. The method includes segmenting each of a set of text frames to determine spatial regions. The method further includes extracting one or more keywords from each of the determined spatial regions. The method further includes determining a first set of keywords from the extracted one or more keywords by filtering one or more off-topic keywords from them. The method further includes extracting a second set of keywords based on the determined first set of keywords. The method further includes generating a graph between each of the first set of keywords and one or more of the second set of keywords. The method further includes dynamically generating the text transcript of audio content in the multimedia content based on the generated graph.
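
The off-topic filtering step described in the abstract could be sketched as follows. This is a minimal illustration only: the function name `filter_off_topic`, the `max_frame_fraction` threshold, and the reading of "repeated keywords" as "keywords present in most frames" (for example a channel logo) are assumptions, not details taken from the patent. The clustering of distributional representations recited in claim 26 is omitted.

```python
from collections import Counter


def filter_off_topic(frames, max_frame_fraction=0.8):
    """Return on-topic keywords from per-frame keyword lists.

    frames: list of keyword lists, one list per text frame.
    max_frame_fraction: hypothetical cutoff; a keyword seen in more than
    this fraction of frames is treated as a repeated, off-topic keyword.
    """
    n_frames = len(frames)
    # Count the number of frames in which each keyword appears.
    frame_counts = Counter(kw for frame in frames for kw in set(frame))

    kept, seen = [], set()
    for frame in frames:
        for kw in frame:
            if kw in seen:
                continue
            seen.add(kw)
            # Drop numerals and keywords with special characters.
            # Note: this simple check also drops hyphenated words.
            if not kw.isalpha():
                continue
            # Drop keywords repeated across (nearly) all frames.
            if n_frames > 1 and frame_counts[kw] / n_frames > max_frame_fraction:
                continue
            kept.append(kw)
    return kept


frames = [
    ["ACME", "gradient", "descent"],
    ["ACME", "learning", "rate", "0.01"],
    ["ACME", "loss", "f(x)"],
]
print(filter_off_topic(frames))
```

In this example "ACME" is dropped because it appears in every frame, while "0.01" and "f(x)" are dropped because they contain non-alphabetic characters.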

35 Claims

1. A method for processing multimedia content, by a computing device, to dynamically generate a text transcript, said method comprising:
- independently segmenting, by a region segmenting processor in an automatic speech recognition (ASR) unit in said computing device, each of a set of text frames that correspond to audio content and visual content of said multimedia content, to determine one or more spatial regions comprising at least one or more portions of text content, wherein one or more keywords are extracted, by a data processor in said ASR unit, from each of said determined one or more spatial regions;
  
  generating, by a graph generating processor in said ASR unit, a graph based on at least a semantic relationship between each of a first set of keywords and one or more of a second set of keywords;
  
  wherein said first set of keywords is determined, by a natural language processor in said ASR unit, from one or more keywords based on filtering of at least one or more off-topic keywords from said one or more keywords;
  
  wherein said second set of keywords is extracted, by said data processor, from one or more knowledge databases based on at least said determined first set of keywords; and
  
  generating, by a speech-to-text generating processor in said ASR unit, dynamically said text transcript of audio content in said multimedia content based on at least said generated graph.
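
The graph generation step of claim 1 could be sketched as below. The sketch assumes the knowledge database is a plain mapping from a keyword to its related keywords; the names `KeywordGraph`, `build_graph`, `knowledge_base`, and `default_weight` are hypothetical and chosen for illustration. How the patent actually measures a semantic relationship is not specified here, so the lookup stands in for it.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordGraph:
    weights: dict = field(default_factory=dict)  # node -> weight
    kinds: dict = field(default_factory=dict)    # node -> "first" or "second"
    edges: set = field(default_factory=set)      # frozenset({first, second})


def build_graph(first_set, knowledge_base, default_weight=1.0):
    """Link each first-set keyword to related knowledge-base keywords."""
    graph = KeywordGraph()
    first_nodes = set(first_set)
    for kw in first_set:
        graph.kinds[kw] = "first"
        graph.weights[kw] = default_weight

    for kw in first_set:
        for related in knowledge_base.get(kw, ()):
            if related in first_nodes:
                continue  # already a first node; keep edges first-to-second
            graph.kinds.setdefault(related, "second")
            graph.weights.setdefault(related, default_weight)
            graph.edges.add(frozenset((kw, related)))
    return graph


# Hypothetical knowledge base: keyword -> semantically related keywords.
knowledge_base = {
    "gradient": ["derivative", "descent"],
    "descent": ["optimization"],
}
graph = build_graph(["gradient", "descent"], knowledge_base)
print(sorted(tuple(sorted(edge)) for edge in graph.edges))
```

Storing edges as unordered pairs keeps the relationship symmetric, which matches an objective that only depends on the difference between the two endpoint weights.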

2-23. (canceled)

24. A method for dynamically generating a text transcript, comprising:
- segmenting each identified set of text frames to determine one or more spatial regions, wherein the one or more spatial regions comprise at least one or more portions of text content;
  
  extracting one or more keywords from the one or more spatial regions, wherein the one or more keywords are extracted from one or more available portions of the text content in the one or more spatial regions;
  
  determining a first set of keywords from the one or more extracted keywords by filtering one or more off-topic keywords from the one or more extracted keywords;
  
  extracting a second set of keywords similar to or related with the one or more second set of keywords, wherein the second set of keywords being retrieved from one or more knowledge databases;
  
  generating a graph from a semantic relationship between the first set of keywords and the second set of keywords, wherein the graph comprises one or more first nodes and one or more second nodes with each node in the one or more first nodes corresponding with a keyword in the first set of keywords and each node in the one or more second nodes corresponding with a keyword in the second set of keywords; and
  
  generating the text transcript of the audio content in the multimedia content using the generated graph, wherein the generating of the text transcript comprises utilizing at least one of an updated language unit and an updated dictionary unit to generate the text transcript of the audio content in the multimedia content.
  - 25. The method of claim 24, further comprising:
    - determining one or more frames of the multimedia content; and
      
      identifying the set of text frames from the one or more frames, wherein the set of text frames are identified from one or more available portions of the text content within the one or more frames.
  - 26. The method of claim 24, further comprising:
    - identifying the one or more off-topic keywords, wherein the identifying the one or more off-topic keywords comprises clustering a distributional representation of the one or more keywords in the identified set of text frames, and the one or more off-topic keywords comprising one or more repeated keywords in the set of text frames, numerals or keywords with one or more special characters, or both.
  - 27. The method of claim 24, wherein the graph further comprises one or more edges, wherein the one or more edges corresponding to a semantic relationship between a first node in the one or more first nodes and a second node in the one or more second nodes.
  - 28. The method of claim 27, wherein each of the one or more first nodes that correspond to a keyword in the generated graph is associated with a predefined weight corresponding to a probability of occurrence of the keyword, and each of the one or more second nodes that correspond to a keyword in the generated graph is associated with a predefined weight corresponding to a probability of occurrence of the keyword.
  - 29. The method of claim 28, further comprising:
    - updating the predefined weight of each of the one or more first nodes and the predefined weight of each of the one or more second nodes in the generated graph.
  - 30. The method of claim 29, wherein the updating of the predefined weight comprises determining a weight for each node of the one or more first nodes and the one or more second nodes using
    $$\min \sum_{(i,j) \in E} |a_i - a_j|$$
    where $E$ corresponds to an extended keyword graph, $a_i$ corresponds to a weight by which a unigram probability estimate for keyword $i$ is incremented, and $a_j$ is defined for keywords in set $M = L \cup D$, $L$ and $D$ correspond to nodes and edges.
  - 31. The method of claim 30, wherein the determining of the weight is constrained by one or more of the following constraints
  - 32. The method of claim 29, further comprising:
    - updating at least one of a language unit, a dictionary unit, or both, based on the updated predefined weight of each of the one or more first nodes and each of the one or more second nodes in the generated graph.
  - 33. The method of claim 32, wherein the updating of the language unit corresponds to an update of a probability of occurrence of a keyword being spoken within the video content.
  - 34. The method of claim 33, wherein the updating of the dictionary unit corresponds to an update of the one or more keywords in the dictionary unit.
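
The objective in claim 30, the sum of |a_i − a_j| over the edges of the extended keyword graph, can be evaluated and reduced with a simple sketch. The constraints referred to in claim 31 are not reproduced in this text, so the sketch substitutes an assumption of its own: weights of the first (on-screen) nodes are held fixed and only the remaining nodes are adjusted. The median update is a swapped-in heuristic (coordinate-wise minimization), not the patent's solver, and the names `objective`, `smooth_weights`, `fixed`, and `sweeps` are hypothetical.

```python
from statistics import median


def objective(edges, a):
    """Sum of |a_i - a_j| over all edges (i, j)."""
    return sum(abs(a[i] - a[j]) for i, j in edges)


def smooth_weights(edges, a, fixed, sweeps=10):
    """Lower the objective by moving each free node to its neighbours' median.

    For a single free node, the median of its neighbours' weights minimizes
    the sum of absolute differences to those neighbours, so each update
    cannot increase the objective. Nodes in `fixed` are never changed
    (an assumed constraint, see the lead-in).
    """
    a = dict(a)
    neighbours = {}
    for i, j in edges:
        neighbours.setdefault(i, []).append(j)
        neighbours.setdefault(j, []).append(i)

    for _ in range(sweeps):
        for node, nbrs in neighbours.items():
            if node not in fixed:
                a[node] = median(a[n] for n in nbrs)
    return a


edges = [
    ("gradient", "derivative"),
    ("gradient", "descent"),
    ("descent", "optimization"),
]
initial = {"gradient": 0.3, "descent": 0.3, "derivative": 0.0, "optimization": 0.0}
updated = smooth_weights(edges, initial, fixed={"gradient", "descent"})
print(objective(edges, initial), objective(edges, updated))
```

Under these assumptions the increments of the on-screen keywords propagate to their related keywords, which is one way to read the intent of keeping connected nodes close in weight.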

35. A method for generating a text transcript from audio within multimedia content, comprising:
- receiving a request for generating the text transcript from the multimedia content;
  
  processing the multimedia content to obtain a first set of keywords and a second set of keywords;
  
  generating a graph using the first set of keywords and the second set of keywords, wherein the graph comprises one or more nodes corresponding to one or more keywords of the first set of keywords and one or more nodes corresponding to one or more keywords of the second set of keywords, each of the one or more nodes corresponding to the one or more keywords of the first set of keywords is associated with a predefined weight and each of the one or more nodes corresponding to the one or more keywords of the second set of keywords is associated with a predefined weight;
  
  updating the weight of each of the one or more nodes for the first set of keywords and the weight of each of the one or more nodes for the second set of keywords to replace one or more keywords misinterpreted within a generated text transcript.
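
The language unit and dictionary unit updates of claims 32 to 34, and their effect on misinterpreted keywords in claim 35, could look roughly like the sketch below. It assumes the language unit is a unigram probability table and the dictionary unit is a set of known words; the renormalization step and the names `update_language_unit`, `update_dictionary_unit`, and `pick_candidate` are assumptions made for illustration rather than details stated in the claims.

```python
def update_language_unit(unigram, increments):
    """Add each node weight to its keyword's unigram estimate, then renormalize."""
    boosted = dict(unigram)
    for keyword, weight in increments.items():
        boosted[keyword] = boosted.get(keyword, 0.0) + weight
    total = sum(boosted.values())
    return {keyword: p / total for keyword, p in boosted.items()}


def update_dictionary_unit(dictionary, keywords):
    """Return the dictionary extended with any keywords it was missing."""
    return set(dictionary) | set(keywords)


def pick_candidate(candidates, unigram):
    """Choose the acoustically confusable candidate with the highest probability."""
    return max(candidates, key=lambda word: unigram.get(word, 0.0))


unigram = {"grade": 0.6, "gradient": 0.1, "the": 0.3}
increments = {"gradient": 1.0}  # hypothetical weight from the keyword graph

updated = update_language_unit(unigram, increments)
dictionary = update_dictionary_unit({"grade", "the"}, increments)

# Before the update the recognizer would favour "grade"; afterwards "gradient".
print(pick_candidate(["grade", "gradient"], unigram))
print(pick_candidate(["grade", "gradient"], updated))
```

The usage lines illustrate the replacement idea: a word that was previously outscored by a more common, similar-sounding word becomes the preferred candidate once its probability reflects the on-screen context.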

Current Assignee
Yen4Ken Inc. (VideoKen, Inc.)
Original Assignee
Yen4Ken Inc. (VideoKen, Inc.)
Inventors
Negi, Sumit; Patil, Sonal S.; Biswas, Arijit; Gandhi, Ankit; Deshmukh, Om D.

Granted Patent

US 10,056,083 B2
CPC Class Codes

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/088   Word spotting
