ACCESSING MEDIA DATA USING METADATA REPOSITORY

US 20130166303A1
Filed: 11/13/2009
Published: 06/27/2013
Est. Priority Date: 11/13/2009
Status: Abandoned Application

1 Assignment

0 Petitions

Abstract

A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
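
The flow in the abstract (parse a query into fielded terms, search a repository of triplets, rank the matching scenes) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the application: the `field:term` query syntax, the `Triple` layout, the `any` placeholder for terms with no assigned field, and the match-count scoring metric. The field names come from claims 1 and 2.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    scene: str  # subject: a scene identifier (hypothetical naming)
    field: str  # predicate: the kind of metadata the value came from
    value: str  # object: the term observed in that scene


# Field names taken from claims 1 and 2; the "field:term" syntax is assumed.
KNOWN_FIELDS = {"character", "dialog", "action", "entity"}


def parse_query(query: str) -> list[tuple[str, str]]:
    """Convert a raw query into (field, term) pairs, the assumed 'predefined format'."""
    parsed = []
    for token in query.lower().split():
        field, sep, term = token.partition(":")
        if sep and field in KNOWN_FIELDS and term:
            parsed.append((field, term))
        else:
            parsed.append(("any", token))  # no field assigned to this term
    return parsed


def search(repo: list[Triple], parsed: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Score each scene by how many parsed terms it matches, best first."""
    scores: Counter = Counter()
    for field, term in parsed:
        matched = {t.scene for t in repo
                   if t.value == term and field in ("any", t.field)}
        scores.update(matched)
    # Sort by descending score, then by scene id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


repo = [
    Triple("scene-1", "character", "alice"),
    Triple("scene-1", "action", "run"),
    Triple("scene-2", "dialog", "run"),
    Triple("scene-3", "character", "alice"),
]
ranked = search(repo, parse_query("action:run alice"))
```

In this sketch, `scene-2` is excluded for the term `run` because the word occurs there only as dialog, while the query assigned it to the action field.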

401 Citations

20 Claims

1. A computer-implemented method comprising:
- tagging, in dialog and action text from an input script document regarding video content, at least some grammatical units of each sentence according to part-of-speech to generate tagged verb and noun phrases;
  
  submitting the tagged verb and noun phrases to a named entity recognition (NER) extractor;
  
  identifying and classifying, by the NER extractor, entities and actions in the tagged verb and noun phrases, the NER extractor using one or more external world knowledge ontologies in performing the identification and classification;
  
  generating an entity-relationship data model that represents the entities and actions identified and classified by the NER extractor;
  
  processing the generated entity-relationship data model to generate a metadata repository;
  
  receiving, in a computer system, a user query comprising at least a first term;
  
  parsing the user query to at least determine whether the user query assigns an action field defining the first term, the action field being a description of an action performed by an entity in a video;
  
  converting the user query into a parsed query that conforms to a predefined format;
  
  performing a search in the metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and being generated based on multiple modes of metadata for the video content, the search identifying a set of candidate scenes from the video content;
  
  ranking the set of candidate scenes according to a scoring metric into a ranked scene list; and
  
  generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
- - 2. The method of claim 1, wherein the parsing further comprises determining whether the user query assigns at least any of the following fields to the first term:
    - a character field defining the first term to be a name of a video character;
      
      a dialog field defining the first term to be a word included in video dialog;
      
      or an entity field defining the first term to be an object stated or implied by a video.
  - 3. The method of claim 1, wherein the parsing comprises:
    - tokenizing the user query;
      
      expanding the first term so that the user query includes at least a second term related to the first term; and
      
      disambiguating any of the first and second terms that has multiple meanings.
  - 4. The method of claim 3, wherein expanding the first term comprises:
    - performing an online search using the first term and identifying the second term using the online search;
      
      obtaining the second term from an electronic dictionary of related words;
      
      or obtaining the second term by accessing a hyperlinked knowledge base using the first term.
  - 5. The method of claim 4, wherein performing the online search comprises:
    - entering the first term in an online search engine;
      
      receiving a search result from the online search engine for the first term;
      
      computing statistics of word occurrences in the search results; and
      
      selecting the second term from the search result based on the statistics.
  - 6. The method of claim 4, wherein disambiguating any of the first and second terms comprises:
    - obtaining information from the online search that defines the multiple meanings;
      
      selecting one meaning of the multiple meanings using the information; and
      
      selecting the second term based on the selected meaning.
  - 7. The method of claim 6, wherein selecting the one meaning comprises:
    - generating a context vector that indicates a context for the user query;
      
      entering the context vector in the online search engine and obtaining context results;
      
      expanding terms in the information for each of the multiple meanings, forming expanded meaning sets;
      
      entering each of the expanded meaning sets in the online search engine and obtaining corresponding expanded meaning results; and
      
      identifying one expanded meaning result from the expanded meaning results that has a highest similarity with the context results.
  - 8. The method of claim 1, wherein performing the search in the metadata repository comprises:
    - accessing the metadata repository and identifying a matching set of scenes that match the parsed query; and
      
      filtering out at least some scenes of the matching set, a remainder of the matching set forming the set of candidate scenes.
  - 9. The method of claim 8, wherein the metadata repository includes triples formed by associating selected subjects, predicates and objects with each other, and wherein the method further comprises:
    - optimizing a predicate order in the parsed query before performing the search in the metadata repository.
  - 10. The method of claim 8, further comprising:
    - determining a selectivity of multiple fields with regard to searching the metadata repository; and
      
      performing the search in the metadata repository based on the selectivity.
  - 11. The method of claim 8, wherein the parsed query includes multiple terms assigned to respective fields, and wherein the search in the metadata repository is performed such that the set of candidate scenes match all of the fields in the parsed query.
  - 12. The method of claim 1, the method further comprising, before performing the search:
    - receiving, in the computer system, a script used in production of the video content, the script including at least dialog for the video content and descriptions of actions performed in the video content;
      
      performing, in the computer system, a speech-to-text processing of audio content from the video content, the speech-to-text processing resulting in a transcript; and
      
      creating at least part of the metadata repository using the script and the transcript.
  - 13. The method of claim 12, further comprising:
    - aligning, using the computer system, portions of the script with matching portions of the transcript, forming a script-transcript alignment, the script-transcript alignment being used in creating at least one entry for the metadata repository.
  - 14. The method of claim 1, the method further comprising, before performing the search:
    - performing an object recognition process on the video content, the object recognition process identifying at least one object in the video content; and
      
      creating at least one entry in the metadata repository that associates the object with at least one frame in the video content.
  - 15. The method of claim 1, the method further comprising, before performing the search:
    - performing an audio recognition process on an audio portion of the video content, the audio recognition process identifying at least one sound in the video content as being generated by a sound source; and
      
      creating at least one entry in the metadata repository that associates the sound source with at least one frame in the video content.
  - 16. The method of claim 1, the method further comprising, before performing the search:
    - identifying at least one term as being associated with the video content;
      
      expanding the identified term into an expanded term set; and
      
      creating at least one entry in the metadata repository that associates the expanded term set with at least one frame in the video content.
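
Claims 9 through 11 describe optimizing predicate order, using field selectivity, and requiring candidate scenes to match all fields. One plausible reading is sketched below: evaluate the clause that matches the fewest scenes first, then intersect. The claims do not specify how selectivity is measured, so the "fewest matching scenes" heuristic, the `index` structure, and the function names are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical inverted index: (field, term) -> set of scene ids.
index: dict = defaultdict(set)
for scene, field, term in [
    ("s1", "character", "alice"), ("s2", "character", "alice"),
    ("s3", "character", "alice"), ("s2", "action", "jump"),
    ("s3", "entity", "car"), ("s2", "entity", "car"),
]:
    index[(field, term)].add(scene)


def order_by_selectivity(clauses):
    """Put clauses with the fewest matching scenes first (assumed heuristic)."""
    return sorted(clauses, key=lambda c: len(index.get(c, ())))


def conjunctive_search(clauses):
    """Return the scenes that match every clause, intersecting small sets first."""
    ordered = order_by_selectivity(clauses)
    if not ordered:
        return set()
    result = set(index.get(ordered[0], ()))
    for clause in ordered[1:]:
        if not result:
            break  # already empty; later clauses cannot add scenes back
        result &= index.get(clause, set())
    return result


query = [("character", "alice"), ("entity", "car"), ("action", "jump")]
candidates = conjunctive_search(query)
```

Starting from the most selective clause keeps the working set small, so the remaining intersections touch fewer scenes.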

17. A computer program product tangibly embodied in a computer-readable storage medium and comprising instructions executable by a processor to perform a method comprising:
- tagging, in dialog and action text from an input script document regarding video content, at least some grammatical units of each sentence according to part-of-speech to generate tagged verb and noun phrases;
  
  identifying and classifying, by the named entity recognition (NER) extractor, entities and actions in the tagged verb and noun phrases, the NER extractor using one or more external world knowledge ontologies in performing the identification and classification;
  
  generating an entity-relationship data model that represents the entities and actions identified and classified by the NER extractor;
  
  processing the generated entity-relationship data model to generate a metadata repository;
  
  receiving, in a computer system, a user query comprising at least a first term;
  
  parsing the user query to at least determine whether the user query assigns an action field defining the first term, the action field being a description of an action performed by an entity in a video;
  
  converting the user query into a parsed query that conforms to a predefined format;
  
  performing a search in the metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and being generated based on multiple modes of metadata for the video content, the search identifying a set of candidate scenes from the video content;
  
  ranking the set of candidate scenes according to a scoring metric into a ranked scene list; and
  
  generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
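
The repository-building steps recited in claims 1 and 17 (part-of-speech tagging, entity and action classification against an ontology, an entity-relationship model, then repository entries) can be pictured with a toy sketch. The hand-written `POS_LEXICON` and `ONTOLOGY` dictionaries are stand-ins assumed for illustration; a real system would presumably use a trained tagger and external world knowledge ontologies, and the subject-verb-object linking rule below is a deliberate simplification.

```python
# Toy stand-ins (assumptions): not a real tagger or ontology.
POS_LEXICON = {"alice": "NOUN", "car": "NOUN", "door": "NOUN",
               "drives": "VERB", "opens": "VERB"}
ONTOLOGY = {"alice": "character", "car": "vehicle", "door": "object",
            "drives": "action", "opens": "action"}


def tag(sentence: str) -> list:
    """Tag each known word as NOUN or VERB; unknown words are dropped."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return [(w, POS_LEXICON[w]) for w in words if w in POS_LEXICON]


def extract(tagged: list) -> list:
    """Link tagged words into (subject, action, object) relationships."""
    relations = []
    subject = action = None
    for word, pos in tagged:
        if pos == "VERB":
            action = word
        elif action is None:
            subject = word  # a noun before the verb is treated as the actor
        elif subject is not None:
            relations.append((subject, action, word))  # noun after the verb
            action = None
    return relations


def build_repository(script: dict) -> list:
    """Turn {scene id: action sentences} into classified repository rows."""
    rows = []
    for scene, sentences in script.items():
        for sentence in sentences:
            for subj, act, obj in extract(tag(sentence)):
                rows.append({
                    "scene": scene,
                    "subject": subj, "subject_type": ONTOLOGY[subj],
                    "action": act,
                    "object": obj, "object_type": ONTOLOGY[obj],
                })
    return rows


rows = build_repository(
    {"scene-1": ["Alice drives the car.", "Alice opens the door."]}
)
```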

18. A computer system comprising:
- a metadata repository embodied in a computer readable medium and being generated based on multiple modes of metadata for video content, including:
  
  tagging, in dialog and action text from an input script document regarding video content, at least some grammatical units of each sentence according to part-of-speech to generate tagged verb and noun phrases;
  
  submitting the tagged verb and noun phrases to a named entity recognition (NER) extractor;
  
  identifying and classifying, by the NER extractor, entities and actions in the tagged verb and noun phrases, the NER extractor using one or more external world knowledge ontologies in performing the identification and classification;
  
  generating an entity-relationship data model that represents the entities and actions identified and classified by the NER extractor; and
  
  processing the generated entity-relationship data model to generate a metadata repository;
  
  a multimodal query engine embodied in a computer readable medium and configured for searching the metadata repository based on a user query, the multimodal query engine comprising:
  
  a parser configured to parse the user query to at least determine whether the user query assigns an action field defining the first term, the action field being a description of an action performed by an entity in a video;
  
  converting the user query into a parsed query that conforms to a predefined format;
  
  a scene searcher configured to perform a search in the metadata repository using the parsed query, the search identifying a set of candidate scenes from the video content; and
  
  a scene scorer configured to rank the set of candidate scenes according to a scoring metric into a ranked scene list; and
  
  a user interface embodied in a computer readable medium and configured to receive the user query from a user and generate an output that includes at least part of the ranked scene list in response to the user query.
- - 19. The computer system of claim 18, wherein the parser further comprises:
    - an expander expanding the first term so that the user query includes at least also a second term related to the first term.
  - 20. The computer system of claim 19, wherein the parser further comprises:
    - a disambiguator disambiguating any of the first and second terms that has multiple meanings.
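
The expander and disambiguator of claims 19 and 20 can be sketched as follows. This is not the claimed mechanism: claims 4 through 7 describe obtaining related terms and context results from an online search engine, whereas this sketch substitutes a small local `SENSES` dictionary (hypothetical) and compares bag-of-words vectors with cosine similarity.

```python
import math
from collections import Counter
from typing import Optional

# Hypothetical sense inventory: term -> {sense name: related words}.
SENSES = {
    "bat": {
        "animal": ["wing", "cave", "night", "fly"],
        "sports": ["baseball", "swing", "hit", "pitch"],
    },
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def disambiguate(term: str, context: list) -> Optional[str]:
    """Pick the sense whose related words best match the query context."""
    senses = SENSES.get(term)
    if not senses:
        return None  # term has no recorded meanings
    ctx = Counter(context)
    # On a tie (e.g. empty context) the first listed sense is returned.
    return max(senses, key=lambda s: cosine(ctx, Counter(senses[s])))


def expand(query: list) -> list:
    """Append related words for the selected sense of each ambiguous term."""
    expanded = list(query)
    for term in query:
        context = [w for w in query if w != term]
        sense = disambiguate(term, context)
        if sense is not None:
            expanded += [w for w in SENSES[term][sense] if w not in expanded]
    return expanded


expanded_query = expand(["bat", "cave"])
```

Here the other query terms serve as the context vector, so `cave` steers `bat` toward the animal sense before any second terms are added.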

Current Assignee: Adobe Systems Incorporated (Adobe Inc.)
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Chang, Walter; Welch, Michael J.
Application Number: US12/618,353
Publication Number: US 20130166303A1
US Class Current: 704/258
CPC Class Codes:
G06F 16/7834   using audio features
G10L 15/26   Speech to text systems G10L...
G10L 25/54   for retrieval
