Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types
Abstract
A system, method and software product provide for true multimedia document retrieval by receiving multimedia queries containing various types of data, such as text keywords, images, audio or other data types, and processing such queries against a multimedia index that commonly indexes multimedia documents, including any of their multimedia components. The unified index allows query retrieval by evaluation of a query structure which can contain any of the multimedia data types, and operators which can be evaluated on any of these data types. The system indexes multimedia documents by separating them into their multimedia components, and processing each component into a number of tokens. The tokens are stored in the index along with information identifying the documents that contain the token, and reference data describing the position of the token within the documents, and any other data extracted from the multimedia component of the document, such as color, texture, luminance, recognized speech, or the like. During retrieval, a query is decomposed into multimedia components, which are then converted to a set of tokens and a query structure including mathematical and proximity operators. Query expansion is used to expand the query structure to include additional tokens corresponding to various ones of the input query tokens. Because the multimedia components are all indexed in the unified index, there is no need to process different parts of the query against different indices and databases in order to select documents that best satisfy the query.
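As a rough illustration of the unified index described in the abstract, the following Python sketch stores text and image tokens in one inverted index, with positions and extra reference data kept per document. The class names, the "text:" and "img:" token prefixes, and the bin_count field are assumptions made for this example and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Posting:
    """One document entry in a token's posting list (hypothetical layout)."""
    doc_id: str
    positions: list = field(default_factory=list)  # offsets of the token in the document
    reference: dict = field(default_factory=dict)  # extra data, e.g. a histogram bin count


class UnifiedIndex:
    """A single inverted index shared by every token type."""

    def __init__(self):
        # token -> {doc_id -> Posting}
        self._postings = defaultdict(dict)

    def add(self, token, doc_id, position, **reference):
        posting = self._postings[token].setdefault(doc_id, Posting(doc_id))
        posting.positions.append(position)
        posting.reference.update(reference)

    def docs(self, token):
        """Set of document ids that contain the token."""
        return set(self._postings.get(token, {}))

    def posting(self, token, doc_id):
        return self._postings.get(token, {}).get(doc_id)


index = UnifiedIndex()
index.add("text:sunset", "doc1", 4)
index.add("img:color_bin_17", "doc1", 9, bin_count=120)
index.add("text:sunset", "doc2", 0)

# A text token and an image token are resolved against the same structure.
both = index.docs("text:sunset") & index.docs("img:color_bin_17")
```

Because both token types share one posting structure, a mixed query reduces to ordinary set operations, which is the property the abstract attributes to the unified index.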
68 Claims
-
1. A computer implemented method for retrieving documents, including compound documents containing both text and at least one image, each image having a predetermined position within the document, the method comprising:
-
providing a multimedia index comprising a plurality of text tokens and image tokens, each text token associated with a text string, each image token associated with an image feature, each token indexed to a list of documents including data associated with the token, and for each listed document indexed to an image token, reference data specifying at least a position of an image within the document from which the image token was identified; and
processing a compound query, including at least one text token, at least one image token, and at least one search operator defining a logical or proximity relationship between the text and image tokens, with respect to the index to selectively retrieve compound documents that satisfy the compound query.
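One way to picture the proximity relationship between a text token and an image token in claim 1 is the sketch below. It assumes a plain dictionary index mapping each token to per-document position lists, and a hypothetical within operator; the claim itself does not prescribe any particular data layout or distance measure.

```python
def within(index, token_a, token_b, max_distance):
    """Return ids of documents where some occurrence of token_a lies within
    max_distance positions of some occurrence of token_b."""
    hits = set()
    shared = index.get(token_a, {}).keys() & index.get(token_b, {}).keys()
    for doc_id in shared:
        pos_a = index[token_a][doc_id]
        pos_b = index[token_b][doc_id]
        if any(abs(a - b) <= max_distance for a in pos_a for b in pos_b):
            hits.add(doc_id)
    return hits


# token -> {doc_id -> [positions]}; token names are illustrative only.
index = {
    "text:beach": {"doc1": [3, 40], "doc2": [5]},
    "img:color_bin_17": {"doc1": [42], "doc2": [90]},
}

near = within(index, "text:beach", "img:color_bin_17", 5)
```

Here doc1 qualifies because the word at position 40 sits two positions from the image at position 42, while doc2 contains both tokens but too far apart.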
-
-
2. An article of manufacture comprising:
-
a computer readable medium; and
a multimedia index stored on the computer readable medium and including:
an ordered set of tokens, including a plurality of distinct tokens for each of a plurality of different multimedia data types, including both text and image tokens, each token representing a distinct datum of one of the multimedia data types, and indexed to a set of documents containing at least one instance of the datum, each indexed document having reference data describing a position of the instance of data represented by the token within the document.
Dependent claims: 3
for each text token representing a distinct text string, the reference data associated with the text token contains the position of each occurrence of the text string in a document containing an instance of the text string; and
for each image token representing a distinct image feature, the reference data associated with the image token contains image data extracted from an image in a document, and the position of the image in the document.
-
-
4. An article of manufacture, comprising:
-
a computer readable medium, storing thereon a plurality of modules executable by a processor, and including:
a multimedia component separation module that receives a document, and separates the document into an ordered set of multimedia components, including at least one text component and at least one image component;
a text pre-processing module that receives the at least one text multimedia component and produces at least one text token representing an instance of text data in the received text multimedia components;
an image pre-processing module that receives the image multimedia components and produces at least one image token representing an instance of image data in the received image multimedia components;
a multimedia index that receives from the text pre-processing module and image pre-processing module text and image tokens respectively, and indexes each received token to the document containing the instances of text or image data;
a query separation module that receives a compound query and separates the query into multimedia query components, each multimedia query component having a data type;
a query pre-processing module that produces a set of query tokens from the multimedia query components;
a query structuring module that structures the set of query tokens into an evaluatable query structure; and
a query execution module that processes the query structure with respect to the multimedia index to selectively identify documents that satisfy the compound query.
-
-
5. An article of manufacture for indexing and retrieving compound documents comprising data from a plurality of different multimedia data types, comprising:
-
a computer readable medium, storing thereon a software product executable by a processor to perform the operations of:
indexing a plurality of documents, including compound documents to form a multimedia index to include a plurality of tokens, each token representing a distinct instance of data of one of the multimedia data types in one of the documents, each token indexed to a list of the documents containing an instance of the data, and for each of the documents indexed to a token;
processing a compound query, including at least two tokens of different multimedia data types, with respect to the index to selectively retrieve compound documents that satisfy the compound query by containing data corresponding to the instances of data represented by the tokens included in the compound query.
-
-
6. A computer implemented method for indexing documents, including compound documents, each of the compound documents having at least two different multimedia components, each multimedia component containing data of one of a plurality of multimedia data types, the method comprising:
-
for each document, and for each multimedia component within the document, processing the multimedia component to represent data including non-textual data contained in the multimedia component with at least one token, and for each token, determining reference data descriptive of the data represented by the token and providing a multimedia index comprising a plurality of tokens, each token indexed to a list of documents, each document in the list including data represented by the token.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
processing the multimedia component to identify each instance of a date; and
for each instance of a date, representing the instance of the date with a token which represents a range of dates, and including the date in the reference data for the token representing the range of dates.
-
-
10. The method of claim 6, wherein the multimedia components include text components, and processing a text component comprises:
-
processing the text component to identify each instance of a number; and
for each instance of a number, representing the instance of the number with a token which represents a range of numbers including the value of the number, and including the value of the number in the reference data for the token representing the range of numbers.
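The range-token idea in claim 10 (and the analogous date handling) can be sketched as follows. The fixed bucket size of 100 and the "num:" token format are assumptions chosen for the example; the claim only requires that the token represent a range containing the value and that the value be kept in the reference data.

```python
def number_token(value, bucket_size=100):
    """Map an integer to a token naming the range that contains it,
    keeping the exact value as reference data."""
    low = (value // bucket_size) * bucket_size
    token = f"num:{low}-{low + bucket_size - 1}"
    return token, {"value": value}


token, reference = number_token(1250)
```

A query for an exact number can then look up the range token first and compare the stored value afterwards, so the index holds one token per range rather than one per distinct number.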
-
-
11. The method of claim 6, wherein the multimedia components include image components, and processing an image component comprises:
-
processing the image component to create at least one image histogram having a plurality of bins, each of the plurality of bins having a bin count;
associating at least one of the plurality of bins with a respective token; and
including the bin count of the at least one of the plurality of bins in the reference data for the respective token.
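A minimal sketch of claim 11, assuming a gray-scale image given as a flat list of intensities: each non-empty histogram bin becomes a token, and its bin count is carried as reference data. The bin layout, the token names, and the choice to skip empty bins are assumptions of this example.

```python
def histogram_tokens(pixels, num_bins=4, max_value=256):
    """Build a gray-scale histogram and emit one token per non-empty bin.

    pixels: iterable of integer intensities in [0, max_value).
    """
    bin_width = max_value // num_bins
    counts = [0] * num_bins
    for p in pixels:
        counts[min(p // bin_width, num_bins - 1)] += 1
    return [
        (f"img:gray_bin_{i}", {"bin_count": count})
        for i, count in enumerate(counts)
        if count > 0
    ]


tokens = histogram_tokens([0, 10, 70, 200, 255, 250])
```

With four bins of width 64, the six sample pixels fall into bins 0, 1 and 3, so three tokens are produced.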
-
-
12. The method of claim 11, wherein the at least one image histogram is selected from the group consisting of color histograms, texture histograms, edge histograms, intensity histograms and gray-scale histograms.
-
13. The method of claim 6, wherein the multimedia components include image components, and processing an image component comprises:
-
representing the image component with a single image token;
processing the image multimedia component to create at least one image histogram; and
including the at least one image histogram in the reference data of the single image token.
-
-
14. The method of claim 6, wherein the multimedia components include audio components containing audio recordings, and processing an audio component comprises:
-
speech recognition processing the audio component to identify spoken words; and
representing the identified spoken words with respective text tokens.
-
-
15. The method of claim 14, further comprising:
including in the reference data for the respective text tokens time offsets of the identified spoken words in the audio component.
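Claims 14 and 15 can be illustrated with the sketch below, which assumes that some speech recognizer has already produced (word, start time in seconds) pairs. The recognizer itself, the lowercasing step, and the "text:" prefix are assumptions of this example and are not specified by the claims.

```python
def speech_tokens(recognized):
    """Turn recognized (word, time_offset) pairs into text tokens,
    keeping every time offset as reference data for the token."""
    index = {}
    for word, offset in recognized:
        index.setdefault(f"text:{word.lower()}", []).append(offset)
    return index


# Hypothetical recognizer output for a short recording.
audio_index = speech_tokens([("Hello", 0.4), ("world", 0.9), ("hello", 3.2)])
```

Since spoken words become ordinary text tokens, a text query for "hello" can match the recording, and the stored offsets indicate where in the audio the word was heard.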
-
16. The method of claim 14, further comprising:
-
speech recognition processing the audio component to identify spoken phonemes; and
representing the identified spoken phonemes with respective text tokens.
-
-
17. The method of claim 6, wherein the multimedia components include audio components containing audio data, and processing an audio component comprises:
-
processing the audio component to identify audio content; and
representing the identified audio content with respective tokens descriptive of the audio content.
-
-
18. The method of claim 6, wherein the multimedia components include video components containing video sequences, and processing a video component comprises:
-
extracting a plurality of image frames from the video component; and
processing each of the extracted image frames as an image by associating the image with at least one token representative of non-textual data from the image, and including as reference data for each token a time offset of the image in the video multimedia component.
-
-
19. The method of claim 18, wherein processing a video multimedia component further comprises:
-
extracting audio data from the video multimedia component; and
processing the audio data by associating recognized spoken words in the audio data with text tokens representing the spoken words.
-
-
20. The method of claim 18, wherein processing a video multimedia component further comprises:
extracting text data from the video multimedia component, and processing the text data by associating selected text data with text tokens representing the words.
-
21. The method of claim 6, wherein the multimedia components include video components containing video sequences, and processing a video multimedia component comprises:
-
extracting image frames from the video component, and processing each of the extracted image frames as an image by associating the image with at least one token representative of non-textual data from the image, and including in the reference data for each token a time offset of the image in the video component;
extracting audio data from the video component; and
processing the audio by associating recognized spoken words in the audio with text tokens representing the spoken words;
extracting text data from the video component, and processing the text data by associating selected text data with text tokens representing the words; and
interleaving tokens from the image frames, the audio data and the text data of the video component into a sequence of tokens to represent the video component.
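The interleaving step of claim 21 might look like the following sketch, which assumes each modality yields (time offset, token) pairs already sorted by time and merges them by offset. Ordering by time offset is an assumption of this example; the claim only requires that the tokens be interleaved into one sequence.

```python
import heapq


def interleave(*streams):
    """Merge per-modality token streams, each sorted by time offset,
    into a single sequence ordered by time offset."""
    return list(heapq.merge(*streams, key=lambda item: item[0]))


frames = [(0.0, "img:color_bin_3"), (2.0, "img:color_bin_7")]
speech = [(0.5, "text:hello"), (2.5, "text:world")]
captions = [(1.0, "text:intro")]

sequence = interleave(frames, speech, captions)
```

The merged sequence keeps frame, speech and caption tokens in temporal order, so proximity operators can treat a video component like any other token stream.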
-
-
22. The method of claim 6, further comprising:
for each token in the multimedia index, the reference data includes at least one document offset of the data represented by the token.
-
23. The method of claim 22 wherein for each token representing data of at least one selected multimedia data type, the reference data further comprise a position of an instance of the data within a multimedia component.
-
24. The method of claim 22, wherein for each token representing audio data, the token in the multimedia index is a text string that represents an instance of a recorded speech within an audio component of a compound document and the reference data includes a time offset of the instance of the recorded speech within the audio component.
-
25. The method of claim 22, wherein the reference data for an audio token includes a confidence factor indicative of the likelihood that the audio token representing a word correctly corresponds to a recorded word in an audio component.
-
26. The method of claim 22, wherein the reference data for a token representing an image includes data from at least one histogram descriptive of visual characteristics of the image.
-
27. The method of claim 22, wherein the reference data for a token representing data in a video multimedia component includes data from at least one histogram of an image from the video multimedia component, and a time offset of the image in the video component.
-
28. The method of claim 6, wherein all of the multimedia data types are represented by a single, common type of tokens in the multimedia index.
-
30. A computer implemented method for retrieving documents, including compound documents, each of the compound documents including at least two different multimedia components, each multimedia component having data of one of a plurality of multimedia data types, the method comprising:
-
receiving a compound query including at least one non-textual multimedia query component;
processing the compound query to generate a query structure comprising a set of tokens, including at least one token representing a non-textual multimedia data type; and
evaluating the query structure with respect to a multimedia index to selectively retrieve compound documents that satisfy the compound query, each retrieved compound document including data represented by at least one token generated from the compound query.
Dependent claims: 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44
processing each of the multimedia query components according to its data type to produce at least one token representative of the multimedia query component.
-
-
33. The method of claim 30, further comprising:
for at least one multimedia query component, and for at least one token produced from the multimedia query component, determining reference data comprising information descriptive of the data represented by the token.
-
34. The method of claim 30, further comprising:
for at least one multimedia query component, expanding at least one token produced from the multimedia query component to include in the query structure at least one other token similar to the at least one token.
-
35. The method of claim 34, further comprising:
determining a measure of similarity between the at least one token and the at least one other token for including in reference data for the at least one other token.
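Query expansion with a similarity measure, as in claims 34 and 35, could be sketched as below. The related table, the similarity values, and the convention of giving the original token a similarity of 1.0 are all assumptions of this example.

```python
def expand(tokens, related):
    """Expand each query token with similar tokens.

    related: token -> list of (other_token, similarity in [0, 1]).
    The similarity is stored as reference data on the added token.
    """
    expanded = []
    for token in tokens:
        expanded.append((token, {"similarity": 1.0}))
        for other, similarity in related.get(token, []):
            expanded.append((other, {"similarity": similarity}))
    return expanded


# Hypothetical neighbours of one color bin.
related = {
    "img:color_bin_17": [("img:color_bin_16", 0.8), ("img:color_bin_18", 0.8)],
}
query = expand(["text:sunset", "img:color_bin_17"], related)
```

Keeping the similarity alongside each added token lets a later scoring step weight a match on an expanded token below a match on the original.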
-
36. The method of claim 30, wherein the compound query includes an image query component, and processing the image query component comprises:
-
processing the image query component to produce at least one initial token representing at least one image attribute of the image query component; and
expanding the at least one initial token to include in the query structure at least one other token associated with the image attribute of the image query component.
-
-
37. The method of claim 30, wherein the compound query includes an audio query component, and processing the audio query component comprises:
-
processing the audio query component to produce at least one initial token representing a recorded speech in the audio query component;
expanding the at least one initial token to include in the query structure at least one other token representing phonemes of the recorded speech represented by the at least one initial token; and
determining reference data comprising a similarity measure for the at least one other token.
-
-
38. The method of claim 30, wherein the compound query includes at least one text query component, the method further comprising:
for at least one text query component, expanding at least one text token produced from the text query component to include in the query structure at least one image token related to the text query component.
-
39. The method of claim 30, wherein the compound query includes at least one image query component, the method further comprising:
for at least one image query component, expanding at least one image token produced from the image query component to include in the query structure at least one text token related to the image query component.
-
40. The method of claim 30, further comprising:
for at least one query component, expanding a first token produced from the query component to include in the query structure a second token related to the query component and having a different data type than the first token.
-
41. The method of claim 30, further comprising:
-
for each multimedia query component, combining the at least one token produced from the multimedia query component into a substructure including the at least one token and at least one query operator; and
combining each of the substructures into a query structure.
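The substructure composition in claim 41 can be pictured as a small query tree: one operator node per query component, joined under a top-level operator. The tuple encoding, the AND and OR operators, and the set-valued index are assumptions made for this sketch.

```python
def evaluate(node, index):
    """Evaluate a query tree against an index mapping token -> set of doc ids.

    A node is either a token string or a tuple (operator, child, child, ...).
    """
    if isinstance(node, str):
        return set(index.get(node, ()))
    op, *children = node
    results = [evaluate(child, index) for child in children]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op}")


# One substructure per query component, combined into a single structure.
text_part = ("OR", "text:sunset", "text:dusk")
image_part = ("OR", "img:color_bin_17", "img:color_bin_18")
query = ("AND", text_part, image_part)

index = {
    "text:sunset": {"doc1", "doc2"},
    "text:dusk": {"doc3"},
    "img:color_bin_17": {"doc1"},
    "img:color_bin_18": {"doc3"},
}
matches = evaluate(query, index)
```

The evaluator never inspects the token type, so the same operators apply to text and image substructures alike.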
-
-
42. The method of claim 30, wherein the multimedia index includes position information, and evaluating the query structure comprises:
-
evaluating the query structure with respect to the multimedia index a first time to selectively retrieve candidate documents that satisfy the compound query by determining a document score for each of a number of documents based only on a presence or absence of tokens from the query structure in the document, and selecting a number of best scoring documents as the candidate documents; and
evaluating the query structure with respect to the multimedia index a second time by computing a final document score of each candidate document as a function of the position of each token from the query structure in the candidate document, using the position information from the multimedia index.
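The two-pass evaluation of claim 42 is sketched below under simple assumptions: the first pass counts how many query tokens each document contains and keeps the top_k documents, and the second pass rescores those candidates from token positions. The specific position function used here (a sum of 1 / (1 + smallest gap) over token pairs) is an illustrative choice, not the scoring described in the patent.

```python
from itertools import combinations


def min_gap(pos_a, pos_b):
    """Smallest distance between any occurrence in pos_a and any in pos_b."""
    return min(abs(a - b) for a in pos_a for b in pos_b)


def two_pass(index, query_tokens, top_k=2):
    # Pass 1: score on presence or absence of tokens only.
    presence = {}
    for token in query_tokens:
        for doc_id in index.get(token, {}):
            presence[doc_id] = presence.get(doc_id, 0) + 1
    candidates = sorted(presence, key=lambda d: (-presence[d], d))[:top_k]

    # Pass 2: rescore the candidates using position information.
    final = {}
    for doc_id in candidates:
        present = [t for t in query_tokens if doc_id in index.get(t, {})]
        score = 0.0
        for a, b in combinations(present, 2):
            score += 1.0 / (1 + min_gap(index[a][doc_id], index[b][doc_id]))
        final[doc_id] = score
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))


# token -> {doc_id -> [positions]}
index = {
    "text:beach": {"doc1": [3], "doc2": [10], "doc3": [1]},
    "img:color_bin_17": {"doc1": [50], "doc2": [11]},
}
ranked = two_pass(index, ["text:beach", "img:color_bin_17"])
```

The inexpensive first pass limits the costlier positional scoring to a few candidates; doc3 is dropped in pass one, and doc2 outranks doc1 in pass two because its tokens are adjacent.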
-
-
43. The method of claim 42, wherein the reference data in the query structure and the reference data in the multimedia index comprise image histograms.
-
44. The method of claim 30, wherein:
-
the multimedia index comprises tokens indexed to a list of documents and reference data associated with each listed document; and
wherein processing the compound query comprises:
determining reference data for at least one of the generated tokens of the query structure; and
including the reference data in the query structure; and
wherein evaluating the query structure comprises:
evaluating the query structure with respect to the multimedia index a first time to selectively retrieve candidate documents that satisfy the compound query by determining a document score for each of a number of documents based only on a presence or absence of tokens from the query structure in the document, and selecting a number of best scoring documents as the candidate documents; and
evaluating the query structure with respect to the multimedia index a second time by computing a final document score of each candidate document based on a comparison of the reference data in the query structure with the reference data in the multimedia index.
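For the second pass of claim 44, one plausible comparison of reference data is histogram intersection between the query image's histogram and each candidate's stored histogram. Histogram intersection is substituted here as a well-known similarity measure; the claim only calls for "a comparison of the reference data" and does not name a measure. The function and variable names are hypothetical.

```python
def histogram_intersection(query_hist, doc_hist):
    """Fraction of the query histogram's mass that the document histogram
    also covers; 1.0 means the query histogram is fully covered."""
    total = sum(query_hist)
    if total == 0:
        return 0.0
    return sum(min(q, d) for q, d in zip(query_hist, doc_hist)) / total


def rescore(candidates, query_hist, doc_hists):
    """Order candidate documents by similarity of stored reference data."""
    scored = [(doc, histogram_intersection(query_hist, doc_hists[doc]))
              for doc in candidates]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


query_hist = [5, 5, 0]
doc_hists = {"docA": [5, 4, 1], "docB": [0, 0, 10]}
ranked = rescore(["docA", "docB"], query_hist, doc_hists)
```

Only the candidates that survive the presence-based first pass need this comparison, which keeps the histogram arithmetic off the bulk of the collection.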
-
-
45. A computer implemented method for indexing documents, including compound documents containing both text and at least one image, each image having a position within the document, the method comprising:
-
for each document including at least one image, processing selected images to represent data in the images with at least one image token, and for each image token, determining reference data comprising the position of the image within the document;
for each document including text, processing the text to represent the text with text tokens, and for each text token, determining reference data comprising the position of the text token in the document;
providing a multimedia index comprising a plurality of text tokens and image tokens, each one of the plurality of text tokens and image tokens indexed to a list of documents, each document in the list including data represented by the one of the plurality of text tokens and image tokens; and
for each one of the plurality of text tokens and image tokens, storing in the index the reference data comprising at least one position of the data represented by the one of the plurality of text tokens and image tokens in at least one document.
-
-
46. A computer implemented method for retrieving documents, including compound documents containing both text and at least one image, each image having a predetermined position within the document, the method comprising:
-
receiving a compound query including text and at least one image;
processing the compound query to generate a query structure including at least one token representing the text, at least one token representing the image, and at least one search operator defining a logical or proximity relationship between the text and the image; and
evaluating the query structure with respect to a multimedia index to selectively retrieve compound documents that satisfy the compound query, the multimedia index comprising a plurality of tokens representing texts and images, each of the tokens representing text associated with a text string, each of the tokens representing an image associated with an image feature and with reference data specifying at least a position of the image represented by the token within at least one compound document including the image, each token indexed to a list of documents including the data represented by the token.
-
-
47. A computer implemented method for retrieving documents, including compound documents, each compound document including data of at least two different multimedia data types, the method comprising:
-
receiving a query comprising at least one query component of a multimedia data type;
processing each query component to produce at least one initial token representing data in the query component; and
for at least one query component, and for at least one initial token produced from the query component, expanding the query to include in the query structure at least one other token of a multimedia data type different from that of the initial token, the at least one other token being related to the data in the query component.
-
-
48. An article of manufacture, comprising:
-
a computer readable medium, storing thereon a plurality of modules executable by a processor, and including:
a multimedia component separation module that receives a document, and separates the document into an ordered set of multimedia components;
a first pre-processing module that receives the at least one multimedia component having a first multimedia data type for producing at least one token representing multimedia data in the received multimedia components of the first multimedia data type;
a second pre-processing module that receives the multimedia components having a second multimedia data type different from the first multimedia data type for producing at least one token representing multimedia data in the received multimedia components of the second multimedia data type; and
a multimedia indexer that receives respectively from the first pre-processing module and second pre-processing module tokens, and indexes in a multimedia index each received token to at least one document including data represented by the token.
Dependent claims: 49, 50, 51, 52
-
-
53. An article of manufacture for retrieving documents that satisfy a compound query, with respect to a multimedia index, the article comprising:
-
a computer readable medium, storing thereon a plurality of modules executable by a processor, and including:
a query separation module that receives the compound query and separates the query into multimedia query components, each multimedia query component having a data type;
a query pre-processing module that produces a set of query tokens from the multimedia query components;
a query structuring module that structures the set of query tokens into a query structure; and
a query execution module that processes the query structure with respect to a multimedia index to selectively identify documents that satisfy the compound query.
-
-
54. An article of manufacture for indexing documents in a multimedia index, comprising:
-
a computer readable medium, storing thereon a multimedia index comprising a plurality of tokens, each token representing data of one of a plurality of multimedia data types, each token indexed to a list of documents including the data, and storing thereon a software product executable by a processor to perform the operation of:
indexing a plurality of documents, including compound documents, to update the multimedia index by adding tokens, including nontextual tokens, to the multimedia index, each added token representing data of one of the plurality of multimedia data types in at least one of the plurality of documents, each token indexed to a list of documents including the data.
Dependent claims: 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66
separating the document into an ordered plurality of multimedia components, each component having a data type;
processing selected multimedia components to produce at least one token representing data included in the selected multimedia component;
determining reference data for each produced token, the reference data comprising a position of the data represented by the token in the document; and
for each produced token, updating the multimedia index to associate the document with the token, and storing the reference data for the token in association with the token and the document.
-
-
55. The article of manufacture of claim 54, wherein the multimedia index includes tokens representing a plurality of multimedia data types including text data type, image data type, audio data type and video data type.
-
67. An article of manufacture for retrieving documents that satisfy a compound query, with respect to a multimedia index, the article comprising:
-
a computer readable medium, storing thereon the multimedia index comprising a plurality of tokens, each token representing data of one of a plurality of multimedia data types, each token indexed to a list of documents including the data represented by the token, and storing thereon a software product executable by a processor to perform the operation of:
processing the compound query, including at least two query components of different multimedia data types, with respect to the multimedia index to selectively retrieve compound documents that satisfy the compound query.
Dependent claims: 68
separating the query into query components, each component having a multimedia data type;
processing selected query components according to its multimedia data type to produce at least one token representative of data in the multimedia component;
combining the produced tokens with query operators to form a query structure; and
processing the query structure with respect to the multimedia index to selectively retrieve documents satisfying the compound query, the selected documents including data represented by the tokens of the compound query.
-
Specification