Information retrieval engine
10 Assignments
0 Petitions
Abstract
A system, method, and computer program product retrieve information associated with the signals. The information retrieval can be performed on a signal by quantizing the signal, forming words, and indexing based on weights of the words. The words are formed by grouping letters together to form a number of words within predetermined threshold values. The weights of the words are determined using a binomial log likelihood ratio analysis. The present invention may be applied to identification of an unknown song.
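The abstract names a binomial log likelihood ratio analysis without giving a formula. As a minimal sketch, assuming the standard two-sample binomial log likelihood ratio statistic (not necessarily the exact form used in the specification), a word's weight could compare how often the word occurs in one document against how often it occurs in the rest of the collection. The function names and parameters below are hypothetical.

```python
import math


def _log_likelihood(k: int, n: int, p: float) -> float:
    """Binomial log likelihood of k hits in n trials, dropping the constant term."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def binomial_llr(k1: int, n1: int, k2: int, n2: int) -> float:
    """Log likelihood ratio for two binomial samples.

    k1 of n1 word slots in the document are the word of interest;
    k2 of n2 word slots in the rest of the collection are.
    n1 and n2 must be positive. Larger values mean the two rates differ more.
    """
    p1 = k1 / n1
    p2 = k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    return 2.0 * (
        _log_likelihood(k1, n1, p1)
        + _log_likelihood(k2, n2, p2)
        - _log_likelihood(k1, n1, pooled)
        - _log_likelihood(k2, n2, pooled)
    )


# A word that is far more frequent in the document than elsewhere scores high;
# a word with the same rate in both samples scores zero.
distinctive = binomial_llr(50, 100, 5, 100)
ordinary = binomial_llr(10, 100, 10, 100)
```

The statistic is symmetric in the two samples, so in practice a sign or a one-sided test would be needed to separate over-represented words from under-represented ones.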
226 Citations
89 Claims
1. The computer-implemented method comprising:
providing an index of one or more files, the index associating each file with information corresponding to the file and one or more documents, each document containing one or more associated words;
accepting a query containing a file;
quantizing the file to obtain letters;
grouping the letters to form a set of words, the set being based on frequency of the occurrence of the grouped letters;
weighting each word in the set of words, such that the weighting of each word in the set is determined using a local weighting factor and a global weighting factor;
searching the index for at least one document containing at least one of the words in the set;
scoring each document in the index containing at least one of the words in the set; and
identifying the file corresponding to the document with the highest score.
Dependent claims: 2-23.
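The steps of claim 1 can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: the uniform quantizer, the fixed-length grouping of letters into trigrams, the frequency thresholds, and the choice of log term frequency as the local factor and inverse document frequency as the global factor are all stand-ins chosen for brevity. Every function name and parameter is hypothetical.

```python
import math
from collections import Counter


def quantize(signal, levels=4, lo=0.0, hi=1.0):
    """Map each sample to a letter 'a', 'b', ... by uniform binning (assumed scheme)."""
    step = (hi - lo) / levels
    letters = []
    for x in signal:
        idx = min(levels - 1, max(0, int((x - lo) / step)))
        letters.append(chr(ord("a") + idx))
    return "".join(letters)


def form_words(letters, n=3, min_count=1, max_count=10):
    """Group letters into n-grams and keep those whose count lies within thresholds."""
    counts = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    return {w: c for w, c in counts.items() if min_count <= c <= max_count}


def weight(words, doc_freq, num_docs):
    """Weight = local factor (log term frequency) * global factor (IDF). Assumed forms."""
    return {
        w: (1.0 + math.log(c)) * math.log((1 + num_docs) / (1 + doc_freq.get(w, 0)))
        for w, c in words.items()
    }


def build_index(files):
    """files maps a file name to its signal; each document is a word -> count dict."""
    return {name: form_words(quantize(sig)) for name, sig in files.items()}


def identify(query_signal, index):
    """Return the file whose document scores highest, or None if nothing matches."""
    doc_freq = Counter(w for doc in index.values() for w in doc)
    q = weight(form_words(quantize(query_signal)), doc_freq, len(index))
    scores = {
        name: sum(wt for w, wt in q.items() if w in doc)
        for name, doc in index.items()
    }
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


index = build_index({
    "song_a": [0.1, 0.9, 0.1, 0.9, 0.3, 0.6, 0.1, 0.9],
    "song_b": [0.6, 0.6, 0.3, 0.3, 0.6, 0.6, 0.3, 0.3],
})
match = identify([0.1, 0.9, 0.3, 0.6, 0.1], index)  # excerpt taken from song_a
```

The example query is a noise-free excerpt, so its words match the indexed document exactly; a real signal would need a quantizer robust enough that a degraded excerpt still yields mostly the same letters.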
24. A computer-implemented method of retrieving information from a signal quantized into clusters of data comprising:
accepting a query including at least a subset of the clusters of data;
organizing the clusters into words based on frequency of occurrence of the clusters;
searching an index for the words in query, the index comprising;
a plurality of known signals;
a plurality of known information corresponding to the known signals;
a plurality of corresponding clusters of data;
a plurality of corresponding documents organized into words;
weighting the words in the query, such that the weighting of each word in the query is determined using a local weighting factor and a global weighting factor;
scoring the documents in the index containing the words in the query; and
retrieving information associated with the known document having the highest score.
Dependent claims: 25-44.
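The index recited in claim 24 ties each known signal to its information and to a document made of words. One common way to search such a structure is an inverted index that maps each word to the documents containing it. The sketch below assumes that layout; the claim itself does not prescribe it, and the class names, fields, and sample metadata are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    info: dict   # known information about the signal, e.g. a title
    words: set   # the document: words formed from the signal's clusters


@dataclass
class Index:
    entries: dict = field(default_factory=dict)                       # doc id -> Entry
    postings: dict = field(default_factory=lambda: defaultdict(set))  # word -> doc ids

    def add(self, doc_id, info, words):
        """Register a known signal's document and its associated information."""
        self.entries[doc_id] = Entry(info, set(words))
        for w in words:
            self.postings[w].add(doc_id)

    def search(self, query_words):
        """Return ids of documents containing at least one query word."""
        hits = set()
        for w in query_words:
            hits |= self.postings.get(w, set())
        return hits


idx = Index()
idx.add(1, {"title": "Known signal A"}, ["adb", "dbc", "bca"])
idx.add(2, {"title": "Known signal B"}, ["ccb", "cbb"])
candidates = idx.search(["dbc", "zzz"])
titles = [idx.entries[d].info["title"] for d in sorted(candidates)]
```

Looking up postings by word means only documents that share a word with the query are ever scored, which matters once the index holds many known signals.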
45. A system comprising:
a query input device for accepting a query including clusters of data representing an unknown signal;
an organization module, coupled to the query input device, for receiving letters as input and organizing the clusters into a set of words based on frequency of occurrence of the words;
an index, coupled to the organization module, comprising;
a plurality of known signals;
a plurality of known information corresponding to the known signals;
a plurality of corresponding clusters of data organized into words;
a plurality of corresponding documents;
a search module, coupled to the index, for searching the index for a document containing a word in the query;
a weight module, coupled to the organization module, for receiving one word in the set of words as input and determining a weight associated with the word, such that the weight module uses a local weighting factor and a global weighting factor to determine the weight associated with the word;
a score module, coupled to the search module, for receiving the document containing the word in the query as input and determining a score associated with the document; and
an information retrieval engine coupled to the score module for retrieving the information corresponding to the known document with the highest score determined by the score module.
Dependent claims: 46-67.
68. A computer-readable medium comprising computer-readable code, comprising:
computer-readable code adapted to accept a query including at least a subset of the clusters of data;
computer-readable code adapted to organize the clusters of data into words based on frequency of occurrence of the words;
computer-readable code adapted to search an index for the words in the query, the index comprising;
a plurality of known signals;
a plurality of known information corresponding to the known signals;
a plurality of corresponding clusters of data organized into words;
a plurality of corresponding documents;
computer-readable code adapted to weight the words in the query and to score the documents in the index which contain the words in the query, such that a local weighting factor and a global weighting factor are used to weight the words in the query, to score the documents in the index, or both; and
computer-readable code adapted to retrieve information associated with the known document with the highest score.
Dependent claims: 69-89.
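Claim 68 allows the weighting factors to be applied to the query words, to the document scores, or to both. A minimal sketch of a scoring function covering the query-only and both-sides cases is shown here, assuming the score is a sum over shared words of the product of the two weights. That additive form, the function name, and the parameters are assumptions made for illustration.

```python
def score(query_weights, doc_words, doc_weights=None):
    """Sum weights over words shared by the query and a document.

    With doc_weights=None only the query side is weighted. Otherwise each
    shared word contributes query weight * document weight.
    """
    total = 0.0
    for word, qw in query_weights.items():
        if word not in doc_words:
            continue
        dw = 1.0 if doc_weights is None else doc_weights.get(word, 0.0)
        total += qw * dw
    return total


query_only = score({"adb": 2.0, "dbc": 1.0}, {"adb", "bca"})
both_sides = score({"adb": 2.0, "dbc": 1.0}, {"adb", "dbc"}, {"adb": 0.5, "dbc": 3.0})
```

Weighting the document side only would follow from the same function by passing a query weight of 1.0 for every query word.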
Specification