Information retrieval engine

US 7,720,852 B2
Filed: 06/22/2006
Issued: 05/18/2010
Est. Priority Date: 05/03/2000
Status: Expired due to Fees

First Claim

1. A computer-implemented method comprising:

accepting, by at least one processing unit, a file and information corresponding to the file, the file comprising content and the information corresponding to the file comprising metadata;

associating, by the at least one processing unit, the file and the information corresponding to the file;

organizing, by the at least one processing unit, the file to form at least one document comprising at least a portion of the content of the file;

associating, by the at least one processing unit, the file and the document corresponding to the file;

quantizing, by the at least one processing unit, the document's content to obtain letters;

grouping, by the at least one processing unit, the letters to form a set of words, the set being based on predetermined frequency of occurrence threshold and frequencies of occurrence of words formed from the letters;

associating, by the at least one processing unit, each document and the corresponding set of words in an index of documents, the index corresponding to a plurality of files, including the accepted file, each file of the plurality having corresponding metadata and each file being organized to form at least one of the documents indexed, each document indexed having the set of words formed from the document's content;

obtaining, by the at least one processing unit, a set of query words formed from content of a query, the obtaining further comprises:

receiving the query;

quantizing the content of the query to output a series of letters;

grouping the letters to form the set of query words based on a predetermined frequency of the occurrence of the grouped letters; and

weighting the query using a local weighting factor, a global weighting factor, and a normalization factor;

identifying, by the at least one processing unit, one or more documents in the index, each of the identified documents containing at least one query word in the set of query words;

scoring, by the at least one processing unit, each of the identified documents, a score for each identified document being based at least in part on a weighting of each query word found in the identified document, the weighting being determined using the local weighting factor and the global weighting factor; and

selecting, by the at least one processing unit, the metadata of an identified file as metadata for the content of the query, the identified file being identified from the plurality of files using the identified documents' scores.
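
The indexing, scoring, and selecting steps of claim 1 can be illustrated with a minimal sketch. It assumes quantizing and grouping have already produced a list of words per document, uses the raw count of a word in the query as the local weighting factor, and uses log((N + 1) / (DF + 1)) as the global weighting factor, which is one of the options recited in the dependent claims. The normalization factor is left out for brevity. All function and variable names (`build_index`, `score_query`, `select_metadata`, `doc_to_file`, `file_metadata`) are hypothetical and not taken from the patent.

```python
import math
from collections import Counter, defaultdict


def build_index(documents):
    """Map each word to the set of document ids whose word list contains it."""
    index = defaultdict(set)
    for doc_id, words in documents.items():
        for word in set(words):
            index[word].add(doc_id)
    return index


def score_query(query_words, documents, index):
    """Score every indexed document that shares at least one word with the query."""
    n_docs = len(documents)
    scores = defaultdict(float)
    for word, local_weight in Counter(query_words).items():
        postings = index.get(word, set())
        if not postings:
            continue  # word never seen at indexing time
        # Assumed global weight: log((N + 1) / (DF + 1)).
        global_weight = math.log((n_docs + 1) / (len(postings) + 1))
        for doc_id in postings:
            scores[doc_id] += local_weight * global_weight
    return dict(scores)


def select_metadata(query_words, documents, index, doc_to_file, file_metadata):
    """Return the metadata of the file owning the best-scoring document, or None."""
    scores = score_query(query_words, documents, index)
    if not scores:
        return None
    best_doc = max(scores, key=scores.get)
    return file_metadata[doc_to_file[best_doc]]


if __name__ == "__main__":
    # Toy data: words are arbitrary strings standing in for grouped letters.
    docs = {"d1": ["ab", "cd", "ab"], "d2": ["cd", "ef"], "d3": ["gh"]}
    owners = {"d1": "f1", "d2": "f2", "d3": "f3"}
    meta = {"f1": {"title": "Song A"}, "f2": {"title": "Song B"}, "f3": {"title": "Song C"}}
    print(select_metadata(["ab", "cd"], docs, build_index(docs), owners, meta))
```

In this sketch a word that occurs in fewer documents receives a larger global weight, so a rare shared word contributes more to a document's score than a common one.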

8 Assignments

0 Petitions

Abstract

A system, method, and computer program product retrieve information associated with the signals. The information retrieval can be performed on a signal by quantizing the signal, forming words, and indexing based on weights of the words. The words are formed by grouping letters together to form a number of words within predetermined threshold values. The weights of the words are determined using a binomial log likelihood ratio analysis. The present invention may be applied to identification of an unknown song.
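
The abstract does not spell out the binomial log likelihood ratio it refers to. As an illustration only, the sketch below computes the standard two-sample binomial log-likelihood ratio statistic, which compares how often a word occurs in one sample (`k1` occurrences out of `n1` words) against another (`k2` out of `n2`). Treating this statistic as the word weight, and the names `binomial_llr` and `_log_likelihood`, are assumptions rather than details confirmed by the source.

```python
import math


def _log_likelihood(p, k, n):
    """Binomial log-likelihood k*log(p) + (n - k)*log(1 - p), with 0*log(0) taken as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def binomial_llr(k1, n1, k2, n2):
    """Return -2 log(lambda) for the hypothesis that both samples share one rate."""
    p1 = k1 / n1                 # rate observed in the first sample
    p2 = k2 / n2                 # rate observed in the second sample
    p = (k1 + k2) / (n1 + n2)    # pooled rate under the null hypothesis
    return 2.0 * (
        _log_likelihood(p1, k1, n1)
        + _log_likelihood(p2, k2, n2)
        - _log_likelihood(p, k1, n1)
        - _log_likelihood(p, k2, n2)
    )


if __name__ == "__main__":
    # A word seen 30 times in 100 words of one document and 5 times in 100 words elsewhere.
    print(binomial_llr(30, 100, 5, 100))
```

The statistic is zero when both samples show the same rate and grows as the rates diverge, so a large value marks a word whose frequency is distinctive for one sample.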

215 Citations

33 Claims

  - 2. The computer-implemented method of claim 1, wherein the file is an audio file.
  - 3. The computer-implemented method of claim 2, wherein the audio file is a song.
  - 4. The computer-implemented method of claim 2, wherein the audio file is a compressed song.
  - 5. The computer-implemented method of claim 2, wherein the audio file is a portion of a song.
  - 6. The computer-implemented method of claim 1, wherein the grouping the letters comprises:
    - searching for frequently appearing letters; and
      
      searching for frequently appearing n-grams, where n is any integer.
  - 7. The computer-implemented method of claim 6, wherein the frequently is determined by predetermined threshold limits.
  - 8. The computer-implemented method of claim 1, wherein the quantizing is performed using an Expectation-Maximization algorithm.
  - 9. The computer-implemented method of claim 1, wherein the quantizing is performed using an incremental algorithm.
  - 10. The computer-implemented method of claim 1, wherein the quantizing is performed using a Gaussian mixture algorithm.
  - 11. The computer-implemented method of claim 1, the obtaining the set of query words further comprising separating the received query to form a plurality of documents.
  - 12. The computer-implemented method of claim 11, wherein each of the plurality of formed documents and each document in the index is the same length.
  - 13. The computer-implemented method of claim 11, wherein the length is thirty seconds in length.
  - 14. The computer-implemented method of claim 13, wherein each one of the formed documents overlaps with an adjacent one of the formed documents.
  - 15. The computer-implemented method of claim 14, wherein the overlap comprises a twenty-five second overlap.
  - 16. The computer-implemented method of claim 14, wherein the overlap comprises a fifteen-second overlap.
  - 17. The computer-implemented method of claim 1, wherein the normalization factor is $n_{c} = \frac{1}{\sqrt{\sum_{j} (l_{j})^{2} (g_{ij})^{2}}}$, where $l_{j}$ represents the number of times a word appears in the query and $g_{ij}$ represents the number of times a word appears in the documents in the index.
  - 18. The computer-implemented method of any of claims 1 and 17, wherein the local weighting factor is $k_{ij}$, where $k_{ij}$ represents the number of query words in the set of query words.
  - 19. The computer-implemented method of any of claims 1 and 17, wherein the local weighting factor is $\log k_{ij}$, where $k_{ij}$ represents the number of query words in the set of query words.
  - 20. The computer-implemented method of any of claims 1 and 17, wherein the local weighting factor is one.
  - 21. The computer-implemented method of any of claims 1 and 17, wherein the global weighting factor is $\log \frac{N+1}{DF_{j}+1}$, wherein N represents the total number of documents and DF represents the document frequency.
  - 22. The computer-implemented method of any of claims 1 and 17, wherein the global weighting factor is one.
  - 23. The computer-implemented method of claim 1, wherein the scoring each of the identified documents further comprises scoring each of the identified documents using the normalization factor.
  - 24. The computer-implemented method of claim 23, wherein the normalization factor is $n_{c} = \frac{1}{\sqrt{\sum_{j} (l_{j})^{2} (g_{ij})^{2}}}$, where $l_{j}$ represents the number of times a word appears in the query and $g_{ij}$ represents the number of times a word appears in the documents in the index.
  - 25. The computer-implemented method of any one of claims 1, 23 and 24, wherein the local weighting factor is $k_{ij}$, where $k_{ij}$ represents the number of query words in the set of query words.
  - 26. The computer-implemented method of any one of claims 1, 23 and 24, wherein the local weighting factor is $\log k_{ij}$, where $k_{ij}$ represents the number of query words in the query.
  - 27. The computer-implemented method of any one of claims 1, 23 and 24, wherein the local weighting factor is one.
  - 28. The computer-implemented method of any one of claims 1, 23 and 24, wherein the global weighting factor is $\log \frac{N+1}{DF_{j}+1}$, wherein N represents the total number of documents and DF represents the document frequency.
  - 29. The computer-implemented method of any one of claims 1, 23 and 24, wherein the global weighting factor is one.
  - 30. The computer-implemented method of claim 1, wherein the at least one document into which the file is organized comprises multiple documents having the same length.
  - 31. The computer-implemented method of claim 30, wherein the length comprises thirty seconds.
  - 32. The computer-implemented method of claim 30, wherein the at least one document comprises multiple documents each one of the multiple documents overlapping with another one of the multiple documents.
  - 33. The computer-implemented method of claim 32, wherein the overlap comprises a twenty-five second overlap.
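
The equal-length, overlapping documents recited in the dependent claims (thirty seconds long, with a twenty-five second or fifteen-second overlap) can be pictured as a sliding window over the file. The sketch below is one possible reading: it returns start and end times in seconds and keeps only full-length windows. Dropping a short tail, and the name `split_into_documents`, are assumptions that the claims do not address.

```python
def split_into_documents(duration_s, length_s=30.0, overlap_s=25.0):
    """Return (start, end) times of equal-length windows that overlap their neighbours."""
    if not 0 <= overlap_s < length_s:
        raise ValueError("overlap must be non-negative and shorter than the window length")
    step = length_s - overlap_s  # distance between consecutive window starts
    windows = []
    start = 0.0
    # Assumption: a trailing fragment shorter than length_s is discarded.
    while start + length_s <= duration_s:
        windows.append((start, start + length_s))
        start += step
    return windows


if __name__ == "__main__":
    # A 45-second file with the default 30-second windows and 25-second overlap.
    for window in split_into_documents(45.0):
        print(window)
```

With a twenty-five second overlap each window starts five seconds after the previous one, so a short query excerpt is likely to fall mostly inside at least one indexed document.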

Current Assignee: R2 Solutions LLC (Acacia Research Corporation)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Dunning, Ted E.
Primary Examiner(s): Jalil; Neveen Abel
Assistant Examiner(s): Vu; Bai D
Application Number: US11/472,792
Publication Number: US 20060242193A1
Time in Patent Office: 1,426 Days
Field of Search: None
US Class Current: 707/750
CPC Class Codes:
G06F 16/3334   Selection or weighting of t...
G06F 16/634   Query by example, e.g. quer...
G06F 16/683   using metadata automaticall...
Y10S 707/916   Audio
Y10S 707/99943   Generating database or data...
