Information retrieval engine

  • US 7,720,852 B2
  • Filed: 06/22/2006
  • Issued: 05/18/2010
  • Est. Priority Date: 05/03/2000
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method comprising:

    accepting, by at least one processing unit, a file and information corresponding to the file, the file comprising content and the information corresponding to the file comprising metadata;

    associating, by the at least one processing unit, the file and the information corresponding to the file;

    organizing, by the at least one processing unit, the file to form at least one document comprising at least a portion of the content of the file;

    associating, by the at least one processing unit, the file and the document corresponding to the file;

    quantizing, by the at least one processing unit, the document's content to obtain letters;

    grouping, by the at least one processing unit, the letters to form a set of words, the set being based on predetermined frequency of occurrence threshold and frequencies of occurrence of words formed from the letters;

    associating, by the at least one processing unit, each document and the corresponding set of words in an index of documents, the index corresponding to a plurality of files, including the accepted file, each file of the plurality having corresponding metadata and each file being organized to form at least one of the documents indexed, each document indexed having the set of words formed from the document's content;

    obtaining, by the at least one processing unit, a set of query words formed from content of a query, the obtaining further comprises;

    receiving the query;

    quantizing the content of the query to output a series of letters;

    grouping the letters to form the set of query words based on a predetermined frequency of the occurrence of the grouped letters; and

    weighting the query using a local weighting factor, a global weighting factor, and a normalization factor;

    identifying, by the at least one processing unit, one or more documents in the index, each of the identified documents containing at least one query word in the set of query words;

    scoring, by the at least one processing unit, each of the identified documents, a score for each identified document being based at least in part on a weighting of each query word found in the identified document, the weighting being determined using the local weighting factor and the global weighting factor; and

    selecting, by the at least one processing unit, the metadata of an identified file as metadata for the content of the query, the identified file being identified from the plurality of files using the identified documents' scores.
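
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes that "letters" are lowercase characters, that words are whitespace-delimited groups of letters kept when their count meets a hypothetical `min_count` threshold, that each non-empty line of a file forms one document, and that the local, global, and normalization factors are a logarithmic term frequency, a smoothed inverse document frequency, and an L2 norm. The claim itself fixes none of these choices, and every function name below is hypothetical.

```python
import math
from collections import Counter


def quantize(text):
    """Map raw content to 'letters' (assumption: lowercase characters)."""
    return [c.lower() if c.isalnum() else " " for c in text]


def group(letters, min_count=1):
    """Group letters into words; keep words seen at least min_count times."""
    counts = Counter("".join(letters).split())
    return {w: n for w, n in counts.items() if n >= min_count}


def build_index(files):
    """files: {file_id: (content, metadata)} -> (index, doc_to_file)."""
    index, doc_to_file = {}, {}
    for file_id, (content, _metadata) in files.items():
        # Assumption: each non-empty line of a file is one document.
        lines = [ln.strip() for ln in content.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            doc_id = (file_id, i)
            index[doc_id] = group(quantize(line))
            doc_to_file[doc_id] = file_id
    return index, doc_to_file


def local_weight(tf):
    """Assumed local factor: logarithmic term frequency."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def global_weight(word, index):
    """Assumed global factor: smoothed inverse document frequency."""
    df = sum(1 for words in index.values() if word in words)
    return math.log((1 + len(index)) / (1 + df)) + 1.0


def weight_query(query, index):
    """Weight query words with local, global, and normalization factors."""
    words = group(quantize(query))
    raw = {w: local_weight(tf) * global_weight(w, index)
           for w, tf in words.items()}
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    return {w: v / norm for w, v in raw.items()}


def select_metadata(query, files):
    """Return the metadata of the file owning the best-scoring document."""
    index, doc_to_file = build_index(files)
    query_weights = weight_query(query, index)
    scores = {}
    for doc_id, words in index.items():
        hits = [w for w in query_weights if w in words]
        if hits:  # only documents containing at least one query word
            scores[doc_id] = sum(
                query_weights[w] * local_weight(words[w])
                * global_weight(w, index)
                for w in hits
            )
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return files[doc_to_file[best]][1]
```

With two hypothetical files, `{"a": ("engine oil change\nbrake pads", {"category": "auto"}), "b": ("search engine index\nquery ranking", {"category": "ir"})}`, the query "index ranking for a search engine" shares the most weighted words with the first document of file "b", so the sketch returns that file's metadata as the metadata for the query.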

  • 8 Assignments