Document tagging and retrieval using per-subject dictionaries including subject-determining-power scores for entries

  • US 9,971,828 B2
  • Filed: 10/20/2015
  • Issued: 05/15/2018
  • Est. Priority Date: 05/10/2013
  • Status: Active Grant
First Claim

1. A computer-performed method of organizing a collection of electronic documents, the method comprising:

  • in a computer system, storing entries in multiple dictionaries separate from and not associated with any particular one of the electronic documents, wherein the multiple dictionaries are data structures within the computer system, wherein individual ones of the multiple dictionaries correspond to one of a plurality of different subjects, wherein the entries contain a descriptive term and a subject-determining-power score corresponding to the descriptive term, wherein an individual subject-determining-power score indicates the relative strength or weakness of the corresponding descriptive term with respect to the subject of the one of the multiple dictionaries containing an entry in which the descriptive term is stored, and wherein at least some of the descriptive terms are present in two or more of the multiple dictionaries;

    responsive to requests within the computer system to identify one or more of the electronic documents, wherein the requests include at least one search term descriptive of the one or more electronic documents, accessing the collection of electronic documents by matching the at least one search term with descriptive terms in the multiple dictionaries to determine one or more subjects of the request from subjects of one or more of the multiple dictionaries that contain the descriptive terms matching the at least one search term, and applying the subject-determining power scores to determine which of the subjects of the one or more of the multiple dictionaries that contain the descriptive terms that match the at least one search term are most applicable to the request;

    ranking the one or more subjects determined by the matching according to a match score computed for individual ones of the one or more subjects with respect to the at least one search term;

    until a predetermined number of documents are identified, collecting document tags having a best match to the highest-ranking subject for which the document tags have not yet been collected, in a collected set of document tags for the request, wherein the collecting document tags collects the document tags from a tag database separate from the documents and the dictionaries, whereby a speed of matching the documents to the one or more subjects is increased, wherein the document tags include for each of one or more subject entries in the document tags, multiple tag terms with subject-power determining scores corresponding to the tag terms and a confidence score, wherein the collecting document tags further determines the best-match to the highest-ranking subject for which the document tags have not been collected by multiplying all of the tag terms in the entry corresponding to the highest-ranking subject for which the document tags have not been collected by the confidence score of the entry to generate a document subject match score, and compare the computed document subject match scores to determine the best match; and

    storing a representation of the collected set of document tags that identify the electronic documents in a memory of the computer system to provide the response to the request to identify the one or more electronic documents.
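
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. All names and data here (`DICTIONARIES`, `TAG_DB`, `rank_subjects`, `doc_match_score`, `collect_tags`, the sample terms and scores) are hypothetical. Two readings of the claim are assumptions: the subject match score is taken to be the sum of the subject-determining-power scores of the matching terms, and the document subject match score is taken to be the sum of the tag-term scores multiplied by the entry's confidence score.

```python
from collections import defaultdict

# Hypothetical per-subject dictionaries, stored apart from any document.
# Each maps a descriptive term to its subject-determining-power score.
# "bank" appears in two dictionaries with different strengths.
DICTIONARIES = {
    "finance": {"bank": 0.9, "interest": 0.6, "loan": 0.8},
    "geography": {"bank": 0.3, "river": 0.9, "delta": 0.7},
}

# Hypothetical tag database, separate from documents and dictionaries.
# doc_id -> subject -> (tag-term scores, confidence score)
TAG_DB = {
    "doc1": {"finance": ({"bank": 0.9, "loan": 0.8}, 0.9)},
    "doc2": {"geography": ({"river": 0.9, "bank": 0.3}, 0.8)},
    "doc3": {
        "finance": ({"interest": 0.6}, 0.5),
        "geography": ({"delta": 0.7}, 0.4),
    },
}


def rank_subjects(search_terms):
    """Match search terms against every dictionary and rank the subjects.

    Assumption: a subject's match score is the sum of the scores of the
    dictionary terms that match the search terms.
    """
    scores = defaultdict(float)
    for subject, entries in DICTIONARIES.items():
        for term in search_terms:
            if term in entries:
                scores[subject] += entries[term]
    # Highest score first; subject name breaks ties deterministically.
    return sorted(scores, key=lambda s: (-scores[s], s))


def doc_match_score(entry):
    """Assumed reading: sum of tag-term scores times the confidence score."""
    term_scores, confidence = entry
    return sum(term_scores.values()) * confidence


def collect_tags(search_terms, limit):
    """Collect best-matching tags, subject by subject, until `limit` documents."""
    collected = []
    seen = set()
    for subject in rank_subjects(search_terms):
        candidates = [
            (doc_match_score(tags[subject]), doc_id)
            for doc_id, tags in TAG_DB.items()
            if subject in tags and doc_id not in seen
        ]
        # Best document subject match score first.
        for score, doc_id in sorted(candidates, key=lambda c: (-c[0], c[1])):
            collected.append((doc_id, subject, round(score, 3)))
            seen.add(doc_id)
            if len(collected) >= limit:
                return collected
    return collected


if __name__ == "__main__":
    print(rank_subjects(["bank", "loan"]))
    print(collect_tags(["bank", "loan"], limit=3))
```

Because only the dictionaries and the tag database are consulted, the sketch never opens the documents themselves, which is the source of the speed benefit the claim attributes to a separate tag database.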

  • 1 Assignment