System and methods for ranking documents based on content characteristics

US 8,370,347 B1
Filed: 02/17/2012
Issued: 02/05/2013
Est. Priority Date: 03/16/2009
Status: Expired due to Fees

Abstract

A system is described for assessing information in natural language contents. A user interface receives an object name as a query term and a value for a customized ranking parameter from a user. A computer storage device stores an object-specific data set related to the object name, wherein the object-specific data set includes a plurality of property names and association-strength values. A computer processing system can count a first frequency of a first property name and count a second frequency of a second property name in a document containing text in a natural language, calculate a relevance score as a function of the first frequency and the second frequency, and rank the plurality of documents using their respective relevance scores, and return one or more documents to the user based on the ranking of the plurality of documents. The function is in part defined by the customized ranking parameter.
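
The pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the sample data, the whole-word matching rule, and the choice of a plain sum of frequencies as the relevance function are all assumptions. The customized ranking parameter is left out because the abstract does not say how it enters the function.

```python
import re
from typing import Dict, List, Tuple


def count_frequency(term: str, text: str) -> int:
    """Count case-insensitive, whole-word occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def rank_documents(
    property_strengths: Dict[str, float],
    documents: Dict[str, str],
    threshold: float,
) -> List[Tuple[str, int]]:
    """Score each document and return (doc_id, score) pairs, best first."""
    # Keep only property names whose association strength exceeds the threshold.
    selected = [name for name, s in property_strengths.items() if s > threshold]
    # Assumed relevance function: the plain sum of the property-name frequencies.
    scores = {
        doc_id: sum(count_frequency(name, text) for name in selected)
        for doc_id, text in documents.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical object-specific data set for the object name "computer".
computer_properties = {"CPU": 0.99, "memory": 0.95, "keyboard": 0.4, "banana": 0.01}
docs = {
    "doc_a": "The CPU reads from memory. A faster CPU helps.",
    "doc_b": "A keyboard is an input device.",
    "doc_c": "Bananas are yellow.",
}
ranking = rank_documents(computer_properties, docs, threshold=0.1)
print(ranking)  # [('doc_a', 3), ('doc_b', 1), ('doc_c', 0)]
```

Here "banana" falls below the assumed threshold of 0.1 and is never counted, so only the three remaining property names contribute to a document's score.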

20 Claims

1. A system for assessing information in natural language contents, comprising:
- a computer processing system configured to receive, from a user interface, an object name as a query term from a user; and
  
  a computer storage configured to store an object-specific data set related to the object name and to store a plurality of documents containing text in a natural language, wherein the object-specific data set includes a plurality of property names and association-strength values, each property name being associated with an association-strength value, wherein the association strength values of the plurality of property names are above a predetermined threshold value, wherein the plurality of property names includes a first property name and a second property name, wherein the computer processing system is configured to count a first frequency of the first property name in one of the plurality of documents, to count a second frequency of the second property name in the one of the plurality of documents, to calculate a relevance score as a function of the first frequency and the second frequency, to rank the plurality of documents using their respective relevance scores, and to return one or more documents to the user interface based on the ranking of the plurality of documents.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system of claim 1, wherein the computer processing system is configured to receive a value for a customized ranking parameter from the user interface, wherein the computer processing system is configured to calculate the relevance score as a function of the first frequency, the second frequency, and the customized ranking parameter.
  - 3. The system of claim 2, wherein the computer processing system is configured to receive, from the user interface, user's preference for different types of contents in the one or more documents to be returned by the computer processing system, wherein the preferences for the different types of contents comprise at least one of “general”, “specific”, “detailed”, “brief”, “query-specific”, or “conceptual”.
  - 4. The system of claim 1, wherein the function depends on the sum of a first multiplication of the first frequency and its corresponding association-strength value and a second multiplication of the second frequency and its corresponding association-strength value.
  - 5. The system of claim 1, wherein the computer processing system is further configured to:
    - count the frequencies of the plurality of property names in the one of the plurality of documents;
      
      sum the frequencies of property names to produce a total count in the one of the plurality of documents; and
      
      calculate the relevance score for the one of the plurality of documents using the total count.
  - 6. The system of claim 5, wherein the frequencies of the plurality of property names is counted up to a predetermined upper bound, wherein the relevance score is calculated using a ratio of the total count to the predetermined upper bound, wherein the total count is set to be equal to the predetermined upper bound if the total frequencies of the plurality of property names exceed the predetermined upper bound.
  - 7. The system of claim 1, wherein the first property name is the object name, wherein the second property name is different from the object name, the computer processing system is further configured to sum the frequencies of property names that are different from the object name in the document to produce a total count, wherein the relevance score is calculated using a ratio of the first frequency to the sum of the first frequency and the total count.
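
The scoring variants in claims 4, 6, and 7 can be sketched as three small functions. This is an illustrative reading only: the claims say the score "depends on" or is "calculated using" these quantities, so returning the quantity itself as the score is an assumption, as are the function names and the sample numbers.

```python
from typing import Dict


def weighted_sum_score(frequencies: Dict[str, int], strengths: Dict[str, float]) -> float:
    """Claim 4 (assumed form): sum of frequency * association strength."""
    return sum(freq * strengths[name] for name, freq in frequencies.items())


def capped_ratio_score(frequencies: Dict[str, int], upper_bound: int) -> float:
    """Claims 5-6 (assumed form): total count clipped to the upper bound,
    divided by the upper bound, giving a value between 0 and 1."""
    total = min(sum(frequencies.values()), upper_bound)
    return total / upper_bound


def object_name_ratio_score(frequencies: Dict[str, int], object_name: str) -> float:
    """Claim 7 (assumed form): object-name frequency divided by the sum of
    that frequency and the total count of all other property names."""
    first = frequencies.get(object_name, 0)
    others = sum(freq for name, freq in frequencies.items() if name != object_name)
    denominator = first + others
    # Guard against documents that contain none of the property names.
    return first / denominator if denominator else 0.0


# Hypothetical counts for one document and the object name "computer".
frequencies = {"computer": 3, "CPU": 2, "memory": 1}
strengths = {"computer": 1.0, "CPU": 0.5, "memory": 0.25}

print(weighted_sum_score(frequencies, strengths))        # 4.25
print(capped_ratio_score(frequencies, upper_bound=4))    # 1.0
print(capped_ratio_score(frequencies, upper_bound=12))   # 0.5
print(object_name_ratio_score(frequencies, "computer"))  # 0.5
```

With an upper bound of 4, the total count of 6 is clipped to 4, so the ratio saturates at 1.0; raising the bound to 12 lets the same document score 0.5.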

8. A system for assessing information in natural language contents, comprising:
- a computer processing system configured to receive, from a user interface, an object name as a query term from a user; and
  
  a computer storage configured to store an object-specific data set related to the object name and to store a plurality of documents containing text in a natural language, wherein the object-specific data set includes a plurality of property names and association-strength values, each property name being associated with an association-strength value, wherein the computer processing system is configured to separate the plurality of property names in the object-specific data set into a first group and a second group, wherein the first group of one or more property names have their respective association strength values at or above a predetermined value, wherein the second group of one or more property names have their respective association strength values below the predetermined value, wherein the computer processing system is configured to count the frequencies of the property names in the first group and count the frequencies of the property names in the second group in each of the plurality of documents, wherein the computer processing system is configured to calculate a relevance score as a function of the frequencies of the property names in the first group and the frequencies of property names in the second group, wherein the computer processing system is configured to rank the plurality of documents using their respective relevance scores and to return one or more documents to the user interface based on the ranking of the plurality of documents.
- Dependent claims (9, 10, 11, 12, 13, 14, 15):
  - 9. The system of claim 8, wherein the association strength values of the plurality of property names are above a threshold value.
  - 10. The system of claim 9, wherein the computer processing system is configured to receive a value for a customized ranking parameter from the user interface, wherein the computer processing system is configured to calculate the relevance score as a function of the frequencies of the property names in the first group, the frequencies of property names in the second group, and the customized ranking parameter.
  - 11. The system of claim 8, wherein the computer processing system is configured to sum the frequencies of the one or more property names in the first group in the document to produce a first sum and to sum the frequencies of the one or more property names in the second group in the document to produce a second sum, where the relevance score is calculated using the first sum and the second sum.
  - 12. The system of claim 11, wherein the relevance score is calculated using the ratio of the first sum to the sum of the first sum and the second sum.
  - 13. The system of claim 8, wherein the function depends on the sum of a first multiplication of the frequencies of the property names and their corresponding association-strength values in the first group, and wherein the function depends on the sum of a second multiplication of the frequencies of the property names and their corresponding association-strength values in the second group.
  - 14. The system of claim 8, wherein the computer processing system is further configured to:
    - count the frequencies of the plurality of property names in the one of the plurality of documents;
      
      sum the frequencies of property names to produce a total count in the one of the plurality of documents; and
      
      calculate the relevance score for the one of the plurality of documents using the total count.
  - 15. The system of claim 14, wherein the frequencies of the plurality of property names is counted up to a predetermined upper bound, wherein the relevance score is calculated using a ratio of the total count to the predetermined upper bound, wherein the total count is set to be equal to the predetermined upper bound if the total frequencies of the plurality of property names exceed the predetermined upper bound.
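
The two-group scoring of claims 8, 11, and 12 can be sketched as below. The names `split_by_strength` and `group_ratio_score`, the cutoff of 0.5, and the sample data are hypothetical, and using the ratio directly as the relevance score is an assumption, since claim 12 only says the score is calculated using that ratio.

```python
from typing import Dict, List, Tuple


def split_by_strength(
    strengths: Dict[str, float], cutoff: float
) -> Tuple[List[str], List[str]]:
    """Claim 8: first group at or above the cutoff, second group below it."""
    first = [name for name, s in strengths.items() if s >= cutoff]
    second = [name for name, s in strengths.items() if s < cutoff]
    return first, second


def group_ratio_score(
    frequencies: Dict[str, int], strengths: Dict[str, float], cutoff: float
) -> float:
    """Claims 11-12 (assumed form): first_sum / (first_sum + second_sum)."""
    first, second = split_by_strength(strengths, cutoff)
    first_sum = sum(frequencies.get(name, 0) for name in first)
    second_sum = sum(frequencies.get(name, 0) for name in second)
    total = first_sum + second_sum
    # A document with no property-name occurrences gets a score of 0.
    return first_sum / total if total else 0.0


# Hypothetical association strengths and per-document counts.
strengths = {"CPU": 0.9, "memory": 0.8, "keyboard": 0.3, "cable": 0.1}
doc_counts = {"CPU": 3, "memory": 1, "keyboard": 2, "cable": 2}

print(split_by_strength(strengths, cutoff=0.5))  # (['CPU', 'memory'], ['keyboard', 'cable'])
print(group_ratio_score(doc_counts, strengths, cutoff=0.5))  # 0.5
```

Under this reading, a score near 1 means the document mostly uses strongly associated property names, while a score near 0 means it mostly uses weakly associated ones.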

16. A method for assessing information in natural language contents, comprising:
- receiving an object name as a query term from a user interface by a computer processing system;
  
  retrieving an object-specific data set related to the object name from a computer storage system, wherein the object-specific data set includes a plurality of property names and association-strength values, each property name being associated with an association-strength value, wherein the association strength values of the plurality of property names are above a predetermined threshold value, wherein the plurality of property names includes a first property name and a second property name;
  
  retrieving, by the computer processing system, a plurality of documents containing text in a natural language;
  
  counting a first frequency of the first property name in one of the plurality of documents by the computer processing system;
  
  counting a second frequency of the second property name in the one of the plurality of documents by the computer processing system;
  
  calculating a relevance score as a function of the first frequency and the second frequency;
  
  ranking the plurality of documents using their respective relevance scores; and
  
  returning one or more documents to the user interface based on the ranking of the plurality of documents.
- Dependent claims (17, 18, 19, 20):
  - 17. The method of claim 16, further comprising:
    - receiving a value for a customized ranking parameter from a user by the computer processing system; and
      
      calculating the relevance score as a function of the first frequency, the second frequency, and the customized ranking parameter.
  - 18. The method of claim 17, further comprising:
    - receiving, from the user interface by the computer processing system, user's preference for different types of contents in the one or more documents to be returned by the computer processing system, wherein the preferences for the different types of contents comprise at least one of “general”, “specific”, “detailed”, “brief”, “query-specific”, or “conceptual”.
  - 19. The method of claim 16, wherein the function depends on the sum of a first multiplication of the first frequency and its corresponding association-strength value and a second multiplication of the second frequency and its corresponding association-strength value.
  - 20. The method of claim 16, further comprising:
    - separating the plurality of property names in the object-specific data set into a first group comprising the first property name and a second group comprising the second property name, wherein the property names in the first group have their respective association strength values at or above a predetermined value, wherein the property names in the second group have their respective association strength values below the predetermined value;
      
      counting, in the one of the plurality of documents, the frequencies of the property names in the first group; and
      
      counting, in the one of the plurality of documents, the frequencies of the property names in the second group, wherein the relevance score is calculated using the frequencies of the property names in the first group and the frequencies of property names in the second group.

Current Assignee: Linfo IP LLC (Pueblo Nuevo LLC)
Original Assignee: Guangsheng Zhang
Inventors: Zhang, Guangsheng
Primary Examiner(s): MOSER, BRUCE M
Application Number: US13/399,050
Time in Patent Office: 354 Days
Field of Search: 707/999.003, 707/999.006, 707/999.007, 707/999.102, 707/999.107, 707/730, 707/728, 707/748, 707/750, 707/758, 704/9
US Class Current: 707/730
CPC Class Codes:
G06F 16/3346   using probabilistic model
G06F 16/93   Document management systems
G06F 16/951   Indexing; Web crawling tech...
