CONTENT DATA INDEXING AND RESULT RANKING

US 20070282831A1
Filed: 08/20/2007
Published: 12/06/2007
Est. Priority Date: 07/01/2002
Status: Active Grant


Abstract

A full text indexing system is provided for processing content associated with data applications such as encyclopedia and dictionary applications. A build process collects data from various sources, processes the data into constituent parts, including alternative word sets, and stores the constituent parts in structured database tables. A run-time process is used to query the database tables and rank the results in order to provide effective matches in an efficient manner. Run-time processing is optimized by preprocessing all query-independent steps during the build process. A double word table, representing all possible word pair combinations for each index entry, and an alternative word table are used to further optimize run-time processing.
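The double word table mentioned in the abstract can be pictured with a short sketch. This is only an illustration under assumptions: the function names `double_word_rows` and `pair_key` and the row layout `(entry_id, word_a, word_b)` are hypothetical and do not come from the patent text.

```python
from itertools import combinations


def double_word_rows(entry_id, words):
    """Return one (entry_id, word_a, word_b) row per unordered, unique word pair.

    Words are de-duplicated and sorted first, so each pair is stored once
    regardless of the order in which the words appear in the entry.
    """
    unique_words = sorted(set(words))
    return [(entry_id, a, b) for a, b in combinations(unique_words, 2)]


def pair_key(word_a, word_b):
    """Normalize a query word pair to the same ordering used when building rows."""
    return tuple(sorted((word_a, word_b)))


# Hypothetical index entry 7 containing a repeated word.
rows = double_word_rows(7, ["new", "york", "city", "new"])
```

Because the pairs are computed once at build time, a two-word query only needs to normalize its pair with `pair_key` and look it up, which is one way to read the abstract's point about moving query-independent work into the build process.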

20 Claims

1. In a computing system having access to multiple content entities, each content entity including searchable content, a method for building a searchable content index for searching and retrieving content entities in an efficient manner that returns results of content entities expected to be found, the method comprising:
- identifying searchable data within each of a plurality of content entities;
- dividing text portions of the searchable data within each of the plurality of content entities into words and tokens, and storing each of the words and tokens in a table;
- removing from the table each duplicate word and token; and
- applying an alternative word set to the table after each duplicate word and token has been removed, wherein applying the alternative word set to the table includes adding to the table alternative words associated with one or more of the words or tokens in the table.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. A method as recited in claim 1, wherein identifying the searchable data comprises:
    - using a content management system to identify a data source.
  - 3. A method as recited in claim 2, wherein the data source is an encyclopedic data source.
  - 4. A method as recited in claim 1, wherein each token comprises at least two words representing words commonly found together.
  - 5. A method as recited in claim 1, wherein dividing text portions of the searchable data within each of the plurality of content entities into words and tokens is performed by a parsing module incorporating a natural language parser.
  - 6. A method as recited in claim 1, wherein applying an alternative word set to the table after each duplicate word and token has been removed comprises:
    - identifying alternative words for words and tokens in the table.
  - 7. A method as recited in claim 6, wherein the identified alternative words include synonyms of the words and tokens in the table.
  - 8. A method as recited in claim 6, wherein the identified alternative words include common misspellings of the words and tokens in the table.
  - 9. A method as recited in claim 6, wherein the identified alternative words include common related phrases of the words and tokens in the table.
  - 10. A method as recited in claim 1, wherein applying the alternative word set to the table includes:
    - within the table, creating an alt word table; and
    - identifying one or more associations between different content entities and, separate from the alt word table, creating in the table one or more associations between the different content entities.
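
The build steps recited in claim 1 (divide text into words and tokens, store them, remove duplicates, then add alternative words) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `ALT_WORDS` and `TOKENS` data, the regular-expression tokenizer, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical alternative word set (synonyms and spelling variants).
ALT_WORDS = {
    "color": ["colour", "hue"],
    "encyclopedia": ["encyclopaedia"],
}

# Hypothetical multi-word tokens: words commonly found together (claim 4).
TOKENS = {("new", "york")}


def split_words_and_tokens(text):
    """Divide text into lowercase words, plus any known two-word tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [" ".join(pair) for pair in zip(words, words[1:]) if pair in TOKENS]
    return words + tokens


def build_index(entities):
    """Map each entity id to its de-duplicated terms followed by alternative words."""
    table = {}
    for entity_id, text in entities.items():
        terms = split_words_and_tokens(text)
        # Remove duplicates first, keeping first-seen order.
        unique_terms = list(dict.fromkeys(terms))
        # Only then apply the alternative word set, as the claim orders the steps.
        alternatives = [alt for term in unique_terms for alt in ALT_WORDS.get(term, [])]
        table[entity_id] = list(dict.fromkeys(unique_terms + alternatives))
    return table


index = build_index({1: "Color in New York: color theory"})
```

De-duplicating before expanding keeps the alternative-word lookup to one pass per distinct term, which is one plausible reason for the ordering in the claim.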

11. In a computing system having access to multiple content entities, each content entity including searchable content, a method for performing a run-time search of the searchable content to identify a ranked list of relevant content entities, the method comprising:
- receiving, from a user of a client application, a query that includes one or more target search terms, the one or more target search terms being provided in a natural word format;
- translating the query received from the user into a database query conducive to a known architecture of a database associated with searchable content;
- querying the database by comparing the database query to the database, thereby generating a list of content entities that are potential matches;
- ranking the content entities in the list in descending order, based on a calculated likelihood that a particular entity is a target of the query from the user;
- removing from the ranking any content entities that are duplicates; and
- returning, to the client application, a list of content entities with the highest ranking.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. A method as recited in claim 11, wherein the client application is an encyclopedia application.
  - 13. A method as recited in claim 11, wherein the client application is a dictionary application.
  - 14. A method as recited in claim 11, wherein the database includes a search index table associated with searchable content.
  - 15. A method as recited in claim 14, wherein the search index table includes:
    - a search content table listing all content entities that have been indexed; and
    - a search content word table listing all words for each entry in the search content table, in a predefined order and without duplicates.
  - 16. A method as recited in claim 15, wherein the search index table further includes:
    - a search content double word table listing all words for each entry in the search content table, in unordered, unique pairs within a single search content table entry.
  - 17. A method as recited in claim 15, wherein the search index table further includes:
    - a search word table listing all unique words and excluding all stop words.
  - 18. A method as recited in claim 17, wherein the search word table includes an alt word table listing words similar to one or more words or tokens in target search terms.
  - 19. A method as recited in claim 11, wherein returning, to the client application, a list of content entities with the highest ranking comprises:
    - returning to the client a list of only those content entities with a ranking above a predetermined threshold value.
  - 20. A computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, cause a computing system to perform the method of claim 11.
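
The run-time steps of claim 11, together with the threshold of claim 19, can be sketched against an in-memory SQLite database. Everything specific here is an assumption for illustration: the table name `search_content_word`, the helper names, and in particular the score (the fraction of query terms an entity matches), since the claims do not say how the likelihood is calculated.

```python
import sqlite3


def make_db(index):
    """Load a {entity_id: [words]} mapping into a hypothetical word table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE search_content_word (entity_id INTEGER, word TEXT)")
    db.executemany(
        "INSERT INTO search_content_word VALUES (?, ?)",
        [(entity_id, word) for entity_id, words in index.items() for word in words],
    )
    return db


def search(db, query, threshold=0.5):
    """Translate a natural-word query to SQL, rank matches, and apply a threshold."""
    terms = list(dict.fromkeys(query.lower().split()))
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    # GROUP BY yields one row per entity, so no entity appears twice in the ranking.
    sql = (
        "SELECT entity_id, COUNT(DISTINCT word) AS hits "
        "FROM search_content_word "
        f"WHERE word IN ({placeholders}) "
        "GROUP BY entity_id"
    )
    scored = [(entity_id, hits / len(terms)) for entity_id, hits in db.execute(sql, terms)]
    # Descending score; ties broken by entity id for a stable order.
    scored.sort(key=lambda row: (-row[1], row[0]))
    return [(entity_id, score) for entity_id, score in scored if score >= threshold]


db = make_db({1: ["solar", "system", "planet"], 2: ["solar", "energy"], 3: ["ocean"]})
results = search(db, "solar system")
```

Raising `threshold` narrows the returned list to the strongest matches, which mirrors the predetermined threshold value described in claim 19.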

Current Assignee: Zhigu Holdings Limited
Original Assignee: Microsoft Corporation
Inventors: Jayanti, Harish; Anderson, Christopher
Granted Patent: US 7,987,189 B2

Field of Search
US Class Current: 1/1
CPC Class Codes

G06F 16/48   Retrieval characterised by ...

Y10S 707/917   Text

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99942   Manipulating data structure...

Y10S 707/99943   Generating database or data...

Y10S 707/99944   Object-oriented database st...
