CONTENT DATA INDEXING AND RESULT RANKING
First Claim
1. In a computing system having access to multiple content entities, each content entity including searchable content, a method for building a searchable content index for searching and retrieving content entities in an efficient manner that returns results of content entities expected to be found, the method comprising:
identifying searchable data within each of a plurality of content entities;
dividing text portions of the searchable data within each of the plurality of content entities into words and tokens, and storing each of the words and tokens in a table;
removing from the table each duplicate word and token; and
applying an alternative word set to the table after each duplicate word and token has been removed, wherein applying the alternative word set to the table includes adding to the table alternative words associated with one or more of the words or tokens in the table.
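The build steps recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ALTERNATIVES` mapping, the regular-expression tokenizer, and the function names are hypothetical, chosen only to show the order of operations (divide into words and tokens, remove duplicates, then add alternative words).

```python
import re

# Hypothetical alternative word set: maps a word to alternates added to the table.
ALTERNATIVES = {
    "color": ["colour"],
    "usa": ["united", "states"],
}

def tokenize(text):
    """Split text into lowercase words and tokens (runs of letters or digits)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(entities, alternatives=ALTERNATIVES):
    """Return {entity_id: list of unique words followed by their alternatives}."""
    index = {}
    for entity_id, text in entities.items():
        table = tokenize(text)               # divide text into words and tokens
        unique = list(dict.fromkeys(table))  # remove duplicates, keep first-seen order
        expanded = list(unique)
        for word in unique:                  # apply alternatives only after dedup
            for alt in alternatives.get(word, []):
                if alt not in expanded:      # assumption: do not re-add existing words
                    expanded.append(alt)
        index[entity_id] = expanded
    return index
```

For example, `build_index({"e1": "Color, color and COLOR of the USA"})` yields one table for `e1` that holds each source word once, followed by the alternates `colour`, `united`, and `states`.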
Abstract
A full text indexing system is provided for processing content associated with data applications such as encyclopedia and dictionary applications. A build process collects data from various sources, processes the data into constituent parts, including alternative word sets, and stores the constituent parts in structured database tables. A run-time process is used to query the database tables and rank the results in order to provide effective matches in an efficient manner. Run-time processing is optimized by preprocessing all steps that are query-independent during the build process. A double word table representing all possible word pair combinations for each index entry and an alternative word table are used to further optimize run-time processing.
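One way to picture the double word table is as a precomputed map from word pairs to the index entries that contain both words. The Python sketch below is an assumption-laden illustration, not the described system: it treats pairs as unordered, splits entries on whitespace, and uses hypothetical names (`double_word_table`, `lookup_pair`).

```python
from itertools import combinations

def double_word_table(index_entries):
    """Map each unordered word pair to the set of entry ids containing both words."""
    table = {}
    for entry_id, title in index_entries.items():
        words = sorted(set(title.lower().split()))  # unique words, sorted for stable keys
        for pair in combinations(words, 2):         # every word pair within one entry
            table.setdefault(pair, set()).add(entry_id)
    return table

def lookup_pair(table, first, second):
    """Look up a two-word query without scanning the single-word tables."""
    key = tuple(sorted((first.lower(), second.lower())))
    return table.get(key, set())
```

Because the pairs are computed at build time, a two-word query such as `lookup_pair(table, "York", "New")` becomes a single dictionary lookup at run time, which matches the abstract's point about moving query-independent work into the build process.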
20 Claims
11. In a computing system having access to multiple content entities, each content entity including searchable content, a method for performing a run-time search of the searchable content to identify a ranked list of relevant content entities, the method comprising:
receiving, from a user of a client application, a query that includes one or more target search terms, the one or more target search terms being provided in a natural word format;
translating the query received from the user into a database query conducive to a known architecture of a database associated with searchable content;
querying the database by comparing the database query to the database, thereby generating a list of content entities that are potential matches;
ranking the content entities in the list in descending order, based on a calculated likelihood that a particular entity is a target of the query from the user;
removing from the ranking any content entities that are duplicates; and
returning, to the client application, a list of content entities with the highest ranking.
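The run-time steps in this claim can be sketched in Python using the standard `sqlite3` module. Everything specific here is an assumption made for illustration: the `word_index` schema, the demo rows, and the use of a matched-term count as a stand-in for the "calculated likelihood" that the claim leaves unspecified.

```python
import re
import sqlite3

def make_demo_db():
    """Create an in-memory database with a hypothetical word_index table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE word_index (word TEXT, entity_id INTEGER, title TEXT)")
    rows = [
        ("solar", 1, "Solar System"), ("system", 1, "Solar System"),
        ("solar", 2, "Solar Energy"), ("energy", 2, "Solar Energy"),
        ("system", 3, "Nervous System"), ("nervous", 3, "Nervous System"),
    ]
    db.executemany("INSERT INTO word_index VALUES (?, ?, ?)", rows)
    return db

def search(db, query, limit=2):
    """Translate a natural-word query to SQL, rank matches, dedupe, return the top ones."""
    terms = sorted(set(re.findall(r"[a-z0-9]+", query.lower())))
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT entity_id, title, COUNT(DISTINCT word) AS score "
        "FROM word_index WHERE word IN (" + placeholders + ") "
        "GROUP BY entity_id, title "
        "ORDER BY score DESC, entity_id ASC"  # descending rank, stable tie-break
    )
    ranked = db.execute(sql, terms).fetchall()
    seen, results = set(), []
    for entity_id, title, score in ranked:
        if entity_id in seen:                 # drop duplicate entities from the ranking
            continue
        seen.add(entity_id)
        results.append((title, score))
    return results[:limit]                    # return only the highest-ranked entities
```

Calling `search(make_demo_db(), "the Solar system")` ranks "Solar System" first because it matches two query terms, while entities that match only one term follow in id order.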
Specification