Realtime indexing and search in large, rapidly changing document collections
First Claim
1. A computerized method for indexing content items, the method comprising:
generating, using a processor, an inverted index of word location pairs that identifies the location of one or more words in one or more content items available on a network;
storing the inverted index in an index data store;
dynamically receiving one or more additional content items over the network;
prior to elapsing of a predetermined time threshold, storing the one or more additional content items in a stream search queue, the stream search queue operative to allow for a stream search of the one or more additional content items;
once the time threshold elapses, indexing, using the processor, the one or more additional content items in the stream search queue and then writing the indexed content from the stream search queue into the inverted index;
receiving a query from a user, the query comprising one or more query terms;
executing a stream search of the stream search queue to identify a given one of the query terms and to generate a stream search result set;
executing an index search of the inverted index of word location pairs to identify a given one of the query terms and generate an index result set; and
generating a merge result set on the basis of the stream result set and the index result set.
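The steps of the claim can be illustrated with a small in-memory sketch. This is not the patented implementation. The class name `HybridIndex`, its method names, the whitespace tokenizer, the explicit `now` argument used in place of a real clock, and the merge order (stream results first, then index results) are all assumptions made for illustration; the claim itself does not specify any of them.

```python
from collections import defaultdict, deque


class HybridIndex:
    """Toy model: an inverted index plus a stream search queue (hypothetical names)."""

    def __init__(self, threshold_seconds):
        self.threshold = threshold_seconds   # stands in for the "predetermined time threshold"
        self.inverted = defaultdict(list)    # word -> [(doc_id, position), ...]
        self.queue = deque()                 # (arrival_time, doc_id, text), oldest first

    @staticmethod
    def _tokens(text):
        # Assumed tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def receive(self, doc_id, text, now):
        """Store a newly received item in the stream search queue."""
        self.queue.append((now, doc_id, text))

    def flush(self, now):
        """Index queued items whose threshold has elapsed and write them to the inverted index."""
        while self.queue and now - self.queue[0][0] >= self.threshold:
            _, doc_id, text = self.queue.popleft()
            for pos, word in enumerate(self._tokens(text)):
                self.inverted[word].append((doc_id, pos))

    def stream_search(self, term):
        """Linear scan over the items still waiting in the queue."""
        term = term.lower()
        return [doc_id for _, doc_id, text in self.queue if term in self._tokens(text)]

    def index_search(self, term):
        """Look the term up in the inverted index; doc ids in first-seen order."""
        hits = []
        for doc_id, _ in self.inverted.get(term.lower(), []):
            if doc_id not in hits:
                hits.append(doc_id)
        return hits

    def search(self, term, now):
        """Run both searches and merge the two result sets without duplicates."""
        self.flush(now)
        stream_hits = self.stream_search(term)
        index_hits = self.index_search(term)
        return stream_hits + [d for d in index_hits if d not in stream_hits]


idx = HybridIndex(threshold_seconds=60)
idx.receive("a", "realtime search of news", now=0)
idx.receive("b", "breaking news stream", now=50)
print(idx.search("news", now=70))  # ['b', 'a']
```

At time 70 item "a" has passed the 60-second threshold and is served from the inverted index, while item "b" is still found by scanning the queue, so a query sees both without waiting for the next indexing pass.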
Abstract
The present invention is directed to systems and methods for searching content items indexed in real-time. The method according to one embodiment comprises generating an index of word location pairs that identifies the location of one or more words in one or more content items available on a network. One or more additional content items are received over the network. The received content items are stored in a stream search queue, the stream search queue operative to allow for a stream search of the one or more additional content items.
12 Claims
1. A computerized method for indexing content items, the method comprising:
generating, using a processor, an inverted index of word location pairs that identifies the location of one or more words in one or more content items available on a network;
storing the inverted index in an index data store;
dynamically receiving one or more additional content items over the network;
prior to elapsing of a predetermined time threshold, storing the one or more additional content items in a stream search queue, the stream search queue operative to allow for a stream search of the one or more additional content items;
once the time threshold elapses, indexing, using the processor, the one or more additional content items in the stream search queue and then writing the indexed content from the stream search queue into the inverted index;
receiving a query from a user, the query comprising one or more query terms;
executing a stream search of the stream search queue to identify a given one of the query terms and to generate a stream search result set;
executing an index search of the inverted index of word location pairs to identify a given one of the query terms and generate an index result set; and
generating a merge result set on the basis of the stream result set and the index result set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. An apparatus for indexing content items, the apparatus comprising:
a processor, in response to executable instructions from a computer readable medium, operative to:
generate an inverted index of word location pairs that identifies the location of one or more words in one or more content items available on a network;
store the inverted index in an index data store;
dynamically receive one or more additional content items over the network;
prior to elapsing of a predetermined time threshold, store the one or more additional content items in a stream search queue, the stream search queue operative to allow for a stream search of the one or more additional content items;
once the time threshold elapses, index the one or more additional content items in the stream search queue and then writing the indexed content from the stream search queue into the inverted index;
receiving a query from a user, the query comprising one or more query terms;
executing a stream search of the stream search queue to identify a given one of the query terms and to generate a stream search result set;
executing an index search of the inverted index of word location pairs to identify a given one of the query terms and generate an index result set; and
generating a merge result set on the basis of the stream result set and the index result set.
Specification