Efficient forward ranking in a search engine

US 10,437,892 B2
Filed: 07/08/2014
Issued: 10/08/2019
Est. Priority Date: 11/22/2010
Status: Active Grant

Abstract

Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, along with static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of the document is identified, and the position of that data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index.
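
The entry-generation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ForwardIndexEntry` class, the `build_entry` helper, the lowercase whitespace tokenizer, and the idea of passing in a set of `relevant_terms` are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardIndexEntry:
    """Hypothetical layout for one forward-index entry (one per document)."""
    doc_id: int
    token_stream: list[str]
    static_features: dict[str, float]
    # Positions (offsets into token_stream) of data used later for ranking.
    positions: dict[str, list[int]] = field(default_factory=dict)


def build_entry(doc_id, text, static_features, relevant_terms):
    """Parse a document into a token stream and record where relevant terms occur."""
    # Assumption: lowercase whitespace tokenization stands in for a real parser.
    tokens = text.lower().split()
    positions = {}
    for pos, tok in enumerate(tokens):
        if tok in relevant_terms:
            positions.setdefault(tok, []).append(pos)
    return ForwardIndexEntry(doc_id, tokens, dict(static_features), positions)


# The forward index is keyed by document identification.
forward_index = {}
entry = build_entry(
    doc_id=7,
    text="Forward index ranking uses a forward index",
    static_features={"page_rank": 0.42},  # query-independent feature (made up)
    relevant_terms={"forward", "ranking"},
)
forward_index[entry.doc_id] = entry
```

Keeping the positions alongside the token stream means a later ranking pass can read everything it needs for one document from a single entry.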

105 Citations

20 Claims

1. A computer-implemented method for using a forward index to extract information for ranking documents based on a search query, the method comprising:
- receiving a search query;
  
  parsing the search query to identify one or more atoms;
  
  creating a token map of query tokens using the one or more atoms parsed from the search query;
  
  for a first document, identifying, in a first entry of a forward index, document tokens in a token stream corresponding to the first document that match the query tokens in the token map;
  
  for the document tokens that match the query tokens based on the one or more atoms, updating a token position data structure, wherein the token position data structure includes token positions in the token stream corresponding to the first document of each of the document tokens that match the query tokens, wherein the token position data structure stores the token positions in the token stream in association with the one or more atoms in the search query;
  
  accessing the updated token position data structure to extract ranking information from the first entry of the forward index, wherein the ranking information is extracted from the first entry of the forward index via the updated token position data structure based on the token positions in the token stream; and
  
  executing ranking calculations for documents associated with the search query based on the ranking information extracted from the forward index via the updated token position data structure.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, further comprising tagging each of the one or more atoms with a context stream as a preferred context stream.
  - 3. The method of claim 2, wherein the context stream is one or more of a title, anchor, header, body, traffic, class, attributes, and uniform resource locator (URL).
  - 4. The method of claim 1, wherein the forward index is indexed by document.
  - 5. The method of claim 1, wherein the ranking information includes the position in the token stream of the first document corresponding to the document tokens that match the one or more query tokens in the token map.
  - 6. The method of claim 1, further comprising:
    - for a second document, identifying, in a second entry of the forward index, the document tokens in the token stream corresponding to the second document that match the query tokens in the token map;
      
      for the document tokens that match the query tokens, updating the token position data structure with the position in the token stream corresponding to the second document of each of the document tokens; and
      
      utilizing the updated token position data structure, extracting ranking information for ranking calculations from the second entry of the forward index.
  - 7. The method of claim 1, wherein prior to identifying the document tokens in the first document, the first document was preliminarily found to be relevant to the search query.
  - 8. The method of claim 1, wherein the first entry is associated with the first document.
  - 9. The method of claim 1, further comprising receiving a plurality of document identifications associated with documents that have previously been determined to be relevant to the received search query, wherein the previous relevancy of the plurality of documents is determined by way of a reverse index that is indexed by atom.
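
The steps recited in claim 1 can be illustrated with a short sketch. It is only one possible reading, under several assumptions: atoms are taken to be lowercase whitespace-separated query terms, the token map is a plain dictionary from query token to atom index, and the scoring function is a toy stand-in for real ranking calculations. The function names (`parse_atoms`, `build_token_map`, `match_positions`, `score`) are hypothetical.

```python
from collections import defaultdict


def parse_atoms(query):
    """Assumption: each lowercase whitespace-separated term is one atom."""
    return query.lower().split()


def build_token_map(atoms):
    """Map each query token to the index of the atom it came from."""
    token_map = {}
    for i, atom in enumerate(atoms):
        token_map.setdefault(atom, i)
    return token_map


def match_positions(token_stream, token_map):
    """Scan one forward-index entry and record positions per matched atom."""
    positions = defaultdict(list)  # plays the role of the token position data structure
    for pos, token in enumerate(token_stream):
        atom_idx = token_map.get(token)
        if atom_idx is not None:
            positions[atom_idx].append(pos)
    return dict(positions)


def score(positions, num_atoms):
    """Toy ranking: number of matches plus a bonus when every atom occurs."""
    matches = sum(len(p) for p in positions.values())
    coverage_bonus = 1.0 if len(positions) == num_atoms else 0.0
    return matches + coverage_bonus


# Hypothetical forward index: document id -> token stream.
forward_index = {
    1: ["efficient", "forward", "ranking", "in", "a", "search", "engine"],
    2: ["reverse", "index", "lookup", "for", "search"],
}

atoms = parse_atoms("forward ranking search")
token_map = build_token_map(atoms)
scores = {}
for doc_id, stream in forward_index.items():
    scores[doc_id] = score(match_positions(stream, token_map), len(atoms))
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because positions are stored per atom, a real ranker could derive proximity or ordering features from the same structure without rescanning the token stream.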

10. One or more hardware computer-storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform a method for using a forward index to extract information for ranking documents based on a search query, the method comprising:
- receiving a search query;
  
  parsing the search query to identify one or more atoms;
  
  creating a token map of query tokens using the one or more atoms parsed from the search query;
  
  for a first document, identifying, in a first entry of a forward index, document tokens in a token stream corresponding to the first document that match the query tokens in the token map;
  
  for the document tokens that match the query tokens based on the one or more atoms, updating a token position data structure, wherein the token position data structure includes token positions in the token stream corresponding to the first document of each of the document tokens that match the query tokens, wherein the token position data structure stores the token positions in the token stream in association with the one or more atoms in the search query;
  
  accessing the updated token position data structure to extract ranking information from the first entry of the forward index, wherein the ranking information is extracted from the first entry of the forward index via the updated token position data structure based on the token positions in the token stream; and
  
  executing ranking calculations for documents associated with the search query based on the ranking information extracted from the forward index via the updated token position data structure.
- Dependent claims (11, 12, 13, 14, 15, 16, 17):
  - 11. The media of claim 10, further comprising tagging each of the one or more atoms with a context stream as a preferred context stream.
  - 12. The media of claim 11, wherein the context stream is one or more of a title, anchor, header, body, traffic, class, attributes, and uniform resource locator (URL).
  - 13. The media of claim 10, wherein the forward index is indexed by document.
  - 14. The media of claim 10, wherein the ranking information includes the position in the token stream of the first document corresponding to the document tokens that match the one or more query tokens in the token map.
  - 15. The media of claim 10, wherein prior to identifying the document tokens in the first document, the first document was preliminarily found to be relevant to the search query.
  - 16. The media of claim 10, wherein the first entry is associated with the first document.
  - 17. The media of claim 10, further comprising receiving a plurality of document identifications associated with documents that have previously been determined to be relevant to the received search query, wherein the previous relevancy of the plurality of documents is determined by way of a reverse index that is indexed by atom.
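
Claims 9 and 17 describe a preliminary stage in which a reverse index, indexed by atom, supplies the document identifications that are then examined through the forward index. A minimal sketch of that two-stage flow follows. The helper names are hypothetical, and treating "preliminarily relevant" as "contains every query atom" is an assumption made for the example; the claims do not specify the matching rule.

```python
def build_reverse_index(forward_index):
    """Invert a doc -> tokens mapping into an atom -> set-of-doc-ids mapping."""
    reverse_index = {}
    for doc_id, tokens in forward_index.items():
        for token in tokens:
            reverse_index.setdefault(token, set()).add(doc_id)
    return reverse_index


def candidate_docs(reverse_index, atoms):
    """Preliminary relevance (assumed rule): documents containing every atom."""
    postings = [reverse_index.get(atom, set()) for atom in atoms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical forward index, indexed by document.
forward_index = {
    1: ["forward", "index", "ranking"],
    2: ["reverse", "index", "lookup"],
    3: ["ranking", "with", "a", "forward", "index"],
}
reverse_index = build_reverse_index(forward_index)
candidates = candidate_docs(reverse_index, ["forward", "ranking"])

# Only the candidates' forward-index entries are read in the second stage.
streams = {doc_id: forward_index[doc_id] for doc_id in candidates}
```

The point of the split is that the more detailed per-document work runs only over the candidate set rather than over the whole corpus.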

18. A system for using a forward index to extract information for ranking documents based on a search query, the system comprising:
- an index generator having one or more hardware processors and one or more hardware computer-storage media; and
  
  a forward index coupled with the index generator, wherein the index generator is configured for:
  
  receiving a search query;
  
  parsing the search query to identify one or more atoms;
  
  creating a token map of query tokens using the one or more atoms parsed from the search query;
  
  for a first document, identifying, in a first entry of a forward index, document tokens in a token stream corresponding to the first document that match the query tokens in the token map;
  
  for the document tokens that match the query tokens based on the one or more atoms, updating a token position data structure, wherein the token position data structure includes token positions in the token stream corresponding to the first document of each of the document tokens that match the query tokens, wherein the token position data structure stores the token positions in the token stream in association with the one or more atoms in the search query;
  
  accessing the updated token position data structure to extract ranking information from the first entry of the forward index, wherein the ranking information is extracted from the first entry of the forward index via the updated token position data structure based on the token positions in the token stream; and
  
  executing ranking calculations for documents associated with the search query based on the ranking information extracted from the forward index via the updated token position data structure.
- Dependent claims (19, 20):
  - 19. The system of claim 18, wherein the forward index further comprises:
    - for the first document, a compressed token stream, wherein the compressed token stream is a compressed version of a token stream of the document.
  - 20. The system of claim 18, wherein the forward index further comprises:
    - for the first document, a document identification;
      
      a compressed separate stream for context of the first document;
      
      one or more static features associated with the document, wherein the one or more static features are unrelated to the search query; and
      
      positional information.
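
Claims 19 and 20 list what a forward-index entry may hold, including compressed streams. The sketch below shows one way such an entry could be assembled. It is an assumption-laden illustration: the claims do not name a compression scheme, so zlib over newline-joined UTF-8 text is used purely as a stand-in, and the dictionary keys and feature names are hypothetical.

```python
import zlib


def compress_stream(tokens):
    """Assumption: newline-joined UTF-8 text compressed with zlib."""
    return zlib.compress("\n".join(tokens).encode("utf-8"))


def decompress_stream(blob):
    """Recover the token list produced by compress_stream."""
    text = zlib.decompress(blob).decode("utf-8")
    return text.split("\n") if text else []


doc_tokens = ["efficient", "forward", "ranking"] * 50
doc_contexts = ["title", "title", "body"] * 50  # one context label per token

entry = {
    "doc_id": 42,
    "token_stream": compress_stream(doc_tokens),
    # Context is kept as a separate compressed stream, parallel to the tokens.
    "context_stream": compress_stream(doc_contexts),
    # Static features do not depend on any query (values are made up).
    "static_features": {"page_rank": 0.3, "language": "en"},
    "positions": {
        "ranking": [i for i, t in enumerate(doc_tokens) if t == "ranking"],
    },
}

tokens = decompress_stream(entry["token_stream"])
```

Storing the context stream separately lets a reader decompress only the token stream when context labels are not needed for a given calculation.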

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Risvik, Knut Magne; Hopcroft, Michael; Bennett, John G.; Kalyanaraman, Karthik; Chilimbi, Trishul; Walters, Chad P.; Parikh, Vishesh; Pedersen, Jan Otto
Primary Examiner(s): Kim, Paul
Application Number: US14/325,871
Publication Number: US 20140324819A1
Time in Patent Office: 1,918 Days
CPC Class Codes

G06F 16/182   Distributed file systems

G06F 16/2453   Query optimisation

G06F 16/24578   using ranking

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9538   Presentation of query results
