Clustering web pages on a search engine results page

US 9,026,519 B2
Filed: 08/09/2011
Issued: 05/05/2015
Est. Priority Date: 08/09/2011
Status: Active Grant

Abstract

Methods, systems, and media are provided for delivering clustered search results for recent and non-recent events by maintaining the identification (ID) numbers of the respective clustered documents beyond the “fresh” life span of the clustered documents. When clusters are formed according to similar content, an ID number and associated attributes are assigned to each of the clusters. This provides a mechanism to track and retrieve the respective clusters for subsequent delivery of search results. The respective ID numbers of the clusters are maintained, even after the documents are no longer considered “fresh.” These similar-content clusters are further subdivided according to publication date. This provides individual subdivided clusters for similar content events that occurred at different time spans, which are delivered along with individual non-clustered search results in a SERP.

19 Claims

1. A computer-implemented method of delivering search results of one or more events using a computing device having processor, memory, and data storage subsystems, the computer-implemented method comprising:
- providing a plurality of documents, wherein the plurality of documents includes fresh documents and non-fresh documents, wherein fresh documents have life spans falling within a predetermined period of time, and wherein non-fresh documents have life spans exceeding the predetermined period of time;
  
  grouping the plurality of documents based on page content similarity to form one or more clusters;
  
  assigning an identification (ID) number and one or more respective related attributes to each of the one or more clusters;
  
  maintaining the assigned ID numbers and the respective related attributes for each of the one or more clusters after the plurality of documents are no longer considered to be fresh documents; and
  
  subdividing each of the one or more clusters into one or more subdivided clusters according to publication date.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The computer-implemented method of claim 1, wherein grouping a plurality of documents comprises grouping a plurality of fresh documents.
  - 3. The computer-implemented method of claim 1, wherein grouping a plurality of documents comprises grouping a plurality of non-recent event documents.
  - 4. The computer-implemented method of claim 1, wherein the assigned ID numbers remain persistent throughout a lifetime of each respective document's life.
  - 5. The computer-implemented method of claim 1, wherein each of the plurality of documents are considered to be a fresh document for approximately a one-month life span.
  - 6. The computer-implemented method of claim 1, further comprising:
    - displaying the one or more subdivided clusters by publication date for one of the one or more clusters to a user interface of the computing device in response to a user search query.
  - 7. The computer-implemented method of claim 6, wherein displaying each of the one or more subdivided clusters comprises displaying a respective one or more of:
    - a dominant title, a dominant image, or a dominant news summary.
  - 8. The computer-implemented method of claim 1, wherein the one or more subdivided clusters comprise grouped Uniform Resource Locators (URLs) according to respective ID numbers of the one or more subdivided clusters.
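
The steps of claim 1 (group by content similarity, assign a cluster ID with attributes, keep the ID after documents stop being fresh, subdivide by publication date) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the token-set Jaccard measure, the 0.5 threshold, the greedy single-pass grouping, the monthly buckets, and every name (`Document`, `Cluster`, `group_documents`, `subdivide_by_date`) are hypothetical. The 30-day window loosely follows the "approximately a one-month life span" of claim 5.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from itertools import count

FRESH_WINDOW = timedelta(days=30)  # assumption: roughly the one-month span of claim 5
SIMILARITY_THRESHOLD = 0.5         # assumption: arbitrary cutoff for this sketch


@dataclass
class Document:
    url: str
    text: str
    published: date

    def is_fresh(self, today: date) -> bool:
        """A document is fresh while its age is within the window."""
        return today - self.published <= FRESH_WINDOW


@dataclass
class Cluster:
    cluster_id: int
    attributes: dict
    documents: list = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a stand-in for any page content similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def group_documents(docs):
    """Greedy single pass: join the first cluster whose seed document is similar enough."""
    ids = count(1)
    clusters = []
    for doc in docs:
        for cluster in clusters:
            if jaccard(doc.text, cluster.documents[0].text) >= SIMILARITY_THRESHOLD:
                cluster.documents.append(doc)
                break
        else:
            # No similar cluster found: open a new one with a new ID and attributes.
            clusters.append(Cluster(next(ids), {"seed_url": doc.url}, [doc]))
    return clusters


def subdivide_by_date(cluster):
    """Split one cluster into sub-clusters keyed by (year, month) of publication."""
    buckets = {}
    for doc in cluster.documents:
        key = (doc.published.year, doc.published.month)
        buckets.setdefault(key, []).append(doc)
    return dict(sorted(buckets.items()))
```

Note that nothing in the sketch ties `cluster_id` to `is_fresh`: the ID is assigned once and stays attached to the cluster however old its documents become, which is the "maintaining" step in miniature.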

9. One or more computer-readable storage media storing computer-readable instructions embodied thereon that, when executed by a computing device, perform a method of delivering persistent clusters in a search engine results page, the method comprising:
- retrieving documents from a database according to a received search query;
  
  clustering some of the retrieved documents into one or more clusters based on content similarity and publication date;
  
  assigning an identification (ID) number to each of the clusters of the retrieved documents, wherein the ID number of each of the clusters remains persistent throughout a life span of each of the clustered retrieved documents; and
  
  delivering each of the clusters with other individual non-clustered results in the search engine results page to a user interface in response to the received search query.
- Dependent claims (10, 11, 12, 13, 14)
  - 10. The one or more computer-readable storage media of claim 9, wherein some of the one or more clusters comprise retrieved documents that are fresh documents.
  - 11. The one or more computer-readable storage media of claim 9, wherein some of the one or more clusters comprise retrieved documents that are not fresh documents.
  - 12. The one or more computer-readable storage media of claim 9, wherein the one or more clusters comprise one or more grouped Uniform Resource Locators (URLs).
  - 13. The one or more computer-readable storage media of claim 9, further comprising:
    - providing a thumbnail synopsis for each of the one or more clusters.
  - 14. The one or more computer-readable storage media of claim 13, wherein the thumbnail synopsis comprises one or more of:
    - a number of documents, a host domain, or one or more dominant features for each of the one or more clusters.
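
One way to picture the delivery step of claim 9 together with the thumbnail synopsis of claims 13 and 14 is the sketch below. It assumes each retrieved result already carries a `cluster_id` (or `None`), and that a cluster needs at least two members on the page to be shown as a cluster. The dictionary keys, the function names `build_serp` and `thumbnail_synopsis`, and the choice of "most frequent title" as the dominant feature are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def thumbnail_synopsis(docs):
    """Summarise a cluster: document count, most common host, most common title."""
    hosts = Counter(urlparse(d["url"]).netloc for d in docs)
    titles = Counter(d["title"] for d in docs)
    return {
        "document_count": len(docs),
        "host_domain": hosts.most_common(1)[0][0],
        "dominant_title": titles.most_common(1)[0][0],
    }


def build_serp(results, min_cluster_size=2):
    """Mix cluster entries and individual results, keeping the original ranking order."""
    by_cluster = defaultdict(list)
    for doc in results:
        if doc.get("cluster_id") is not None:
            by_cluster[doc["cluster_id"]].append(doc)

    serp, emitted = [], set()
    for doc in results:
        cid = doc.get("cluster_id")
        members = by_cluster.get(cid, [])
        if cid is not None and len(members) >= min_cluster_size:
            # Emit each cluster once, at the rank of its best-placed member.
            if cid not in emitted:
                emitted.add(cid)
                serp.append({
                    "type": "cluster",
                    "cluster_id": cid,
                    "synopsis": thumbnail_synopsis(members),
                    "urls": [m["url"] for m in members],
                })
        else:
            serp.append({"type": "single", "url": doc["url"], "title": doc["title"]})
    return serp
```

Placing a cluster at the rank of its highest-ranked member is a design choice of this sketch; the claims only require that clusters be delivered alongside the non-clustered results.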

15. One or more computer-readable storage media storing computer-readable instructions embodied thereon that, when executed by a computing device, perform a method of providing clustered non-unique results in a search engine results page, the method comprising:
- providing a plurality of documents comprising fresh documents that have life spans falling within a predetermined period of time;
  
  grouping the fresh documents based on page content similarity to form one or more clusters;
  
  assigning an identification (ID) number and one or more respective related attributes to each of the one or more clusters;
  
  maintaining the assigned ID numbers and the respective related attributes for each of the one or more clusters after the clustered documents are no longer considered to be fresh documents, wherein the clustered documents are no longer considered to be fresh documents when their life spans exceed the predetermined period of time;
  
  retrieving a set of documents from the plurality of documents in response to a received user search query, wherein the set of documents includes A) fresh documents having life spans falling within a predetermined period of time, and B) non-fresh documents having life spans exceeding the predetermined period of time, wherein each document in the retrieved set of documents is associated with one or more of the ID numbers assigned to the clusters, regardless of whether the document is a fresh document or a non-fresh document;
  
  selecting a set number of top results from the retrieved set of documents;
  
  grouping the top results according to publication date or content similarity using one or more of the ID numbers of one or more respective retrieved clusters; and
  
  delivering search results to a user interface in response to the received user search query, the search engine results page comprising the grouped top results.
- Dependent claims (16, 17, 18, 19)
  - 16. The one or more computer-readable storage media of claim 15, wherein the one or more ID numbers persist throughout a document life span for the associated one or more retrieved clusters.
  - 17. The one or more computer-readable storage media of claim 15, wherein the search engine results page comprises clustered results and non-clustered results.
  - 18. The one or more computer-readable storage media of claim 17, wherein the clustered results comprise newly formed clustered results.
  - 19. The one or more computer-readable storage media of claim 15, wherein the grouping is executed via a clustering algorithm.
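
The query-time flow of claim 15 (look up persisted cluster IDs, take a set number of top results, group them by ID) might look something like the following. It is a hedged sketch only: the `ClusterRegistry` class, the `(url, published, score)` tuple layout, the relevance score itself, and the 30-day window are assumptions that do not come from the patent text.

```python
from datetime import date, timedelta

FRESH_WINDOW = timedelta(days=30)  # assumption: the "predetermined period of time"


class ClusterRegistry:
    """Stores URL -> cluster ID assignments; entries never expire with freshness."""

    def __init__(self):
        self._id_by_url = {}
        self._attrs_by_id = {}

    def assign(self, url, cluster_id, **attributes):
        self._id_by_url[url] = cluster_id
        self._attrs_by_id.setdefault(cluster_id, {}).update(attributes)

    def cluster_id(self, url):
        return self._id_by_url.get(url)

    def attributes(self, cluster_id):
        return dict(self._attrs_by_id.get(cluster_id, {}))


def group_top_results(scored_docs, registry, top_n, today):
    """Keep the top_n results by score, then group them by persisted cluster ID.

    scored_docs is an iterable of (url, published, score) tuples.
    """
    top = sorted(scored_docs, key=lambda d: d[2], reverse=True)[:top_n]
    grouped = {}
    for url, published, _score in top:
        cid = registry.cluster_id(url)  # same lookup for fresh and non-fresh documents
        entry = {"url": url, "fresh": today - published <= FRESH_WINDOW}
        grouped.setdefault(cid, []).append(entry)
    return grouped
```

Freshness is computed here only as a display flag; the lookup in `registry.cluster_id` does not consult it, so a document keeps its group membership after it ages out of the window.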

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Ahmed, Junaid; Saraf, Yatharth; Sun, Walter; Parthasarathy, Sasi Kumar
Primary Examiner(s): THAI, HANH B
Application Number: US13/205,809
Publication Number: US 20130041877A1
Time in Patent Office: 1,365 Days
Field of Search: 707/709, 707/726-729, 707/737, 707/749
US Class Current: 707/709

CPC Class Codes:
G06F 16/2365   Ensuring data consistency a...
G06F 16/285   Clustering or classification
G06F 16/355   Class or cluster creation o...
G06F 16/9038   Presentation of query results
G06F 16/907   Retrieval characterised by ...
G06F 16/951   Indexing; Web crawling tech...
G06F 16/9535   Search customisation based ...
G06F 16/9538   Presentation of query results
