Information category obtaining method and apparatus

US 10,346,496 B2
Filed: 10/27/2016
Issued: 07/09/2019
Est. Priority Date: 06/06/2014
Status: Active Grant

Abstract

The present disclosure discloses an information category acquiring method and apparatus. The method includes: acquiring a browse record about a user browsing a Web page, the browse record including at least a Web page identifier of the Web page that the user browses; acquiring interestingness of the user for information categories according to the browse record; and acquiring an information category for which interestingness meets a first preset condition, and using the acquired information category as an information category in which the user is interested.
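The flow summarized above can be illustrated with a minimal Python sketch. It assumes each browsed page has already been assigned a category (the hypothetical `category_of` mapping stands in for the classification step of claim 1), that interestingness is the page count per category as recited in claim 1, and that the preset condition is a simple minimum count (`min_pages` is an illustrative parameter, not taken from the patent). The de-duplication of page identifiers follows dependent claim 2.

```python
from collections import Counter


def interested_categories(browse_record, category_of, min_pages=2):
    """Return the categories whose page count meets the assumed threshold."""
    # Remove duplicate Web page identifiers while keeping first-seen order.
    unique_ids = list(dict.fromkeys(browse_record))
    # Interestingness = number of distinct browsed pages in each category.
    interestingness = Counter(category_of[page_id] for page_id in unique_ids)
    # Assumed preset condition: at least `min_pages` pages in the category.
    return [cat for cat, n in interestingness.items() if n >= min_pages]


record = ["p1", "p2", "p1", "p3", "p4"]
category_of = {"p1": "sports", "p2": "sports", "p3": "finance", "p4": "sports"}
print(interested_categories(record, category_of))
```

Other preset conditions (for example, keeping only the top few categories) would fit the same structure; the threshold is just one possible reading.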

7 Claims

1. An information category acquiring method, the method comprising:
- acquiring, by a computing device, a browse record about a user browsing a Web page, the browse record comprising at least a Web page identifier of the Web page that the user browses;
  
  acquiring, by the computing device, a first feature word set corresponding to the Web page according to the Web page identifier of the Web page that the user browses, the first feature word set corresponding to the Web page being used to store a feature word comprised in the Web page;
  
  acquiring, by the computing device, an information category to which the Web page belongs according to the first feature word set corresponding to the Web page and a correspondence between an information category and a second feature word set;
  
  counting, by the computing device, Web page quantities comprised in information categories;
  
  separately determining, by the computing device, the Web page quantities comprised in the information categories as interestingness of the user for the information categories; and
  
  acquiring, by the computing device, an information category for which interestingness meets a preset condition, and using the acquired information category as an information category in which the user is interested;
  
  wherein the acquiring the first feature word set corresponding to the Web page according to the Web page identifier of the Web page that the user browses comprises:
  
  acquiring Web page content comprised in the Web page according to the Web page identifier of the Web page that the user browses;
  
  performing word segmentation on the Web page content, to obtain word segments comprised in the Web page content; and
  
  removing a word segment that meets a first preset part of speech from the word segments comprised in the Web page content, and using a remaining word segment as the feature word comprised in the Web page, to form the first feature word set corresponding to the Web page, wherein the first preset part of speech comprises a modal particle, a stop word, and a near-synonym;
  
  wherein the acquiring an information category to which the Web page belongs according to the first feature word set corresponding to the Web page and a correspondence between an information category and a second feature word set comprises:
  
  calculating a probability in each second feature word set in the correspondence, of each feature word included in the first feature word set;
  
  using a product of a non-zero probability in the each second feature word set, of the each feature word included in the first feature word set, as a matching degree between the first feature word set corresponding to the Web page and the each second feature word set;
  
  selecting a second feature word set whose matching degree with the first feature word set is the maximum; and
  
  determining an information category corresponding to the selected second feature word set as the information category to which the Web page belongs.

2. The method according to claim 1, wherein before the acquiring a first feature word set corresponding to the Web page according to the Web page identifier of the Web page that the user browses, the method further comprises:
- removing a duplicate Web page identifier comprised in the browse record.

3. The method according to claim 1, wherein the method further comprises:
- storing a user identifier of the user and the information category in which the user is interested in a correspondence between a user identifier and an information category.

4. The method according to claim 1, wherein the method further comprises:
- adding a feature word comprised in the first feature word set into the selected second feature word set.
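
The matching-degree step of claim 1 and the update step of claim 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each second feature word set is modeled as a word-to-count dictionary, the probability of a word in a set is taken to be its count divided by the set's total count, and a page that shares no word with a category is scored 0 rather than the empty product 1. The names `matching_degree`, `classify`, and `absorb` are hypothetical.

```python
from math import prod


def matching_degree(first_set, second_counts):
    """Product of the non-zero probabilities of first_set words in one second set."""
    total = sum(second_counts.values())
    # Assumed probability model: relative frequency of the word in the second set.
    probs = [second_counts[w] / total for w in first_set if second_counts.get(w, 0) > 0]
    # Assumption: no shared word means a matching degree of 0, not the empty product 1.
    return prod(probs) if probs else 0.0


def classify(first_set, correspondence):
    """Return the category whose second feature word set has the maximum matching degree."""
    return max(correspondence, key=lambda c: matching_degree(first_set, correspondence[c]))


def absorb(first_set, second_counts):
    """Claim 4: add the page's feature words into the selected second feature word set."""
    for w in first_set:
        second_counts[w] = second_counts.get(w, 0) + 1


correspondence = {
    "sports": {"team": 5, "goal": 3, "match": 2},
    "finance": {"stock": 4, "market": 4, "team": 2},
}
page_words = {"team", "stadium"}
category = classify(page_words, correspondence)
absorb(page_words, correspondence[category])
print(category, correspondence[category])
```

Because zero probabilities are skipped, a category that matches fewer words multiplies fewer factors below 1; the sketch follows the claim wording literally and does not try to correct for that.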

5. An information category acquiring apparatus, the apparatus comprising:
- one or more processors;
  
  memory; and
  
  a plurality of programs stored in the memory and to be executed by the one or more processors to cause the one or more processors to:
  
  acquire a browse record about a user browsing a Web page, the browse record comprising at least a Web page identifier of the Web page that the user browses;
  
  acquire a first feature word set corresponding to the Web page according to the Web page identifier of the Web page that the user browses, wherein the first feature word set is used to store a feature word comprised in the Web page;
  
  acquire an information category to which the Web page belongs according to the first feature word set corresponding to the Web page and a correspondence between an information category and a second feature word set;
  
  count Web page quantities comprised in information categories;
  
  separately determine the Web page quantities comprised in the information categories as the interestingness of the user for the information categories;
  
  acquire an information category for which interestingness meets a preset condition, and use the acquired information category as an information category in which the user is interested;
  
  wherein the plurality of programs is executed by the one or more processors to cause the one or more processors to:
  
  acquire Web page content comprised in the Web page according to the Web page identifier of the Web page that the user browses;
  
  perform word segmentation on the Web page content, to obtain word segments comprised in the Web page content; and
  
  remove a word segment that meets a first preset part of speech from the word segments comprised in the Web page content, and use a remaining word segment as the feature word comprised in the Web page, to form the first feature word set corresponding to the Web page, wherein the first preset part of speech comprises a modal particle, a stop word, and a near-synonym;
  
  wherein the plurality of programs is executed by the one or more processors to cause the one or more processors to:
  
  calculate a probability in each second feature word set in the correspondence, of each feature word included in the first feature word set;
  
  use a product of a non-zero probability in the each second feature word set, of the each feature word included in the first feature word set, as a matching degree between the first feature word set corresponding to the Web page and the each second feature word set;
  
  select a second feature word set whose matching degree with the first feature word set is the maximum; and
  
  determine an information category corresponding to the selected second feature word set as the information category to which the Web page belongs.

6. The apparatus according to claim 5, wherein the plurality of programs is executed by the one or more processors to cause the one or more processors to:
- add a feature word comprised in the first feature word set into the selected second feature word set.

7. The apparatus according to claim 5, wherein the plurality of programs is executed by the one or more processors to cause the one or more processors to:
- store a user identifier of the user and the information category in which the user is interested in a correspondence between a user identifier and an information category.
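
The feature-word extraction recited in claims 1 and 5 (segment the page content, drop segments of the preset parts of speech, keep the rest) might look like the sketch below. It makes several assumptions: word segmentation is approximated by a regular expression over lowercased text, which suits space-delimited languages only, and the modal particles, stop words, and near-synonyms to be removed are folded into one hypothetical exclusion set, `EXCLUDED`, whose contents are purely illustrative.

```python
import re

# Hypothetical exclusion list standing in for the "first preset part of speech".
EXCLUDED = frozenset({"the", "a", "an", "of", "and", "oh", "well"})


def feature_word_set(page_content, excluded=EXCLUDED):
    """Segment page content and keep the segments that are not excluded."""
    # Assumed segmentation: runs of word characters in the lowercased text.
    segments = re.findall(r"\w+", page_content.lower())
    # The remaining segments form the first feature word set for the page.
    return {s for s in segments if s not in excluded}


print(feature_word_set("Well, the goal of the team"))
```

Text in languages without word delimiters would need a dedicated segmenter in place of the regular expression.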

Current Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Original Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Inventors: Peng, Zuojie; Tang, Jianle; Huang, Yu; Zeng, Wei
Primary Examiner(s): Mackes, Kris E
Assistant Examiner(s): Bui, Tiffany Thuy
Application Number: US15/335,682
Publication Number: US 20170046447A1
Time in Patent Office: 985 Days
CPC Class Codes:
G06F 16/285   Clustering or classification
G06F 16/9535   Search customisation based ...
G06F 16/954   Navigation, e.g. using cate...
