
Information category obtaining method and apparatus

  • US 10,346,496 B2
  • Filed: 10/27/2016
  • Issued: 07/09/2019
  • Est. Priority Date: 06/06/2014
  • Status: Active Grant
First Claim

1. An information category acquiring method, the method comprising:

    acquiring, by a computing device, a browse record about a user browsing a Web page, the browse record comprising at least a Web page identifier of the Web page that the user browses;

    acquiring, by the computing device, a first feature word set corresponding to the Web page according to the Web page identifier of the Web page that the user browses, the first feature word set corresponding to the Web page being used to store a feature word comprised in the Web page;

    acquiring, by the computing device, an information category to which the Web page belongs according to the first feature word set corresponding to the Web page and a correspondence between an information category and a second feature word set;

    counting, by the computing device, Web page quantities comprised in information categories;

    separately determining, by the computing device, the Web page quantities comprised in the information categories as interestingness of the user for the information categories; and

    acquiring, by the computing device, an information category for which interestingness meets a preset condition, and using the acquired information category as an information category in which the user is interested;

    wherein the acquiring the first feature word set corresponding to the Web page according to the Web page identifier of the Web page that the user browses comprises:

    acquiring Web page content comprised in the Web page according to the Web page identifier of the Web page that the user browses;

    performing word segmentation on the Web page content, to obtain word segments comprised in the Web page content; and

    removing a word segment that meets a first preset part of speech from the word segments comprised in the Web page content, and using a remaining word segment as the feature word comprised in the Web page, to form the first feature word set corresponding to the Web page, wherein the first preset part of speech comprises a modal particle, a stop word, and a near-synonym;

    wherein the acquiring an information category to which the Web page belongs according to the first feature word set corresponding to the Web page and a correspondence between an information category and a second feature word set comprises:

    calculating, for each feature word included in the first feature word set, a probability of the feature word in each second feature word set in the correspondence;

    using a product of the non-zero probabilities, in each second feature word set, of the feature words included in the first feature word set as a matching degree between the first feature word set corresponding to the Web page and that second feature word set;

    selecting a second feature word set whose matching degree with the first feature word set is the maximum; and

    determining an information category corresponding to the selected second feature word set as the information category to which the Web page belongs.
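
The feature-word extraction steps of claim 1 (word segmentation, then removal of modal particles, stop words and near-synonyms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the regex tokenizer stands in for a real word segmenter (Chinese page text would need a dedicated one), the word lists `MODAL_PARTICLES`, `STOP_WORDS` and `NEAR_SYNONYMS` are hypothetical examples, and dropping a near-synonym only when its canonical word also appears on the page is one possible reading of the claim.

```python
import re

# Hypothetical filter lists: the claim names these word classes but not their contents.
MODAL_PARTICLES = {"ah", "oh", "well"}
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}
# Hypothetical near-synonym map (variant -> canonical word). Assumption: a variant is
# dropped as redundant only when its canonical word also occurs in the page content.
NEAR_SYNONYMS = {"soccer": "football", "movie": "film"}


def segment(content: str) -> list[str]:
    """Stand-in word segmenter: lowercase runs of ASCII letters."""
    return re.findall(r"[a-z]+", content.lower())


def first_feature_word_set(content: str) -> set[str]:
    """Build the first feature word set for one Web page's content."""
    segments = segment(content)
    present = set(segments)
    features = set()
    for word in segments:
        if word in MODAL_PARTICLES or word in STOP_WORDS:
            continue  # filtered word classes
        if NEAR_SYNONYMS.get(word) in present:
            continue  # redundant near-synonym of a word already on the page
        features.add(word)
    return features


print(sorted(first_feature_word_set("Well, the football match and the soccer fans")))
# ['fans', 'football', 'match']
```

Returning a set mirrors the claim's wording ("feature word set"), so a word repeated on the page contributes only once to the later matching step.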
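
The category-matching steps of claim 1 can be sketched as follows. The claim does not say how a feature word's probability within a second feature word set is computed, so this sketch assumes the word's relative frequency among that set's word counts. It also assumes a matching degree of 0.0 when no feature word has a non-zero probability, a case the claim leaves open. `matching_degree`, `classify_page` and the sample correspondence are hypothetical.

```python
import math


def matching_degree(features: set[str], category_words: dict[str, int]) -> float:
    """Product of the non-zero probabilities of the page's feature words in one
    second feature word set (assumed: word count / total count in that set)."""
    total = sum(category_words.values())
    probs = [category_words[w] / total for w in features if category_words.get(w, 0) > 0]
    # Assumption: no overlapping words means no match at all.
    return math.prod(probs) if probs else 0.0


def classify_page(features: set[str], correspondence: dict[str, dict[str, int]]) -> str:
    """Return the information category whose second feature word set has the
    maximum matching degree with the page's first feature word set."""
    return max(correspondence, key=lambda cat: matching_degree(features, correspondence[cat]))


correspondence = {
    "sports": {"football": 3, "match": 1},
    "film": {"film": 2, "actor": 2},
}
page = {"football", "match", "fans"}
print(matching_degree(page, correspondence["sports"]))  # 0.1875
print(classify_page(page, correspondence))  # sports
```

Skipping zero probabilities keeps a single unseen word from forcing the whole product to zero. For pages with many feature words, summing `math.log` of the probabilities (with the no-match case mapped to negative infinity) would avoid floating-point underflow while preserving which category ranks highest.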
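
The interestingness steps of claim 1 (count the Web pages in each information category, treat each count as the user's interestingness for that category, and keep the categories that meet a preset condition) can be sketched as follows. The claim does not fix the preset condition, so this sketch assumes a minimum page count `min_pages`. It also assumes that repeat visits to the same Web page identifier count once. The function names and sample data are hypothetical.

```python
from collections import Counter


def interestingness(browsed_page_ids: list[str], page_category: dict[str, str]) -> Counter:
    """Number of distinct browsed Web pages in each information category."""
    return Counter(page_category[pid] for pid in set(browsed_page_ids))


def interested_categories(
    browsed_page_ids: list[str], page_category: dict[str, str], min_pages: int
) -> list[str]:
    """Categories whose interestingness meets the (assumed) preset condition."""
    scores = interestingness(browsed_page_ids, page_category)
    return sorted(cat for cat, count in scores.items() if count >= min_pages)


browse_record = ["p1", "p2", "p2", "p3", "p4"]
page_category = {"p1": "sports", "p2": "sports", "p3": "film", "p4": "sports"}
print(interested_categories(browse_record, page_category, min_pages=2))  # ['sports']
```

A top-N rule (for example, keeping the categories returned by `Counter.most_common(n)`) would be an equally plausible preset condition; the threshold form is used here only because it is the simplest to state.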

  • 1 Assignment