MODEL-BASED IDENTIFICATION OF RELEVANT CONTENT

US 20170075978A1
Filed: 09/16/2015
Published: 03/16/2017
Est. Priority Date: 09/16/2015
Status: Abandoned Application

Abstract

The disclosed embodiments provide a system for processing data. During operation, the system obtains validated training data containing a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics. Next, the system uses the validated training data to produce a statistical model for classifying a relevance of content to the one or more topics. The system then uses the statistical model to generate a second set of relevance tags for a second set of content items. Finally, the system outputs one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items.
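
As an illustration of the flow the abstract describes (train on expert-tagged items, tag new items, group them by tag), the following is a minimal sketch only. The abstract does not name a specific model, so a small multinomial naive Bayes classifier is swapped in as a stand-in for the "statistical model". The class name `NaiveBayesTagger`, the helper `group_by_tag`, the tag labels, and all sample texts are assumptions made for this example and do not come from the application.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


class NaiveBayesTagger:
    """Tiny multinomial naive Bayes; a hypothetical stand-in for the statistical model."""

    def fit(self, items, tags):
        self.tag_counts = Counter(tags)
        self.word_counts = defaultdict(Counter)
        for text, tag in zip(items, tags):
            self.word_counts[tag].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n in self.tag_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the tag
            for word in tokenize(text):
                # Laplace smoothing so unseen words do not zero out the score
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


def group_by_tag(model, items):
    """Generate a tag for each item and group the items by that tag."""
    groups = defaultdict(list)
    for text in items:
        groups[model.predict(text)].append(text)
    return dict(groups)


# Hypothetical expert-validated training data (first set of items and tags)
train_items = [
    "the new job search feature is great",
    "job recommendations keep showing irrelevant roles",
    "what a lovely sunny day",
    "my cat sleeps all day",
]
train_tags = ["relevant", "relevant", "not_relevant", "not_relevant"]

model = NaiveBayesTagger().fit(train_items, train_tags)

# Second set of content items, tagged by the model and grouped for output
groups = group_by_tag(model, ["job search is broken again", "sunny day with my cat"])
```

Grouping by generated tag is what lets a reader scan one bucket per topic instead of reading every item; any classifier with a `predict` method could replace the stand-in here.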

20 Claims

1. A method, comprising:
- obtaining validated training data comprising a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics;
  
  using the validated training data to produce, by one or more computer systems, a statistical model for classifying a relevance of content to the one or more topics;
  
  using the statistical model to generate, by the one or more computer systems, a second set of relevance tags for a second set of content items; and
  
  outputting, by the one or more computer systems, one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items.
- Dependent claims 2 to 10:
  - 2. The method of claim 1, further comprising:
    - obtaining a validated subset of the second set of relevance tags for the second set of content items.
  - 3. The method of claim 2, further comprising:
    - providing the validated subset as additional training data to the statistical model to produce an update to the statistical model; and
      
      using the update to generate a third set of relevance tags for a third set of content items.
  - 4. The method of claim 1, wherein using the training data to produce the statistical model for classifying the relevance of content to the one or more topics comprises:
    - generating a set of features from a content item in the first set of content items; and
      
      providing the set of features as input to the statistical model.
  - 5. The method of claim 4, wherein the set of features comprises one or more n-grams from the content item.
  - 6. The method of claim 4, wherein the set of features comprises at least one of:
    - a number of characters;
      
      a number of capitalized characters; and
      
      a number of special characters.
  - 7. The method of claim 4, wherein the set of features comprises at least one of:
    - a number of proper nouns;
      
      a number of emoticons;
      
      a number of words; and
      
      a number of sentences.
  - 8. The method of claim 4, wherein the set of features comprises at least one of:
    - an average number of words in a sentence;
      
      a percentage of special characters;
      
      a percentage of emoticon characters; and
      
      a number of Uniform Resource Locators.
  - 9. The method of claim 4, wherein the set of features comprises a topic related to social media.
  - 10. The method of claim 1, wherein the one or more topics comprise a product associated with an online professional network.
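
Several of the surface features listed in claims 5 through 8 can be computed with simple string operations. The sketch below is one possible reading, not the applicants' implementation: the function names, the dictionary keys, the URL pattern, and the very small emoticon pattern are all assumptions chosen for illustration. Proper-noun counting is omitted because it would need a part-of-speech tagger.

```python
import re

# Assumed patterns, for illustration only
URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")  # covers only a few common emoticons


def word_ngrams(words, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_features(text):
    """Compute a hypothetical feature dictionary for one content item."""
    words = text.split()
    # Remove URLs before sentence splitting so their dots are not counted as sentence ends
    without_urls = URL_RE.sub("", text)
    sentences = [s for s in re.split(r"[.!?]+", without_urls) if s.strip()]
    special = [c for c in text if not c.isalnum() and not c.isspace()]
    return {
        "num_chars": len(text),
        "num_capitalized": sum(c.isupper() for c in text),
        "num_special": len(special),
        "num_words": len(words),
        "num_sentences": len(sentences),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "pct_special": len(special) / len(text) if text else 0.0,
        "num_emoticons": len(EMOTICON_RE.findall(text)),
        "num_urls": len(URL_RE.findall(text)),
        "bigrams": word_ngrams([w.lower() for w in words], 2),
    }


features = extract_features("Great Update! See https://example.com :)")
```

A dictionary like this would then be converted to a numeric vector before being given to a model; how the application performs that step is not stated in the claims.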

11. An apparatus, comprising:
- one or more processors; and
  
  memory storing instructions that, when executed by the one or more processors, cause the apparatus to:
  
  obtain validated training data comprising a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics;
  
  use the validated training data to produce a statistical model for classifying a relevance of content to the one or more topics;
  
  use the statistical model to generate a second set of relevance tags for a second set of content items; and
  
  output one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items.
- Dependent claims 12 to 18:
  - 12. The apparatus of claim 11, wherein the memory further stores instructions that, when executed by the one or more processors, cause the apparatus to:
    - obtain a validated subset of the second set of relevance tags for the first set of content items;
      
      provide the validated subset as additional training data to the statistical model to produce an update to the statistical model; and
      
      use the update to generate a third set of relevance tags for a third set of content items.
  - 13. The apparatus of claim 11, wherein using the training data to produce the statistical model for classifying the relevance of content to the one or more topics comprises:
    - generating a set of features from a content item in the first set of content items; and
      
      providing the set of features as input to the statistical model.
  - 14. The apparatus of claim 13, wherein the set of features comprises at least one of:
    - a number of characters;
      
      a number of capitalized characters; and
      
      a number of special characters.
  - 15. The apparatus of claim 13, wherein the set of features comprises at least one of:
    - a number of proper nouns;
      
      a number of emoticons;
      
      a number of words; and
      
      a number of sentences.
  - 16. The apparatus of claim 13, wherein the set of features comprises at least one of:
    - an average number of words in a sentence;
      
      a percentage of special characters;
      
      a percentage of emoticon characters; and
      
      a number of Uniform Resource Locators.
  - 17. The apparatus of claim 13, wherein the set of features comprises a topic related to social media.
  - 18. The apparatus of claim 13, wherein the one or more topics comprise a product associated with an online professional network.

19. A system, comprising:
- an analysis non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the system to:
  
  obtain validated training data comprising a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics;
  
  use the validated training data to produce a statistical model for classifying a relevance of content to the one or more topics; and
  
  use the statistical model to generate a second set of relevance tags for a second set of content items; and
  
  a management non-transitory computer-readable medium comprising instructions that, when executed by the one or more processors, cause the system to output one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items.
- Dependent claim 20:
  - 20. The system of claim 19, wherein the analysis non-transitory computer-readable medium further comprises instructions that, when executed by the one or more processors, cause the system to:
    - obtain a validated subset of the second set of relevance tags for the first set of content items;
      
      provide the validated subset as additional training data to the statistical model to produce an update to the statistical model; and
      
      use the update to generate a third set of relevance tags for a third set of content items.
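
The update loop recited in claims 2, 3, 12, and 20 (validate a subset of generated tags, feed it back as additional training data, then tag a third set) can be sketched as follows. This is a toy illustration under stated assumptions: the word-vote model, the `train` and `tag` helpers, the default tag, and the sample texts are all hypothetical, and the expert review step is simulated by simply accepting the first generated tag.

```python
from collections import Counter, defaultdict


def train(examples):
    """Build word -> tag vote counts from (text, tag) pairs (toy model)."""
    votes = defaultdict(Counter)
    for text, label in examples:
        for word in text.lower().split():
            votes[word][label] += 1
    return votes


def tag(model, text, default="not_relevant"):
    """Pick the tag with the most word votes; fall back to a default if no word is known."""
    tally = Counter()
    for word in text.lower().split():
        tally.update(model.get(word, {}))
    return tally.most_common(1)[0][0] if tally else default


# First set: hypothetical expert-validated training data
first_set = [("job search feature", "relevant"), ("sunny weather today", "not_relevant")]
model = train(first_set)

# Second set: tags generated by the model
second_set = ["job alerts feature", "rainy weather"]
second_tags = [(text, tag(model, text)) for text in second_set]

# Simulated review step: an expert confirms only the first generated tag
validated_subset = [second_tags[0]]

# Update the model with the validated subset, then tag a third set
updated_model = train(first_set + validated_subset)
third_set = ["new alerts settings"]
third_tags = [(text, tag(updated_model, text)) for text in third_set]
```

In this toy run the word "alerts" is unknown to the first model but known to the updated one, which shows how validated output can extend coverage without another full round of manual tagging.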

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: LinkedIn Corporation (Microsoft Corporation)
Inventors: Zhang, Yongzheng; Kuan, Chi-Yi; Zheng, Yi
Application Number: US14/856,306
Publication Number: US 20170075978A1
US Class Current: 1/1

CPC Class Codes

G06F 16/353   into predefined classes
G06F 17/18   for evaluating statistical ...
G06F 40/216   using statistical methods
G06F 40/30   Semantic analysis
G06N 20/00   Machine learning
G06N 20/10   using kernel methods, e.g. ...
