Identifying Content Relationship for Content Copied by a Content Identification Mechanism

US 20150324350A1
Filed: 05/12/2014
Published: 11/12/2015
Est. Priority Date: 05/12/2014
Status: Active Grant

Abstract

A mechanism is provided, in a data processing system comprising a processor and a memory configured to implement a natural language processing (NLP) system, for identifying content relationship for content copied by a content identification mechanism. The content identification mechanism identifies content from a website and then identifies relationship content information associated with a current web page where the content is found. The content identification mechanism modifies a file structure associated with the content with the relationship content information. The content identification mechanism identifies one or more classification identifiers in order to classify the content. Finally, the content identification mechanism transmits the content and the file structure to a specific corpus based on the one or more classification identifiers.
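The steps in the abstract can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the function names (`relationship_from_url`, `classify`, `ingest`), the dictionary layout of the file structure, and the keyword-to-corpus mapping are all assumptions made for the example, and keyword matching on headings stands in for whatever classification the mechanism actually uses.

```python
from __future__ import annotations

from urllib.parse import urlparse


def relationship_from_url(url: str) -> dict:
    """Derive path-based relationship information from a page URL (hypothetical helper)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return {
        "site": parsed.netloc,
        "path": segments[:-1],  # parent sections the page sits under
        "document": segments[-1] if segments else "",
    }


def classify(headings: list[str], corpus_by_keyword: dict[str, str],
             default: str = "general") -> str:
    """Pick a target corpus from major headings (assumed keyword matching)."""
    for heading in headings:
        for keyword, corpus in corpus_by_keyword.items():
            if keyword in heading.lower():
                return corpus
    return default


def ingest(url: str, text: str, headings: list[str],
           corpus_by_keyword: dict[str, str], corpora: dict[str, list]) -> str:
    """Attach relationship info to the file structure, then route to a corpus."""
    file_structure = {
        "relationship": relationship_from_url(url),
        "classification": list(headings),
    }
    target = classify(headings, corpus_by_keyword)
    # "Transmitting" is modeled here as appending to an in-memory corpus.
    corpora.setdefault(target, []).append(
        {"content": text, "file_structure": file_structure}
    )
    return target
```

In this sketch the content and its file structure travel together, so the receiving corpus retains where on the website the content was found.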

20 Claims

1. A method, in a data processing system comprising a processor and a memory configured to implement a natural language processing (NLP) system, for identifying content relationship for content copied by a content identification mechanism, the method comprising:
- identifying, by the content identification mechanism, the content from a website using natural language processing;
  
  identifying, by the content identification mechanism, relationship content information associated with a current web page where the content is found on the website;
  
  modifying, by the content identification mechanism, a file structure associated with the content with the relationship content information;
  
  identifying, by the content identification mechanism, one or more classification identifiers in order to classify the content; and
  
  transmitting, by the content identification mechanism, the content and the file structure to a specific corpus in the NLP system based on the one or more classification identifiers.
  - 2. The method of claim 1, further comprising:
    - identifying, by the content identification mechanism, other content on the website;
      
      identifying, by the content identification mechanism, cross reference information between the content and the other content; and
      
      updating, by the content identification mechanism, the file structure associated with content with the cross reference information.
  - 3. The method of claim 2, wherein the file structure of the other content is updated with content with the cross reference information.
  - 4. The method of claim 2, wherein the cross reference information is identified using at least one of the group consisting of:
    - parsing, structural analysis, hierarchical analysis, or concept extraction.
  - 5. The method of claim 1, wherein modifying the file structure associated with the content with the relationship content information is performed either through generating a new file structure with path information as well as other identified content, augmenting an existing file structure with new information, or updating the existing file structure with a change in the path information or the other identified content.
  - 6. The method of claim 1, wherein the relationship content information is identified from a Uniform Resource Locator (URL) of the web page or an html of the web page and wherein the relationship content information is utilized to determine document information directly identified in the content or associated with the content.
  - 7. The method of claim 1, wherein the one or more classification identifiers are identified from at least one of major headings and other grouping structure.
  - 8. The method of claim 1, wherein each content is selected from a group comprising a document, a video, an audio file, a recording, a picture, an artifact, an entry, or data.
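Claims 2 through 4 describe identifying cross reference information between pieces of content and recording it in their file structures. A minimal sketch of one such approach, parsing hyperlinks, follows. It assumes cross references are plain `<a href>` links between pages of the same site; the `LinkCollector` class, the `add_cross_references` function, and the `references` / `referenced_by` keys are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from a page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, href))


def add_cross_references(pages, structures):
    """pages maps URL -> HTML; structures maps URL -> file-structure dict."""
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        for target in sorted(collector.links):
            if target in pages and target != url:
                # Update the file structure of the content ...
                structures.setdefault(url, {}).setdefault("references", []).append(target)
                # ... and of the other content (compare claim 3).
                structures.setdefault(target, {}).setdefault("referenced_by", []).append(url)
    return structures
```

Structural analysis, hierarchical analysis, or concept extraction, which claim 4 also lists, would replace the link parser here while leaving the file-structure update unchanged.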

9. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
- identify content from a website using natural language processing (NLP);
  
  identify relationship content information associated with a current web page where the content is found on the website;
  
  modify a file structure associated with the content with the relationship content information;
  
  identify one or more classification identifiers in order to classify the content; and
  
  transmit the content and the file structure to a specific corpus in a QA system based on the one or more classification identifiers.
  - 10. The computer program product of claim 9, wherein the computer readable program further causes the computing device to:
    - identify other content on the website;
      
      identify cross reference information between the content and the other content; and
      
      update the file structure associated with content with the cross reference information.
  - 11. The computer program product of claim 10, wherein the file structure of the other content is updated with content with the cross reference information.
  - 12. The computer program product of claim 10, wherein the cross reference information is identified using at least one of the group consisting of:
    - parsing, structural analysis, hierarchical analysis, or concept extraction.
  - 13. The computer program product of claim 9, wherein the computer readable program to modify the file structure associated with the content with the relationship content information further causes the computing device to either:
    - generate a new file structure with path information as well as other identified content, augment an existing file structure with new information, or update the existing file structure with a change in the path information or the other identified content.
  - 14. The computer program product of claim 9, wherein the relationship content information is identified from a Uniform Resource Locator (URL) of the web page or an html of the web page and wherein the relationship content information is utilized to determine document information directly identified in the content or associated with the content.
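Claim 13 names three ways of modifying the file structure: generating a new one, augmenting an existing one, or updating an existing one. The sketch below shows one way those cases might be told apart. The dictionary layout, the `modify_file_structure` name, and the rule that a changed path or changed value counts as an "update" while purely new keys count as an "augment" are assumptions for illustration, not definitions taken from the patent.

```python
def modify_file_structure(existing, path, other_content):
    """Return (file_structure, mode), where mode is 'generated', 'augmented', or 'updated'."""
    path = list(path)
    if existing is None:
        # No file structure yet: generate one with path info and other content.
        return {"path": path, "other_content": dict(other_content)}, "generated"

    stored = existing.setdefault("other_content", {})
    # Assumed rule: a different path or a changed value is an update;
    # only-new keys are an augmentation.
    changed = existing.get("path") != path or any(
        key in stored and stored[key] != value
        for key, value in other_content.items()
    )
    existing["path"] = path
    stored.update(other_content)
    mode = "updated" if changed else "augmented"
    return existing, mode
```

A caller would pass `None` the first time content is seen and the previously stored structure on later visits to the same page.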

15. An apparatus comprising:
- a processor; and
  
  a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
  
  identify content from a website using natural language processing (NLP);
  
  identify relationship content information associated with a current web page where the content is found on the website;
  
  modify a file structure associated with the content with the relationship content information;
  
  identify one or more classification identifiers in order to classify the content; and
  
  transmit the content and the file structure to a specific corpus in a QA system based on the one or more classification identifiers.
  - 16. The apparatus of claim 15, wherein the instructions further cause the processor to:
    - identify other content on the website;
      
      identify cross reference information between the content and the other content; and
      
      update the file structure associated with content with the cross reference information.
  - 17. The apparatus of claim 16, wherein the file structure of the other content is updated with content with the cross reference information.
  - 18. The apparatus of claim 16, wherein the cross reference information is identified using at least one of the group consisting of:
    - parsing, structural analysis, hierarchical analysis, or concept extraction.
  - 19. The apparatus of claim 15, wherein the path information is identified from a Uniform Resource Locator (URL) of the web page or an html of the web page and wherein the path information is utilized to determine document information directly identified in the content or associated with the content.
  - 20. The apparatus of claim 15, wherein the instructions to modify the file structure associated with the content with the relationship content information further causes the processor to either:
    - generate a new file structure with path information as well as other identified content, augment an existing file structure with new information, or update the existing file structure with a change in the path information or the other identified content.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bufe, John P. III
Granted Patent: US 10,642,935 B2
US Class Current: 1/1

CPC Class Codes

G06F 16/24578   using ranking

G06F 16/3329   Natural language query form...

G06F 16/3344   using natural language anal...

G06F 40/30   Semantic analysis

G06F 40/40   Processing or translation o...
