Semantic analysis of documents to rank terms

US 8,504,564 B2
Filed: 12/15/2010
Issued: 08/06/2013
Est. Priority Date: 03/27/2007
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A method, apparatus, and computer program product provide a semantic analyzer that produces and ranks semantic terms to reflect their relationship to the theme and topics of a document. The text and the document can have no relationship to any pre-selected keywords before the semantic analyzer performs text extraction. The semantic analyzer extracts text from a document and performs semantic analysis on the extracted text. The semantic analyzer provides a plurality of ranked semantic terms as a result of the semantic analysis and associates semantic terms with the document as semantic keywords. The semantic terms define content to be presented with the document, where the content is an advertisement, a link to a remote information resource, or a second document.

60 Citations

19 Claims

1. A computer-implemented method comprising:
- extracting, by a processor, text from a document;
  
  identifying, by the processor, terms within the extracted text, each of the terms comprising a contiguous grouping of two or more tokens, each token comprising a word;
  
  determining, by the processor, a token value representing a total number of times each token occurs in the document;
  
  determining, by the processor, a token frequency for each of the terms as a function of the token values of tokens in each of the terms; and
  
  ranking, by the processor, the terms using the token frequency determined for each of the terms.
- Dependent claims (2-11):
  - 2. The method of claim 1 wherein the function determines a mean of the token values of tokens in each of the terms.
  - 3. The method of claim 1 further comprising determining a term value for each of the terms representing a total number of times each of the terms occurs in the document, wherein ranking the terms further comprises using the term value for each of the terms.
  - 4. The method of claim 1 further comprising determining a standard deviation of offset for each of the terms using positions of individual occurrences of each of the terms in the document, wherein ranking the terms further comprises using the standard deviation of offset for each of the terms.
  - 5. The method of claim 4 wherein determining the standard deviation of offset for each of the terms comprises determining a mean offset for each of the terms.
  - 6. The method of claim 1 further comprising determining a standard deviation of gap for each of the terms using differences between positions of individual occurrences of each of the terms, wherein ranking the terms further comprises using the standard deviation of gap for each of the terms.
  - 7. The method of claim 6 wherein determining the standard deviation of gap for each of the terms comprises determining a mean gap for each of the terms.
  - 8. The method of claim 1 further comprising:
    - providing a listing of ranked terms;
      
      receiving selections of one or more terms from the listing to create a content preview of additional content identified using the selections of one or more terms; and
      
      receiving designations of terms from the listing to be associated with the document as one or more keywords.
  - 9. The method of claim 1 further comprising:
    - providing a listing of ranked terms;
      
      receiving designations of terms from the listing to be associated with the document as one or more keywords; and
      
      associating the one or more keywords with the document.
  - 10. The method of claim 9 further comprising inserting the one or more keywords into a metadata portion of the document.
  - 11. The method of claim 1 wherein the function determines an average of the token values of tokens in each of the terms.
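
The steps recited in claims 1 to 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the regex tokenizer, the `max_len` limit on term length, the choice to let terms span sentence boundaries, and the tie-breaking by term value are all assumptions made for illustration. The mean used as the function of token values follows claim 2.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a token is a lowercase run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_terms(text, max_len=2):
    """Rank multi-token terms by the mean document count of their tokens."""
    tokens = tokenize(text)

    # Token value: total number of times each token occurs in the document.
    token_value = Counter(tokens)

    # Terms: contiguous groupings of two or more tokens (here 2..max_len).
    # Term value (claim 3): total number of times each term occurs.
    term_value = Counter(
        tuple(tokens[i:i + n])
        for n in range(2, max_len + 1)
        for i in range(len(tokens) - n + 1)
    )

    scored = []
    for term, count in term_value.items():
        # Token frequency: mean of the token values of the term's tokens.
        token_frequency = sum(token_value[t] for t in term) / len(term)
        scored.append((" ".join(term), token_frequency, count))

    # Hypothetical ordering: token frequency first, then term value,
    # then alphabetical so the result is deterministic.
    scored.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return scored
```

Calling `rank_terms(document_text)` returns `(term, token_frequency, term_value)` tuples, highest ranked first. How the claimed method actually combines token frequency with term value is not stated in the claims, so the sort key here is only one plausible choice.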

12. A computer-implemented method comprising:
- extracting, by a processor, text from a document;
  
  identifying, by the processor, terms within the extracted text, each of the terms comprising a contiguous grouping of two or more tokens, each token comprising a word;
  
  determining, by the processor, a standard deviation of offset or gap for each of the terms using positions of individual occurrences of each of the terms in the document; and
  
  ranking the terms using the standard deviation of offset or gap for each of the terms.
- Dependent claims (13-17):
  - 13. The method of claim 12 wherein determining the standard deviation of offset for each of the terms comprises determining a mean offset for each of the terms.
  - 14. The method of claim 12 wherein determining the standard deviation of gap for each of the terms comprises determining a mean gap for each of the terms.
  - 15. The method of claim 12 further comprising:
    - providing a listing of ranked terms;
      
      receiving selections of one or more terms from the listing to create a content preview of additional content identified using the selections of one or more terms; and
      
      receiving designations of terms from the listing to be associated with the document as one or more keywords.
  - 16. The method of claim 12 further comprising:
    - providing a listing of ranked terms;
      
      receiving designations of terms from the listing to be associated with the document as one or more keywords; and
      
      associating the one or more keywords with the document.
  - 17. The method of claim 16 further comprising inserting the one or more keywords into a metadata portion of the document.
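
The offset and gap statistics named in claims 4 to 7 and 12 to 14 can be sketched as follows. This is an illustration under stated assumptions, not the patented method: positions are measured as token indices (the claims do not say whether offsets are counted in tokens or characters), the population standard deviation is used, and the helper names are hypothetical. The claims also do not specify how the resulting values feed into the ranking, so the sketch stops at computing them.

```python
from statistics import mean, pstdev


def term_positions(tokens, term):
    # Token index of every occurrence of `term` (a tuple of tokens).
    n = len(term)
    return [
        i for i in range(len(tokens) - n + 1)
        if tuple(tokens[i:i + n]) == term
    ]


def offset_and_gap_stats(positions):
    """Mean and standard deviation of offsets and of gaps between occurrences."""
    # Gaps: differences between positions of consecutive occurrences.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "mean_offset": mean(positions) if positions else 0.0,
        "offset_sd": pstdev(positions) if positions else 0.0,
        "mean_gap": mean(gaps) if gaps else 0.0,
        "gap_sd": pstdev(gaps) if gaps else 0.0,
    }
```

A term that occurs at evenly spaced positions has a gap standard deviation near zero, while a term clustered in one part of the document has a small offset standard deviation relative to the document length.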

18. A non-transitory computer-readable medium on which is encoded program code, the program code comprising:
- program code for extracting text from a document;
  
  program code for identifying terms within the extracted text, each of the terms comprising a contiguous grouping of two or more tokens, each token comprising a word;
  
  program code for determining a token value representing a total number of times each token occurs in the document;
  
  program code for determining a token frequency for each of the terms as a function of the token values of tokens in each of the terms; and
  
  program code for ranking the terms using the token frequency determined for each of the terms.

19. A non-transitory computer-readable medium on which is encoded program code, the program code comprising:
- program code for extracting text from a document;
  
  program code for identifying terms within the extracted text, each of the terms comprising a contiguous grouping of two or more tokens, each token comprising a word;
  
  program code for determining a standard deviation of offset or gap for each of the terms using positions of individual occurrences of each of the terms in the document; and
  
  program code for ranking the terms using the standard deviation of offset or gap for each of the terms.
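
The keyword association described in claims 9 and 10 (and 16 and 17) can be sketched in the same way. The `Document` class, its `metadata` dictionary, and the `"keywords"` key are hypothetical stand-ins for whatever document format and metadata portion a real system would use; the claims do not name one.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document model: body text plus a metadata portion.
    text: str
    metadata: dict = field(default_factory=dict)


def associate_keywords(doc, ranked_terms, designated):
    """Store the designated terms from the ranked listing as keywords."""
    # Keep only designations that appear in the listing, in rank order.
    keywords = [term for term in ranked_terms if term in designated]
    # Insert the keywords into the metadata portion of the document.
    doc.metadata["keywords"] = keywords
    return doc
```

Here `ranked_terms` plays the role of the listing provided to the user, and `designated` is the set of terms the user selected from it.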

Current Assignee: Adobe Inc.
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Chang, Walter; Ghamrawi, Nadia
Primary Examiner(s): THAI, HANH B
Application Number: US12/968,326
Publication Number: US 20110082863A1
Time in Patent Office: 965 Days
Field of Search: 707/730-731, 707/738-739, 707/749, 707/750
US Class Current: 707/730
CPC Class Codes:
G06F 40/30   Semantic analysis
G06Q 30/02   Marketing; Price estimation...
G06Q 30/0207   Discounts or incentives, e....
