Method and apparatus for computerized extracting of scheduling information from a natural language e-mail

US 7,158,980 B2
Filed: 10/02/2003
Issued: 01/02/2007
Est. Priority Date: 10/02/2003
Status: Active Grant


Abstract

A processor is connected to a storage device storing an incoming e-mail, a dependency database, and code for a calendar application. The dependency database can be built from an e-mail corpus containing a plurality of natural language e-mails containing scheduling information. The incoming e-mail is parsed by the processor to build a dependency tree containing word pairs from the e-mail that are found in the dependency database. The word pairs are stored as dependency pairs in a tree structure in the dependency tree. A probability sum for the dependency tree is calculated to determine if the e-mail contains scheduling information. If the probability sum exceeds a predetermined value, the e-mail is assumed to contain scheduling information and the scheduling information is extracted from the dependency tree and exported to the calendar application.
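
The decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the database contents, the threshold of 0.5, the naive segmentation, and all function names are assumptions made for illustration, and the "tree" is simplified to a flat list of matched pairs.

```python
from itertools import permutations

# Hypothetical dependency database: (dependent, head) -> probability.
# The words and numbers are invented for illustration only.
DEPENDENCY_DB = {
    ("meeting", "schedule"): 0.30,
    ("tomorrow", "meeting"): 0.25,
    ("3pm", "meeting"): 0.20,
}
THRESHOLD = 0.5  # assumed stand-in for the "predetermined value"


def segment(text):
    """Naive segmentation: sentences split on '.', words on whitespace."""
    sentences = [s.strip() for s in text.lower().split(".") if s.strip()]
    return [s.split() for s in sentences]


def build_dependency_pairs(text, db):
    """Collect every ordered word pair of a sentence that is found in db."""
    found = []
    for words in segment(text):
        for pair in permutations(words, 2):
            if pair in db and pair not in found:
                found.append(pair)
    return found


def probability_sum(pairs, db):
    """Add up the stored probabilities of all matched dependency pairs."""
    return sum(db[pair] for pair in pairs)


def contains_scheduling_info(text, db=DEPENDENCY_DB, threshold=THRESHOLD):
    """Treat the text as scheduling-related if the sum exceeds the threshold."""
    return probability_sum(build_dependency_pairs(text, db), db) > threshold
```

With this toy database, `contains_scheduling_info("Please schedule a meeting tomorrow at 3pm.")` matches all three pairs and returns `True`, while a message with no matching pairs sums to zero and returns `False`.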

125 Citations

16 Claims

1. A method for computerized extracting of scheduling information from a natural language text for automatic entry into a calendar application, the method comprising the following steps:
- (a) parsing the natural language text to build a dependency tree by segmenting each sentence in the natural language text into words, building the dependency tree containing dependency pairs by comparing word pairs in the natural language text with a dependency database, and adding the word pairs found in the dependency database as dependency pairs to the dependency tree;
  
  (b) determining if the natural language text contains scheduling information by calculating a probability sum for the dependency tree; and
  
  (c) if the probability sum exceeds a predetermined value, extracting scheduling information from the dependency tree and exporting the scheduling information to the calendar application;
  
  wherein building the dependency database includes the following steps:
  
  segmenting each sentence in a text corpus into words, wherein the text corpus contains a plurality of sample natural language texts containing scheduling information;
  
  for each sentence in the text corpus, checking all possible combinations of word pairs to determine if the word pair has a high co-occurrency in the text corpus;
  
  if the word pair has the high co-occurrency in the text corpus, determining a head word using a tagged corpus, and checking the validity of the word pair using violation constraints, wherein the tagged corpus specifies actual head words for sentences relevant to scheduling information in the text corpus and contains dependencies for all other words with respect to the actual head words, and the violation constraints specify illegal dependency structures;
  
  if the word pair is a valid dependency pair, computing a probability of the word pair, adding the word pair as a dependency pair to the dependency database, and adding the probability of the dependency pair to the dependency database, wherein the probability of the dependency pair corresponds to a frequency of the word pair in the text corpus; and
  
  repeating the above steps until no new dependency pairs are identified.
- - 2. The method of claim 1, wherein when building the dependency tree:
    - for each sentence in the natural language text, forming a head word list of all possible head words in the sentence; and
      
      pairing each word in each sentence in the natural language text with the possible head words in the head word list to form the word pair, wherein if the word pair formed by the word and a possible head word is found in the dependency database, adding the word pair formed by the word and the possible head word as the dependency pair to the dependency tree.
  - 3. The method of claim 1, wherein determining if the natural language text contains scheduling information further comprises calculating the probability sum for the natural language text by adding up probabilities for all the dependency pairs in the dependency tree, the probability of each dependency pair corresponding to the frequency of the dependency pair in the text corpus, the text corpus containing a plurality of sample natural language texts containing scheduling information.
  - 4. The method of claim 1, wherein after extracting scheduling information from the natural language text, the method further comprising computing a value for the scheduling information.
  - 5. The method of claim 1, wherein after extracting scheduling information from the natural language text, the method further comprising sending a confirmation message to a user to confirm the scheduling information.
  - 6. The method of claim 1, wherein exporting the extracted scheduling information to the calendar application further comprises sending a confirmation message to the calendar application.
  - 7. The method of claim 1, wherein the natural language text is a natural language e-mail.
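
The database-building steps recited in claim 1 can be sketched roughly as follows. The claim does not fix a co-occurrence cutoff, a probability formula, or the form of the violation constraints, so `min_count`, the per-sentence frequency, the single-head rule, and the idea that an attached dependent may serve as a head in a later pass are all assumptions of this sketch, as are the function names.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(corpus):
    """Count, per unordered word pair, the sentences containing both words."""
    counts = Counter()
    for sentence in corpus:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts


def pick_head(a, b, known_heads):
    """Return (dependent, head) if exactly one word is a known head."""
    if a in known_heads and b not in known_heads:
        return (b, a)
    if b in known_heads and a not in known_heads:
        return (a, b)
    return None


def single_head_violation(pair, db):
    """Assumed example constraint: a word may not depend on two heads."""
    return any(dep == pair[0] for dep, _ in db)


def build_dependency_db(corpus, tagged_heads, is_violation, min_count=2):
    """corpus: list of segmented sentences; tagged_heads: head words taken
    from a (hypothetical) tagged corpus. Returns {(dependent, head): prob}."""
    counts = count_cooccurrences(corpus)
    total = len(corpus)
    db = {}
    known_heads = set(tagged_heads)
    while True:
        added = False
        for (a, b), n in counts.items():
            if n < min_count:
                continue  # below the assumed "high co-occurrency" cutoff
            pair = pick_head(a, b, known_heads)
            if pair is None or pair in db or is_violation(pair, db):
                continue
            db[pair] = n / total  # probability taken as relative frequency
            added = True
        if not added:
            break  # no new dependency pairs were identified in this pass
        # Assumption: words attached so far may head other words next pass.
        known_heads |= {dep for dep, _ in db}
    return db
```

The outer loop mirrors the "repeating the above steps until no new dependency pairs are identified" limitation: each pass can only add pairs, so the loop stops once a pass adds nothing.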

8. A personal organization apparatus comprising:
- a processor for executing code in the personal organization apparatus; and
  
  a storage unit connected to the processor for storing data used by the processor including a natural language text, the storage unit including a dependency database, the dependency database specifying a plurality of dependency pairs and a corresponding probability of each dependency pair, each dependency pair being a word pair found in a text corpus, the probability of the dependency pair corresponding to a frequency of the word pair in the text corpus, and the text corpus including a plurality of sample natural language texts containing scheduling information;
  
  wherein the processor parses the natural language text to build a dependency tree in the storage unit, determines if the natural language text contains scheduling information by calculating a probability sum for the dependency tree, and if the probability sum exceeds a predetermined value, extracts scheduling information from the dependency tree and exports the scheduling information to a calendar application;
  
  the processor also builds the dependency tree in the storage unit containing dependency pairs by comparing word pairs in the natural language text with the dependency database and adding the word pairs found in the dependency database as dependency pairs to the dependency tree, calculates the probability sum for the natural language text by adding up probabilities for all the dependency pairs in the dependency tree, and if the probability sum exceeds a predetermined sum, extracts scheduling information from the dependency tree and exports the scheduling information to the calendar application; and
  
  the processor further builds the dependency database using the text corpus, wherein for each sentence in the text corpus, the processor checks all possible combinations of the word pairs to determine if the word pair has a high co-occurrency in the text corpus;
  
  if the word pair has the high co-occurrency in the text corpus, the processor determines a head word using a tagged corpus, and checks the validity of the word pair using violation constraints, wherein the tagged corpus specifies actual head words for sentences relevant to scheduling information in the text corpus and contains dependencies for all other words with respect to the actual head words, and the violation constraints specify illegal dependency structures; and
  
  if the word pair is a valid dependency pair, the processor determines the frequency of the word pair in the text corpus and adds the word pair as the dependency pair to the dependency database and adds the frequency of the word pair as the probability of the dependency pair to the dependency database.
- - 9. The personal organization apparatus of claim 8, wherein when the processor builds the dependency tree:
    - for each sentence in the natural language text, the processor forms a head word list of all possible head words in the sentence, the head word list being stored in the storage unit; and
      
      the processor pairs each word in each sentence in the natural language text with the possible head words in the head word list, wherein if the word pair formed by the word and the possible head word is found in the dependency database, the processor adds the word pair formed by the word and the possible head word as the dependency pair to the dependency tree.
  - 10. The personal organization apparatus of claim 8, wherein the processor repetitively builds the dependency database until no new dependency pairs are identified.
  - 11. The personal organization apparatus of claim 8, wherein when building the dependency database the processor further segments each sentence in the text corpus into words.
  - 12. The personal organization apparatus of claim 8, wherein the processor further segments each sentence in the natural language text into words.
  - 13. The personal organization apparatus of claim 8, wherein after extracting scheduling information from the natural language text, the processor computes a value for the scheduling information.
  - 14. The personal organization apparatus of claim 8, wherein after extracting scheduling information from the natural language text, the processor sends a confirmation message to a user interface module to confirm the scheduling information.
  - 15. The personal organization apparatus of claim 8, wherein when the processor exports the scheduling information to the calendar application, the processor further sends a confirmation message to the calendar application.
  - 16. The personal organization apparatus of claim 8, wherein the natural language text is a natural language e-mail.
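
The head-word-list pairing of claims 2 and 9 can be sketched as below. The claims do not say how "possible head words" are identified, so this sketch assumes a word is a possible head if it appears as a head anywhere in the dependency database; the head-to-dependents dictionary standing in for the dependency tree and the function names are likewise assumptions.

```python
from collections import defaultdict


def possible_heads(sentence, db):
    """Head word list: words of the sentence that occur as a head in db."""
    heads_in_db = {head for _, head in db}
    return [word for word in sentence if word in heads_in_db]


def build_tree(sentences, db):
    """Pair each word with each possible head; keep the pairs found in db.

    sentences: list of segmented sentences (lists of words).
    db: {(dependent, head): probability}.
    Returns a mapping head -> list of dependents.
    """
    tree = defaultdict(list)
    for sentence in sentences:
        head_list = possible_heads(sentence, db)
        for word in sentence:
            for head in head_list:
                if (word, head) in db and word not in tree[head]:
                    tree[head].append(word)
    return dict(tree)


def tree_probability_sum(tree, db):
    """Add up the probabilities of every dependency pair held in the tree."""
    return sum(db[(dep, head)] for head, deps in tree.items() for dep in deps)
```

Restricting candidates to the head word list means each word is compared against a few possible heads rather than against every other word in the sentence.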

Current Assignee: Acer, Inc.
Original Assignee: Acer, Inc.
Inventors: Shen, Cheng-Chung
Primary Examiner(s): Alam; Hosain
Assistant Examiner(s): Saeed; Usmaan
Application Number: US10/605,500
Publication Number: US 20050076037A1
Time in Patent Office: 1,188 Days
Field of Search: 705/1, 705/9, 707/1, 707/3, 707/100, 707/102, 704/1, 704/9
US Class Current: 1/1
CPC Class Codes:
G06Q 10/107 Computer-aided management o...
Y10S 707/99943 Generating database or data...
