Method and apparatus for computerized extracting of scheduling information from a natural language e-mail
First Claim
1. A method for computerized extracting of scheduling information from a natural language text for automatic entry into a calendar application, the method comprising the following steps:
(a) parsing the natural language text to build a dependency tree by segmenting each sentence in the natural language text into words, building the dependency tree containing dependency pairs by comparing word pairs in the natural language text with a dependency database, and adding the word pairs found in the dependency database as dependency pairs to the dependency tree;
(b) determining if the natural language text contains scheduling information by calculating a probability sum for the dependency tree; and
(c) if the probability sum exceeds a predetermined value, extracting scheduling information from the dependency tree and exporting the scheduling information to the calendar application;
wherein building the dependency database includes the following steps;
segmenting each sentence in a text corpus into words, wherein the text corpus contains a plurality of sample natural language texts containing scheduling information;
for each sentence in the text corpus, checking all possible combinations of word pairs to determine if the word pair has a high co-occurrency in the text corpus;
if the word pair has the high co-occurrency in the text corpus, determining a head word using a tagged corpus, and checking the validity of the word pair using violation constraints, wherein the tagged corpus specifies actual head words for sentences relevant to scheduling information in the text corpus and contains dependencies for all other words with respect to the actual head words, and the violation constraints specify illegal dependency structures;
if the word pair is a valid dependency pair, computing a probability of the word pair, adding the word pair as a dependency pair to the dependency database, and adding the probability of the dependency pair to the dependency database, wherein the probability of the dependency pair corresponds to a frequency of the word pair in the text corpus; and
repeating the above steps until no new dependency pairs are identified.
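The database-building steps of claim 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how "high co-occurrency" is measured, what form the tagged corpus takes, or which violation constraints apply. The names `build_dependency_db`, `tagged_heads`, `min_count`, and `one_head_per_word` are hypothetical, and dividing each pair count by the total pair count is only one possible reading of "corresponds to a frequency".

```python
from collections import Counter
from itertools import combinations


def segment(sentence):
    """Naive segmentation: lowercase and split on whitespace."""
    return sentence.lower().split()


def build_dependency_db(corpus, tagged_heads, is_violation, min_count=2):
    """Return {(head, dependent): probability} for a list of sample sentences.

    tagged_heads : set of head words, standing in for the tagged corpus (assumed form)
    is_violation : callable(head, dep, db) -> bool, standing in for the
                   violation constraints (assumed form)
    min_count    : hypothetical cutoff for "high co-occurrency"
    """
    # Count each unordered word pair once per sentence in which it co-occurs.
    pair_counts = Counter()
    for sentence in corpus:
        words = sorted(set(segment(sentence)))
        pair_counts.update(combinations(words, 2))
    total = sum(pair_counts.values())

    db = {}
    added = True
    while added:  # repeat until a pass identifies no new dependency pairs
        added = False
        for (a, b), count in pair_counts.items():
            if count < min_count:
                continue  # not a high co-occurrence pair
            # Determine the head word from the (assumed) tagged head set.
            if a in tagged_heads:
                head, dep = a, b
            elif b in tagged_heads:
                head, dep = b, a
            else:
                continue
            if (head, dep) in db or is_violation(head, dep, db):
                continue
            db[(head, dep)] = count / total  # probability as relative frequency
            added = True
    return db


def one_head_per_word(head, dep, db):
    """Hypothetical constraint: a word may not depend on two different heads."""
    return any(d == dep and h != head for h, d in db)


corpus = ["meet john at noon", "meet mary at noon", "meet bob tomorrow"]
db = build_dependency_db(corpus, tagged_heads={"meet"}, is_violation=one_head_per_word)
```

In this simplified form the outer loop stops after the first pass that adds nothing; it is kept only to mirror the claim's final "repeating" step.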
Abstract
A processor is connected to a storage device storing an incoming e-mail, a dependency database, and code for a calendar application. The dependency database can be built from an e-mail corpus containing a plurality of natural language e-mails containing scheduling information. The incoming e-mail is parsed by the processor to build a dependency tree containing word pairs from the e-mail that are found in the dependency database. The word pairs are stored as dependency pairs in a tree structure in the dependency tree. A probability sum for the dependency tree is calculated to determine if the e-mail contains scheduling information. If the probability sum exceeds a predetermined value, the e-mail is assumed to contain scheduling information and the scheduling information is extracted from the dependency tree and exported to the calendar application.
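The run-time flow described in the abstract might look something like the sketch below. It is illustrative only: the tree is represented as a flat list of (head, dependent, probability) edges rather than a nested structure, the regular expressions used for segmentation are a simplification, and the function names, the example database, and the threshold of 0.5 are all hypothetical. The export to a calendar application is left out because the abstract does not describe its interface.

```python
import re
from itertools import permutations


def split_sentences(text):
    """Split text on sentence-ending punctuation (simplified)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def build_dependency_tree(text, dependency_db):
    """Collect the word pairs of each sentence that appear in the database."""
    tree = []
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z']+", sentence.lower())
        # dict.fromkeys drops repeated words while keeping their order.
        for head, dep in permutations(dict.fromkeys(words), 2):
            prob = dependency_db.get((head, dep))
            if prob is not None:
                tree.append((head, dep, prob))
    return tree


def contains_scheduling_info(tree, threshold):
    """Compare the probability sum of the tree with the predetermined value."""
    return sum(prob for _, _, prob in tree) > threshold


# Hypothetical database entries and threshold, for illustration only.
example_db = {("meet", "at"): 0.4, ("meet", "noon"): 0.3}
tree = build_dependency_tree("Let's meet at noon. Thanks!", example_db)
is_scheduling = contains_scheduling_info(tree, threshold=0.5)
```

An e-mail with no matching pairs yields an empty tree and a sum of zero, so it is not treated as containing scheduling information.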
125 Citations
16 Claims
8. A personal organization apparatus comprising:
a processor for executing code in the personal organization apparatus; and
a storage unit connected to the processor for storing data used by the processor including a natural language text, the storage unit including a dependency database, the dependency database specifying a plurality of dependency pairs and a corresponding probability of each dependency pair, each dependency pair being a word pair found in a text corpus, the probability of the dependency pair corresponding to a frequency of the word pair in the text corpus, and the text corpus including a plurality of sample natural language texts containing scheduling information;
wherein the processor parses the natural language text to build a dependency tree in the storage unit, determines if the natural language text contains scheduling information by calculating a probability sum for the dependency tree, and if the probability sum exceeds a predetermined value, extracts scheduling information from the dependency tree and exports the scheduling information to a calendar application;
the processor also builds the dependency tree in the storage unit containing dependency pairs by comparing word pairs in the natural language text with the dependency database and adding the word pairs found in the dependency database as dependency pairs to the dependency tree, calculates the probability sum for the natural language text by adding up probabilities for all the dependency pairs in the dependency tree, and if the probability sum exceeds a predetermined sum, extracts scheduling information from the dependency tree and exports the scheduling information to the calendar application; and
the processor further builds the dependency database using the text corpus, wherein for each sentence in the text corpus, the processor checks all possible combinations of the word pairs to determine if the word pair has a high co-occurrency in the text corpus;
if the word pair has the high co-occurrency in the text corpus, the processor determines a head word using a tagged corpus, and checks the validity of the word pair using violation constraints, wherein the tagged corpus specifies actual head words for sentences relevant to scheduling information in the text corpus and contains dependencies for all other words with respect to the actual head words, and the violation constraints specify illegal dependency structures; and
if the word pair is a valid dependency pair, the processor determines the frequency of the word pair in the text corpus and adds the word pair as the dependency pair to the dependency database and adds the frequency of the word pair as the probability of the dependency pair to the dependency database.
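Claim 8 states that the probability sum is obtained by adding up the probabilities of all dependency pairs in the dependency tree. Written out, with symbols that are assumptions introduced here for illustration (T for the set of dependency pairs in the tree, p(h, d) for the probability stored in the dependency database for head h and dependent d, and θ for the predetermined value), the decision rule reads:

```latex
S(T) = \sum_{(h,d) \in T} p(h,d),
\qquad
\text{extract and export scheduling information} \iff S(T) > \theta
```

For instance, with two hypothetical pairs whose stored probabilities are 0.4 and 0.3 and a hypothetical θ = 0.5, S(T) = 0.7 > 0.5, so the text would be treated as containing scheduling information.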
Specification