Method for standardizing phrasing in a document
First Claim
1. A method of extracting phrases in a document, which comprise the steps of:
extracting phrases of a document to automatically create a preliminary list of extracted phrases;
filtering the preliminary list of extracted phrases to create a final list of extracted phrases;
extracting candidate phrases of the document which are similar to extracting phrases contained in the final list of extracted phrases;
confirming whether a candidate phrase of the document is sufficiently proximate to the extracted phrase to constitute an approximate phrase by calculating an edit distance of the candidate phrases based on two distinct cost functions, a first one relating to a semantic significance and role of a text of the document, and a second one relating to operations performed on the text of the document; and
computing a phrase substitution to determine the appropriate conformation of one of the extracted phrase to the approximate phrase and the approximate phrase to the extracted phrase.
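The confirming step can be illustrated with a word-level weighted edit distance. The following is a minimal sketch only, not the patented method: the claim does not specify the cost functions, so the word list `LOW_SIGNIFICANCE`, the weights in `token_weight`, the multipliers in `OPERATION_COST`, and the `threshold` value are all assumptions chosen for illustration.

```python
from typing import Sequence

# Hypothetical word list: tokens assumed to carry little semantic weight.
LOW_SIGNIFICANCE = {"a", "an", "the", "of", "to", "and", "or"}

# Second cost function (assumed values): a multiplier per edit operation.
OPERATION_COST = {"insert": 1.0, "delete": 1.0, "substitute": 1.5}


def token_weight(token: str) -> float:
    """First cost function (assumed): semantic significance of a token."""
    return 0.25 if token.lower() in LOW_SIGNIFICANCE else 1.0


def weighted_edit_distance(source: Sequence[str], target: Sequence[str]) -> float:
    """Dynamic-programming edit distance over word tokens.

    Each edit costs (operation multiplier) x (token significance).
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # First column: delete every source token.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + OPERATION_COST["delete"] * token_weight(source[i - 1])
    # First row: insert every target token.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + OPERATION_COST["insert"] * token_weight(target[j - 1])

    for i in range(1, rows):
        for j in range(1, cols):
            s, t = source[i - 1], target[j - 1]
            if s.lower() == t.lower():
                substitute = dist[i - 1][j - 1]  # matching tokens cost nothing
            else:
                weight = max(token_weight(s), token_weight(t))
                substitute = dist[i - 1][j - 1] + OPERATION_COST["substitute"] * weight
            delete = dist[i - 1][j] + OPERATION_COST["delete"] * token_weight(s)
            insert = dist[i][j - 1] + OPERATION_COST["insert"] * token_weight(t)
            dist[i][j] = min(substitute, delete, insert)

    return dist[-1][-1]


def is_approximate_phrase(candidate: str, extracted: str, threshold: float = 1.0) -> bool:
    """True when the candidate differs from the extracted phrase, but only slightly."""
    distance = weighted_edit_distance(candidate.split(), extracted.split())
    return 0 < distance <= threshold
```

Under these assumed weights, dropping an article is cheap while replacing a content word is expensive, so "the term of the lease" would count as an approximation of "the term of lease", whereas "term of agreement" would not count as an approximation of "term of lease".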
Abstract
A method for standardizing phrases in a document includes the steps of identifying phrases of a document to create a preliminary list of standard phrases; filtering the preliminary list of standard phrases to create a final list of standard phrases; identifying candidate phrases of the document which are similar to the standard phrases; confirming whether a candidate phrase of the document is sufficiently proximate to the standard phrase to constitute an approximate phrase; and computing a phrase substitution to determine the appropriate conformation of the standard phrase to the approximate phrase or the approximate phrase to the standard phrase. Further, this invention relates to a computer system for standardizing a document.
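The first two steps of the abstract (building a preliminary list, then filtering it) can be sketched as follows. This is an illustrative assumption, not the procedure the patent discloses: the abstract does not say how phrases are identified or filtered, so the repeated word n-gram approach, the function names, and the parameters `min_words`, `max_words`, and `min_count` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]


def extract_preliminary_phrases(text: str, min_words: int = 2, max_words: int = 4) -> Dict[Phrase, int]:
    """Count every word n-gram in the text (assumed notion of a 'phrase')."""
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(min_words, max_words + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return dict(counts)


def _contains(longer: Phrase, shorter: Phrase) -> bool:
    """True when `shorter` occurs as a contiguous run inside `longer`."""
    span = len(shorter)
    return any(longer[k:k + span] == shorter for k in range(len(longer) - span + 1))


def filter_phrases(counts: Dict[Phrase, int], min_count: int = 2) -> List[str]:
    """Keep repeated phrases, dropping those subsumed by a longer phrase
    that repeats equally often (assumed filtering rule)."""
    repeated = {p: c for p, c in counts.items() if c >= min_count}
    final = []
    for phrase, count in repeated.items():
        subsumed = any(
            len(other) > len(phrase) and other_count == count and _contains(other, phrase)
            for other, other_count in repeated.items()
        )
        if not subsumed:
            final.append(" ".join(phrase))
    return sorted(final)
```

For example, calling `filter_phrases(extract_preliminary_phrases(text))` on a text in which "the security deposit" appears twice keeps that phrase and drops its repeated sub-phrases "the security" and "security deposit".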
278 Citations
20 Claims
Specification