Analyzing textual documents

US 5,323,310 A
Filed: 01/31/1992
Issued: 06/21/1994
Est. Priority Date: 02/14/1991
Status: Expired due to Fees

Abstract

Two text documents, one of which is a (possibly erroneous) translation of the other, are stored in the memory of a text processing system. The distributions and forms of words within the documents are compared, and a statistical process determines which words are likely to be correctly translated. A list of those words which seem to have been inconsistently translated is compiled.
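The final step of the abstract, compiling a list of apparently inconsistent translations, might be sketched as follows. This is a minimal illustration and not the patent's procedure: it assumes the two documents are already split into aligned segments, that a likely word pair has already been established, and that words are separated by whitespace. The function name `inconsistent_segments` and the sample sentences are hypothetical.

```python
def inconsistent_segments(src_segments, tgt_segments, src_word, tgt_word):
    """Return indices of aligned segment pairs in which src_word occurs
    but its presumed translation tgt_word does not.

    Assumption: segment i of the source corresponds to segment i of the
    target, and words are separated by whitespace.
    """
    flagged = []
    for i, (src, tgt) in enumerate(zip(src_segments, tgt_segments)):
        src_words = src.lower().split()
        tgt_words = tgt.lower().split()
        if src_word in src_words and tgt_word not in tgt_words:
            flagged.append(i)
    return flagged


# Hypothetical aligned segments (English source, Spanish target).
source_segments = ["grace and peace", "peace be with you", "go in peace"]
target_segments = ["gracia y paz", "paz a vosotros", "ve en calma"]

# Segment 2 renders "peace" with a word other than "paz", so it is flagged.
print(inconsistent_segments(source_segments, target_segments, "peace", "paz"))
```

A flagged segment is only a candidate for review; a different rendering may be a legitimate translation choice rather than an error.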

35 Citations

5 Claims

1. A method of processing a first text document in a source language and a second text document in a target language using a computer, each of the documents being divided into segments and stored in memory means of the computer, the segments being further divided into words, the method comprising carrying out the steps of:
- a) selecting a first word in the first text document and a second word in the second text document and determining a representation of the probability that the first and second words have substantially the same meaning by taking into account the result of a comparison of the distribution of the first and second words in the segments of the first text document and the second text document respectively; and
  
  b) determining that the first and second words have substantially the same meaning when the representation of the probability that the first and second words have the same meaning is greater than a threshold.
- Dependent claims (2, 3, 4, 5):
  - 2. A method as claimed in claim 1, wherein the first word in the first text document is regarded to have substantially the same meaning as the second word in the second text document if:
    - a) the representation of the probability of having substantially the same meaning as the first word in the first text document is greater for the second word in the second text document than for any other word in the second text document;
      
      b) the ratio of the greatest representation of the probability of having substantially the same meaning as a word in the second text document to the next greatest such representation is greater for the first word in the first text document than for any other word in the first text document; and
      
      c) that ratio is greater than one.
  - 3. A method as claimed in claim 1 or claim 2, wherein each segment in the first text document corresponds to a single segment in the second text document and the representation of the probability that the first word in the first text document and the second word in the second text document have substantially the same meaning is determined by a process comprising determining the number of times that words identical to the first word are found in segments in the first text document which correspond to segments in the second text document containing words of the same form as the second word.
  - 4. A method as claimed in claim 3, wherein a word is of the same form as the second word only when a word is identical to the second word.
  - 5. A method as claimed in claim 3, wherein a word is of the same form as the second word when the word has a group of at least one or more consecutive letters in common with the second word, and the process also comprises determining the length of the group of letters.
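The claims above can be illustrated with a small sketch. It rests on several assumptions that are not in the claims: the "representation of the probability" is stood in for by a Dice coefficient over co-occurrence counts in aligned segments (the claims do not give a formula), words are separated by whitespace, and a word is "of the same form" only when it is identical (as in claim 4). All function names and the sample sentences are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(src_segments, tgt_segments):
    """Claims 1 and 3: count how often each source word occurs in segments
    whose aligned target segment contains each target word, then turn the
    counts into a score (assumed here to be a Dice coefficient)."""
    src_count, tgt_count, both = Counter(), Counter(), Counter()
    for src, tgt in zip(src_segments, tgt_segments):
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        both.update(product(src_words, tgt_words))
    return {
        (a, b): 2 * n / (src_count[a] + tgt_count[b])
        for (a, b), n in both.items()
    }


def accepted_pairs(scores, threshold):
    """Claim 1(b): keep the pairs whose score exceeds the threshold."""
    return {pair for pair, score in scores.items() if score > threshold}


def best_target_and_ratio(scores, src_word):
    """Claim 2(a) and (c): the best-scoring target word for src_word and
    the ratio of its score to the next-best score."""
    candidates = sorted(
        ((score, b) for (a, b), score in scores.items() if a == src_word),
        reverse=True,
    )
    top_score, top_word = candidates[0]
    second = candidates[1][0] if len(candidates) > 1 else 0.0
    ratio = top_score / second if second > 0 else float("inf")
    return top_word, ratio


def common_run_length(a, b):
    """Claim 5: length of the longest run of consecutive letters shared
    by two words (longest common substring, by dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical aligned segments (English source, French target).
SRC = ["the dog runs", "the cat sleeps", "the dog sleeps"]
TGT = ["le chien court", "le chat dort", "le chien dort"]

scores = cooccurrence_scores(SRC, TGT)
print(sorted(accepted_pairs(scores, 0.9)))
print(best_target_and_ratio(scores, "dog"))
print(common_run_length("translate", "translation"))
```

Condition (b) of claim 2 compares the ratio across all source words and selects the word with the greatest ratio; the sketch computes the ratio per word and leaves that comparison, including how to handle ties, to the caller.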

Current Assignee: The British and Foreign Bible Society
Original Assignee: The British and Foreign Bible Society
Inventors: Robinson, David W. C.
Primary Examiner(s): Envall, Jr., Roy N.
Assistant Examiner(s): Shingala, Gita
Application Number: US07/830,027
Time in Patent Office: 872 Days
Field of Search: 364/419, 364/419.02, 364/419.05
US Class Current: 704/2
CPC Class Codes:
G06F 40/242   Dictionaries
G06F 40/44   Statistical methods, e.g. p...
G06F 40/58   Use of machine translation,...
