Short case name generating method and apparatus

US 5,410,475 A
Filed: 04/19/1993
Issued: 04/25/1995
Est. Priority Date: 04/19/1993
Status: Expired due to Term

Abstract

A short case name generator transforms the long case name of a lawsuit into a short case name format. The text of the long case name is converted to low-level tokens using dictionaries and heuristic rules. Selected tokens are eliminated and other selected tokens are consolidated into higher level tokens. Each of a sequence of stages receives the output tokens from the preceding stage and produces tokens at a higher level of abstraction. Ultimately, the highest level tokens are produced. Selected high-level tokens are deleted and the surviving tokens are broken down to their component tokens, selected ones of which are also deleted. Next, the surviving tokens are converted back into the text they represent. Editing rules are then applied to that text, which results in the short case name format.
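
The stages named in the abstract can be pictured with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the dictionary entries, the token names (`NAME`, `ROLE`, `PARTY`, and so on), the surname rule, and the abbreviation table are not taken from the patent. It also simplifies the culling stage by deleting only component tokens, not whole high-level tokens.

```python
# Hypothetical dictionary and token names, invented for illustration.
WORD_TOKENS = {
    "v.": "VERSUS",
    ",": "COMMA",
    "plaintiff": "ROLE",
    "defendant": "ROLE",
    "corporation": "CORP_SUFFIX",
}
ABBREVIATIONS = {"Corporation": "Corp."}  # assumed editing rule


def tokenize(text):
    """Convert each word or punctuation mark into a low-level (kind, word) token."""
    words = text.replace(",", " , ").split()
    # Assumed heuristic: anything not in the dictionary is treated as a NAME.
    return [(WORD_TOKENS.get(w.lower(), "NAME"), w) for w in words]


def consolidate(low_tokens):
    """Eliminate ROLE and COMMA tokens and merge the rest into PARTY tokens."""
    high, current = [], []
    for kind, word in low_tokens:
        if kind == "VERSUS":
            high.append(("PARTY", current))
            high.append(("VERSUS", [(kind, word)]))
            current = []
        elif kind in ("NAME", "CORP_SUFFIX"):
            current.append((kind, word))
        # ROLE and COMMA tokens fall through and are dropped.
    high.append(("PARTY", current))
    return high


def cull(high_tokens):
    """Break PARTY tokens into components; keep only the surname of a person."""
    kept = []
    for kind, parts in high_tokens:
        is_company = any(k == "CORP_SUFFIX" for k, _ in parts)
        if kind == "PARTY" and not is_company:
            parts = parts[-1:]  # assumed rule: the last word is the surname
        kept.append((kind, parts))
    return kept


def to_text(high_tokens):
    """Convert surviving tokens back to text, then apply editing rules."""
    words = [w for _, parts in high_tokens for _, w in parts]
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def short_case_name(long_name):
    return to_text(cull(consolidate(tokenize(long_name))))


print(short_case_name("John Smith, Plaintiff v. Acme Widget Corporation, Defendant"))
# prints: Smith v. Acme Widget Corp.
```

Each high-level token keeps its component low-level tokens, which is what lets the culling stage break a party back down and discard individual words.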

67 Citations

56 Claims

1. Apparatus for transforming a first character string made up of character groups into a second character string format, said apparatus comprising:
- a memory;
  
  processor means, including:
  
  means for storing the first character string in said memory;
  
  tokenizing means for converting each character group into a low-level token in accordance with a first set of rules;
  
  second tokenizing means, applying a second set of rules to said low-level tokens, for creating high-level tokens, wherein each of said high-level tokens represents one of:
  
  a low-level token and a consolidation of low-level tokens;
  
  culling means, applying a third set of rules to said high-level tokens, for selecting which of the high-level tokens represent character groups that should be included in the second character string format; and
  
  means for converting, through application of a fourth set of rules, the high-level tokens selected by said culling means into the second character string format.
- - 2. The apparatus of claim 1, further comprising a first dictionary of character groups stored in said memory means, and wherein said tokenizing means includes means for searching said dictionary for a low-level token corresponding to one or more of the character groups in said first character string.

3. A method for using a processor to transform a first character string comprised of character groups into a second character string format, said method comprising the steps of:
- the processor storing the first character string in a storage area;
  
  the processor converting each character group into a low-level token in accordance with a first set of rules;
  
  the processor applying a second set of rules to said low-level tokens, for creating high-level tokens, wherein each of said high-level tokens represents one of:
  
  a low-level token and a consolidation of low-level tokens;
  
  the processor applying a third set of rules to said high-level tokens for determining which of the high-level tokens represent character groups that should be included in the second character string format; and
  
  the processor converting, through application of a fourth set of rules, the character group high-level tokens into the second character string format.
- - 4. The method of claim 3, further comprising the step of:
    - the processor searching a dictionary of character groups for a low-level token corresponding to one or more of the character groups in said first character string.

5. Apparatus for transforming a long case name comprised of words, phrases, and punctuation into a short case name format, said apparatus comprising:
- a memory; and
  
  a processor, including:
  
  means for storing the long case name in said memory;
  
  tokenizing means for converting each word, phrase, and punctuation into a low-level token in accordance with a first set of rules;
  
  second tokenizing means, applying a second set of rules to said low-level tokens, for creating high-level tokens, wherein each of said high-level tokens represents one of:
  
  a low-level token and a consolidation of low-level tokens;
  
  culling means, applying a third set of rules to said high-level tokens, for determining which of the high-level tokens represent text that should be included in the short case name format; and
  
  means for converting, through application of a fourth set of rules, the text-representing high-level tokens into the short case name format.
- - 6. The apparatus of claim 5, further comprising a first dictionary of words stored in said memory means, and wherein said tokenizing means includes means for searching said dictionary for a low-level token corresponding to one or more words in said long case name.
  - 7. The apparatus of claim 6, further comprising a second dictionary of phrases stored in said memory means, and wherein said tokenizing means includes means for searching said dictionary for a low-level token corresponding to a phrase in said long case name.

8. A computer implemented method for using a processor to transform a long case name made up of words into a short case name format, said method comprising the steps of:
- the processor storing the long case name in a storage area;
  
  the processor converting each word into a low-level token in accordance with a first set of rules;
  
  the processor applying a second set of rules to said low-level tokens, for creating high-level tokens, wherein each of said high-level tokens represents one of:
  
  a low-level token and a consolidation of low-level tokens;
  
  the processor applying a third set of rules to said high-level tokens for determining which of the high-level tokens represent words that should be included in the short case name format; and
  
  the processor converting, through application of a fourth set of rules, the word-representing high-level tokens into the short case name format.
- - 9. The method of claim 8, further comprising the step of:
    - the processor searching a dictionary of words for a low-level token corresponding to one or more of the words in said long case name.
  - 10. The method of claim 9, further comprising the step of:
    - the processor searching a dictionary of phrases for a low-level token corresponding to one or more groups of the words in said long case name.

11. A computerized short case name generator system which converts a first text string comprised of words to a second text string according to a set of rules, comprising:
- a computer memory containing the first text string and a dictionary of words and their corresponding low-level tokens; and
  
  a processor, including:
  
  tokenization means for creating a low-level token list by converting each word in the first text string to a low-level token by searching said dictionary for a low-level token corresponding to each word in the first text string and applying a first set of rules to convert each word in the first text string not found in said dictionary to a low-level token;
  
  parser means for consolidating selected ones of said low-level tokens in said low-level token list into high-level tokens, for eliminating selected other ones of said low-level tokens and for creating a high-level token list of said high-level tokens;
  
  culling means for selecting which words represented by the high-level tokens will be included in the second text string according to a second set of rules; and
  
  formatting means for converting the selected high-level tokens back into the words they represent to provide the second text string.
- - 12. The system of claim 11, wherein said computer memory means further includes means for storing a phrase dictionary including phrases comprised of groups of words and low level tokens corresponding to the phrases and said tokenization means further includes means for converting selected groups of words into low level tokens by searching the phrase dictionary for phrases corresponding to low level tokens.
  - 13. The system of claim 12, wherein said formatting means further include means for converting selected ones of said words to corresponding abbreviations and acronyms according to a third set of rules.
  - 14. The system of claim 11, further including flagging means for identifying when the output from the culling means is unreliable.
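
As a rough illustration of the tokenization described in claims 11 and 12, the following Python sketch looks up phrases first, then single words, and falls back to heuristic rules for words found in neither dictionary. The dictionary contents, the token names, and the fallback heuristics are all hypothetical; the claims do not specify them.

```python
# Hypothetical dictionaries; the claims only say that words and phrases
# map to low-level tokens, not what the entries are.
WORD_DICT = {"v.": "VERSUS", "inc.": "CORP_SUFFIX", "the": "ARTICLE", "of": "PREP"}
PHRASE_DICT = {("et", "al."): "ET_AL", ("united", "states"): "GOV_ENTITY"}
MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASE_DICT)


def fallback_rule(word):
    """Assumed heuristics for a word found in neither dictionary."""
    if word[0].isupper():
        return "CAPITALIZED"
    if word.isdigit():
        return "NUMBER"
    return "UNKNOWN"


def tokenize(words):
    """Return a low-level token list of (kind, words) pairs."""
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(MAX_PHRASE_LEN, 1, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in PHRASE_DICT:
                tokens.append((PHRASE_DICT[key], words[i:i + n]))
                i += n
                break
        else:
            # No phrase matched: use the word dictionary, then the rules.
            word = words[i]
            kind = WORD_DICT.get(word.lower()) or fallback_rule(word)
            tokens.append((kind, [word]))
            i += 1
    return tokens


for token in tokenize("United States v. Jones et al.".split()):
    print(token)
```

Checking phrases before single words is a design choice of this sketch: it keeps a word such as "States" from being tokenized on its own when it belongs to a known phrase.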

15. A computer implemented method for using a processor to convert a first text string comprised of words to a second text string according to a rule set, the method comprising the steps of:
- (a) the processor scanning a word-to-token dictionary to determine if any of the words are in said dictionary;
  
  (b) the processor storing each token which corresponds to a word which matches a word in said dictionary to form a set of lower level tokens;
  
  (c) the processor applying a first set of rules to each unmatched word which does not match a word in said dictionary to convert each of said unmatched words to a lower level token according to said first set of rules;
  
  (d) the processor combining selected ones of said lower level tokens into higher level tokens according to a second set of rules;
  
  (e) the processor deleting selected other ones of said lower level tokens according to a third set of rules;
  
  (f) the processor repeating steps (d) and (e) with lower level tokens and higher level tokens until only higher level tokens remain;
  
  (g) the processor deleting selected ones of said higher level tokens according to a fourth set of rules; and
  
  (h) the processor transforming the remaining higher level tokens into the words they represent to provide the second string.
- - 16. The method of claim 15, including the additional step of:
    - (i) converting a subset of said words provided in step (h) to at least one of:
      
      corresponding abbreviations and acronyms.
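
Steps (d) through (i) of claims 15 and 16 can be sketched in Python as a staged reduction over a token list. This is one possible reading, not the patented implementation: the rule tables, the token names, the split into two stages, and the abbreviation table are assumptions. Steps (a) through (c) are taken as already done, so the input is a list of lower level tokens.

```python
HIGHER = {"PARTY", "VERSUS", "OTHERS"}  # assumed higher level token kinds

# Each stage pairs combining rules (step d) with kinds to delete (step e).
STAGES = [
    ([(("NAME", "NAME"), "NAME"), (("V",), "VERSUS"), (("ET_AL",), "OTHERS")],
     {"COMMA", "ROLE"}),
    ([(("NAME", "CORP_SUFFIX"), "PARTY"), (("NAME",), "PARTY")], set()),
]
DELETE_HIGHER = {"OTHERS"}                # assumed fourth rule set, step (g)
ABBREVIATIONS = {"Incorporated": "Inc."}  # assumed table for step (i)


def combine(tokens, rules):
    """Step (d): replace each matching run of tokens with one merged token."""
    out = list(tokens)
    i = 0
    while i < len(out):
        for pattern, result in rules:
            window = out[i:i + len(pattern)]
            if tuple(kind for kind, _ in window) == pattern:
                words = [w for _, ws in window for w in ws]
                out[i:i + len(pattern)] = [(result, words)]
                break  # re-check the merged token at the same position
        else:
            i += 1
    return out


def reduce_tokens(tokens):
    """Steps (d)-(f): combine and delete until only higher level tokens remain."""
    for rules, delete_kinds in STAGES:
        tokens = combine(tokens, rules)
        tokens = [t for t in tokens if t[0] not in delete_kinds]  # step (e)
        if all(kind in HIGHER for kind, _ in tokens):
            break
    return tokens


def convert(tokens):
    tokens = reduce_tokens(tokens)
    tokens = [t for t in tokens if t[0] not in DELETE_HIGHER]  # step (g)
    words = [w for _, ws in tokens for w in ws]                # step (h)
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)    # step (i)


low_level = [
    ("NAME", ["John"]), ("NAME", ["Smith"]), ("ET_AL", ["et", "al."]),
    ("V", ["v."]), ("NAME", ["Acme"]), ("NAME", ["Widget"]),
    ("COMMA", [","]), ("CORP_SUFFIX", ["Incorporated"]),
]
print(convert(low_level))
# prints: John Smith v. Acme Widget Inc.
```

The comma is deleted in the first stage, which is what lets the company name and its suffix become adjacent and merge into a single party in the second stage.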

17. A computer implemented method for using a processor to track the correlation between a first text string comprised of words and a token list in order to provide a second text string, the method comprising the steps of:
- (a) the processor creating a token list of low level tokens, the low level tokens corresponding to the words in the text string;
  
  (b) the processor creating an address string which denotes the correspondence between the low level tokens and the words in the text string;
  
  (c) the processor replacing selected ones and groups of the low level tokens in the token list with higher level tokens;
  
  (d) the processor modifying the address line to maintain the correspondence between the higher level tokens and the words the higher level tokens represent; and
  
  (e) the processor converting the higher level tokens back into the words they represent according to the address line to provide the second text string.
- - 18. The method of claim 17, further including the steps of:
    - the processor deleting selected ones of said higher level tokens; and
      
      the processor modifying the address line to maintain the correspondence between the remaining higher level tokens and the words they represent.
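
One way to picture the address tracking in claims 17 and 18 is a list kept in parallel with the token list, where each entry records which words of the original text a token stands for. The Python sketch below assumes that representation (lists of word indices) and uses hypothetical token names; the claims do not say how the address string is encoded.

```python
def build(words, kinds):
    """Steps (a)-(b): one low level token per word, plus a parallel address list."""
    tokens = list(kinds)
    address = [[i] for i in range(len(words))]
    return tokens, address


def replace_group(tokens, address, start, count, new_kind):
    """Steps (c)-(d): replace `count` tokens with one higher level token
    and merge their address entries so the word mapping is preserved."""
    merged = [i for entry in address[start:start + count] for i in entry]
    tokens[start:start + count] = [new_kind]
    address[start:start + count] = [merged]


def delete_token(tokens, address, index):
    """Claim 18: delete a token together with its address entry."""
    del tokens[index]
    del address[index]


def to_text(words, address):
    """Step (e): rebuild text from the words each surviving token points to."""
    return " ".join(words[i] for entry in address for i in entry)


words = ["John", "Smith", ",", "Plaintiff", "v.", "Acme", "Corp."]
kinds = ["NAME", "NAME", "COMMA", "ROLE", "V", "NAME", "CORP_SUFFIX"]
tokens, address = build(words, kinds)

replace_group(tokens, address, 0, 2, "PARTY")   # John Smith -> PARTY
delete_token(tokens, address, 1)                # drop COMMA
delete_token(tokens, address, 1)                # drop ROLE
replace_group(tokens, address, 1, 1, "VERSUS")  # v. -> VERSUS
replace_group(tokens, address, 2, 2, "PARTY")   # Acme Corp. -> PARTY

print(tokens)                    # ['PARTY', 'VERSUS', 'PARTY']
print(address)                   # [[0, 1], [4], [5, 6]]
print(to_text(words, address))   # John Smith v. Acme Corp.
```

Storing explicit word indices instead of start and end offsets is a choice made here so that merging tokens on either side of a deleted token does not bring the deleted word back.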

19. A computer implemented method for converting a first text string comprised of words to a second text string according to a rule set, the method comprising the steps of:
- (a) the computer scanning a word-to-token dictionary to determine if any of the words are in said dictionary, the dictionary having many words correspond to each token;
  
  (b) the computer storing each token which corresponds to a word which matches a word in said dictionary;
  
  (c) the computer applying a first set of rules to each unmatched word which does not match a word in said dictionary to convert each of said unmatched words to a token according to said first set of rules;
  
  (d) the computer creating an address string which denotes the correspondence between the tokens and the words in the first text string;
  
  (e) the computer combining selected ones of said tokens into higher level tokens according to a second set of rules;
  
  (f) the computer modifying the address line to denote the correspondence between the higher level tokens and the words they represent;
  
  (g) the computer deleting selected other ones of said tokens according to a third set of rules;
  
  (h) the computer modifying the address line to maintain the correspondence between the remaining tokens and the words they represent; and
  
  (i) the computer repeating steps (e) through (h) with tokens and higher level tokens until only higher level tokens remain wherein said higher level tokens correspond to words of the second text string.
- - 20. The method of claim 19, including the further steps of:
    - (j) the computer deleting selected ones of said higher level tokens according to a fourth set of rules;
      
      (k) the computer modifying the address line to maintain the correspondence between the remaining higher level tokens and the words they represent; and
      
      (l) the computer transforming the remaining tokens into the words they represent according to the address line.

21. Apparatus for transforming a first character string made up of character groups into a second character string, said apparatus comprising:
- input means, for providing the first character string; and
  
  a processor, including:
  
  identifying means for providing an identification for each character group of said first character string;
  
  consolidation means for consolidating character groups of said first character string, according to identifications of said groups, to provide a consolidated character string; and
  
  culling means, for selecting portions of said consolidated character string for inclusion in the second string and for converting said consolidated character string into said second string.
- - 22. The apparatus of claim 21, further comprising a first dictionary of character groups that is used by said identifying means to identify character groups of the first character string.
  - 23. The apparatus of claim 22, wherein every character group in the second character string is also in the first character string.
  - 24. The apparatus of claim 22, wherein at least one character group in the second character string is not in the first character string.

25. Apparatus for transforming a long case name comprised of words, phrases, and punctuation into a short case name format, said apparatus comprising:
- input means for providing the long case name; and
  
  a processor, including:
  
  identifying means for providing an identification for character groups of said long case name;
  
  consolidation means for applying a first set of rules to said long case name, according to identifications provided by said identifying means, to provide a consolidated case name;
  
  culling means, responsive to said consolidation means, for applying a second set of rules to said consolidated case name to provide a culled case name; and
  
  means for converting, through application of a third set of rules, the culled case name into the short case name format.
- - 26. The apparatus of claim 25, further comprising a first dictionary of words that is used by said identifying means to analyze groups of characters in said long case name.
  - 27. The apparatus of claim 26, further comprising a second dictionary of phrases which are used by said identifying means to detect phrases in said long case name.
  - 28. The apparatus of claim 27, wherein every character group in the short case name is also in the long case name.
  - 29. The apparatus of claim 27, wherein at least one character group in the short case name is not in the long case name.

30. A computerized short case name generator system which converts a first text string comprised of words to a second text string according to a set of rules, the system comprising:
- input means for providing the first character string; and
  
  a processor, including:
  
  identifying means for searching a dictionary for words contained in said first text string and for detecting words not found in said dictionary that are in said first text string;
  
  parser means, responsive to said identifying means, for consolidating selected ones of words in said first text string and for eliminating selected other ones of words in said text string to create a consolidated text string;
  
  culling means for selecting which words of said consolidated text string will be included in the second text string according to a first set of rules; and
  
  formatting means for converting the selected words of said consolidated text string into words of the second text string.
- - 31. The system of claim 30, further comprising:
    - a phrase dictionary containing phrases formed by groups of words wherein said identifying means detects selected groups of words which correspond to phrases in the phrase dictionary.
  - 32. The system of claim 31, wherein said formatting means further includes means for converting a subset of words of said second text string to at least one of:
    - corresponding abbreviations and acronyms according to a second set of rules.
  - 33. The system of claim 32, wherein every word in the second text string is also in the first text string.
  - 34. The system of claim 32, wherein at least one word in the second text string is not in the first text string.

35. Apparatus for transforming a first character string made up of character groups into a second character string, said apparatus comprising:
- input means for providing said first character string; and
  
  a processor, including:
  
  identifying means for providing an identification for each character group of said first character string;
  
  first means for consolidating character groups of said first character string, according to identifications of said groups, and for selecting portions thereof for inclusion in the second string; and
  
  second means, responsive to said first means, for providing said second character string.
- - 36. The apparatus of claim 35, further comprising a first dictionary of character groups that is used by said identifying means to identify character groups of the first character string.
  - 37. The apparatus of claim 36, wherein every character group in the second character string is also in the first character string.
  - 38. The apparatus of claim 36, wherein at least one character group in the second character string is not in the first character string.

39. A method for using a processor to transform a first character string made up of character groups into a second character string, the method comprising the steps of:
- the processor providing an identification for each character group of said first character string;
  
  the processor consolidating character groups of said first character string, according to identifications of said groups, to provide a consolidated character string; and
  
  the processor selecting portions of said consolidated character string for inclusion in the second string and converting said consolidated character string into said second string.
- - 40. The method of claim 39, wherein said identifying step uses a first dictionary of character groups to identify character groups of the first character string.
  - 41. The method of claim 40, wherein every character group in the second character string is also in the first character string.
  - 42. The method of claim 40, wherein at least one character group in the second character string is not in the first character string.

43. A method for using a processor to transform a long case name comprised of words, phrases, and punctuation into a short case name format, the method comprising the steps of:
- the processor providing an identification for character groups of said long case name;
  
  the processor applying a first set of rules to said long case name, according to said identifications of character groups, to provide a consolidated case name;
  
  the processor applying a second set of rules to said consolidated case name to provide a culled case name; and
  
  the processor converting, through application of a third set of rules, the culled case name into the short case name format.
- - 44. The method of claim 43, wherein a first dictionary of words is used to analyze groups of characters in said long case name.
  - 45. The method of claim 44, wherein a second dictionary of phrases is used to detect phrases in said long case name.
  - 46. The method of claim 45, wherein every character group in the short case name is also in the long case name.
  - 47. The method of claim 45, wherein at least one character group in the short case name is not in the long case name.

48. A method using a computerized short case name generator system for converting a first text string comprised of words to a second text string according to a set of rules, the method comprising the steps of:
- the system searching a dictionary for words contained in said first text string and for detecting words not found in said dictionary that are in said first text string;
  
  the system consolidating selected ones of words in said first text string and eliminating selected other ones of words in said text string to create a consolidated text string;
  
  the system selecting which words of said consolidated text string will be included in the second text string according to a first set of rules; and
  
  the system converting the selected words of said consolidated text string into words of the second text string.
- - 49. The method of claim 48, further comprising the step of:
    - the system using a phrase dictionary containing phrases formed by groups of words to detect selected groups of words which correspond to phrases in the phrase dictionary.
  - 50. The method of claim 49, further comprising the step of:
    - the system converting a subset of words of said second text string to at least one of:
      
      corresponding abbreviations and acronyms according to a second set of rules.
  - 51. The method of claim 50, wherein every word in the second text string is also in the first text string.
  - 52. The method of claim 50, wherein at least one word in the second text string is not in the first text string.

53. A method for using a processor to transform a first character string made up of character groups into a second character string, the method comprising the steps of:
- the processor providing an identification for each character group of said first character string;
  
  the processor consolidating character groups of said first character string, according to identifications of said groups, and selecting portions thereof for inclusion in the second string; and
  
  the processor providing said second character string from portions selected for inclusion therein.
- - 54. The method of claim 53, further comprising the step of:
    - the processor using a first dictionary of character groups to identify character groups of the first character string.
  - 55. The method of claim 54, wherein every character group in the second character string is also in the first character string.
  - 56. The method of claim 54, wherein at least one character group in the second character string is not in the first character string.

Current Assignee: RELX Inc. (RELX PLC)
Original Assignee: Mead Data Central, Inc. (RELX PLC)
Inventors: Lu, X. Allan; Klein, Timothy M.
Primary Examiner(s): Weinhardt, Robert A.
Application Number: US08/047,659
Time in Patent Office: 736 Days
Field of Search: 364/419.01, 364/419.08, 364/419.10, 364/419.14, 364/419.15, 364/419.17
US Class Current: 704/1
CPC Class Codes: G06F 40/295 Named entity recognition
