System for categorizing character strings using acceptability and category information contained in ending substrings
First Claim
1. An article of manufacture for use in a character recognition system that includes a processor;
the article comprising:
a data storage medium; and
string data stored by the data storage medium;
the string data comprising two or more data units, each of which can be accessed by a processor of a character recognition system;
the data units including, for each of a set of two or more acceptable strings of characters, a respective sequence of data units that the processor can access using character data indicating character types of the string's characters;
the set of acceptable strings including a first string that is in a first subset of categories, each category in the first subset being one of a set of two or more categories;
the first string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the first string's sequence and use to obtain first string ending data indicating that the first string is one of the acceptable strings and indicating the first subset of categories;
the first string's ending subsequence including:
acceptance information indicating that a string at the end of whose respective sequence the processor can access the ending subsequence is one of the set of acceptable strings; and
category set information indicating the first subset of categories;
the first subset of categories including at least one of the set of categories.
Abstract
A data storage medium stores string data that can be used in character recognition and instructions for accessing the string data. The string data includes data units that can be accessed by a processor in executing the instructions. The processor can use character data indicating characters of a string to access a sequence of the data units that ends with an ending subsequence. The ending subsequence includes acceptance information indicating whether a string whose sequence of data units ends with the ending subsequence is an acceptable string. If so, the ending subsequence also includes category set information indicating a set of categories for strings whose sequences end with the ending subsequence. The categories can include words, numbers, compound words, and so forth. The acceptance information can include a bit in a character label data unit that includes information indicating the character type of an ending character. The acceptance information can also include an acceptance data unit whose value indicates an acceptable string ending. The acceptance data unit can be followed by category data units, each with a value indicating a category. The category data units can be used to obtain a bit vector for a string, each bit of which indicates whether the string is in one of the categories. For compactness, all or part of an ending subsequence can be shared by plural acceptable strings. Looping can be used to represent a category with a potentially infinite number of strings, such as numbers.
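The arrangement described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented encoding: it models data units as linked nodes rather than packed bytes, and the names `Node`, `add_string`, `lookup`, and `CATEGORIES`, as well as the category numbering, are assumptions made for this example. Each node carries an acceptance flag and a list of category identifiers, which `lookup` folds into a bit vector. The digit node points back to itself to show how looping can represent the unbounded set of numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical category numbering; the abstract names words, numbers,
# and compound words but assigns no particular order or values.
CATEGORIES = ("word", "number", "compound_word")


@dataclass
class Node:
    """One position in a string's sequence (a stand-in for a data unit)."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    accept: bool = False                  # acceptance information
    category_ids: Tuple[int, ...] = ()    # category set information


def add_string(root: Node, s: str, categories: Iterable[str]) -> None:
    """Store s as an acceptable string belonging to the given categories."""
    node = root
    for ch in s:
        node = node.children.setdefault(ch, Node())
    node.accept = True
    ids = set(node.category_ids) | {CATEGORIES.index(c) for c in categories}
    node.category_ids = tuple(sorted(ids))


def lookup(root: Node, s: str) -> Optional[int]:
    """Return a category bit vector for s, or None if s is not acceptable."""
    node = root
    for ch in s:
        node = node.children.get(ch)
        if node is None:
            return None
    if not node.accept:
        return None
    bits = 0
    for cid in node.category_ids:
        bits |= 1 << cid   # bit i set means "s is in category i"
    return bits


def build_example() -> Node:
    root = Node()
    add_string(root, "cat", ["word"])
    add_string(root, "blackbird", ["word", "compound_word"])
    # Looping: one accepting node whose digit transitions lead back to
    # itself covers every digit string of any length.
    digits = Node(accept=True, category_ids=(CATEGORIES.index("number"),))
    for d in "0123456789":
        digits.children[d] = digits
        root.children[d] = digits
    return root


root = build_example()
```

In this sketch, `lookup(root, "blackbird")` yields a vector with the word and compound-word bits set, while a prefix such as `"ca"` reaches a node without acceptance information and is rejected.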
23 Claims
1. An article of manufacture for use in a character recognition system that includes a processor;
the article comprising:
a data storage medium; and
string data stored by the data storage medium;
the string data comprising two or more data units, each of which can be accessed by a processor of a character recognition system;
the data units including, for each of a set of two or more acceptable strings of characters, a respective sequence of data units that the processor can access using character data indicating character types of the string's characters;
the set of acceptable strings including a first string that is in a first subset of categories, each category in the first subset being one of a set of two or more categories;
the first string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the first string's sequence and use to obtain first string ending data indicating that the first string is one of the acceptable strings and indicating the first subset of categories;
the first string's ending subsequence including:
acceptance information indicating that a string at the end of whose respective sequence the processor can access the ending subsequence is one of the set of acceptable strings; and
category set information indicating the first subset of categories;
the first subset of categories including at least one of the set of categories.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
15. A system comprising:
a processor;
memory;
the processor being connected for accessing the memory; and
string data stored by the memory;
the string data comprising two or more data units, each of which can be accessed by the processor;
the data units including, for each of a set of two or more acceptable strings of characters, a respective sequence of data units that the processor can access using character data indicating the string's characters;
the set of acceptable strings including a first string that is in a first subset of categories, each category in the first subset being one of a set of two or more categories;
the first string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the first string's sequence and use to obtain first string ending data indicating that the first string is one of the acceptable strings and indicating the first subset of categories;
the first string's ending subsequence including:
acceptance information indicating that a string at the end of whose respective sequence the processor can access the ending subsequence is one of the set of acceptable strings; and
category set information indicating the first subset of categories;
the first subset of categories including at least one of the set of categories.
Dependent claims: 16, 17, 20.
18. A method of operating a system that includes:
a processor;
memory;
the processor being connected for accessing the memory; and
string data stored by the memory;
the string data comprising two or more data units, each of which can be accessed by the processor;
the data units including, for each of a set of two or more acceptable strings of characters, a respective sequence of data units that the processor can access using character data indicating the string's characters;
the set of acceptable strings including a first string that is in a first subset of categories, each category in the first subset being one of a set of two or more categories;
the first string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the first string's sequence;
the first string's ending subsequence including:
acceptance information indicating that a string at the end of whose respective sequence the processor can access the ending subsequence is one of the set of acceptable strings; and
category set information indicating the first subset of categories;
the first subset of categories including at least one of the set of categories;
the method comprising steps of:
operating the processor to access the first string's sequence using first string character data indicating the first string's characters;
the step of operating the processor to access the first string's sequence comprising a substep of accessing the first string's ending subsequence; and
operating the processor to use the first string's ending subsequence to obtain first string ending data indicating that the first string is one of the acceptable strings and indicating the first subset of categories.
Dependent claims: 19.
21. A system comprising:
a processor;
memory;
the processor being connected for accessing the memory; and
string data stored by the memory;
the string data comprising two or more data units, each of which can be accessed by the processor;
the data units including, for each of a set of two or more acceptable strings of characters, a respective sequence of data units that the processor can access using character data indicating the string's characters;
the set of acceptable strings including first and second strings that are each in a first subset of categories, each category in the first subset being one of a set of two or more categories;
the first string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the first string's sequence and use to obtain first string ending data indicating that the first string is one of the acceptable strings and indicating the first subset of categories;
the second string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the second string's sequence and use to obtain second string ending data indicating that the second string is one of the acceptable strings and indicating the first subset of categories;
the first and second strings' ending subsequences both including a shared ending subsequence;
the shared ending subsequence including:
acceptance information indicating that a string at the end of whose respective sequence the processor can access the shared ending subsequence is one of the set of acceptable strings; and
category set information indicating the first subset of categories;
the first subset of categories including at least one of the set of categories.
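As an informal illustration of the shared ending subsequence recited in claim 21, the sketch below lets two strings reach the same stored ending. It is only one possible reading, assuming linked in-memory objects instead of packed data units; the names `Unit`, `chain`, `end_unit`, and `classify` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class Unit:
    """A stand-in for one data unit; `next` maps a character to the following unit."""
    next: Dict[str, "Unit"] = field(default_factory=dict)
    accept: bool = False                       # acceptance information
    categories: FrozenSet[str] = frozenset()   # category set information


def chain(text: str, tail: Unit) -> Unit:
    """Return a unit from which reading `text` leads to `tail`."""
    node = tail
    for ch in reversed(text):
        node = Unit(next={ch: node})
    return node


def end_unit(root: Unit, s: str) -> Optional[Unit]:
    """Follow s from root and return the unit reached, or None if the path breaks."""
    node = root
    for ch in s:
        node = node.next.get(ch)
        if node is None:
            return None
    return node


def classify(root: Unit, s: str) -> Optional[FrozenSet[str]]:
    """Return the category set for s if it is acceptable, else None."""
    node = end_unit(root, s)
    if node is None or not node.accept:
        return None
    return node.categories


# One ending stored once: the units for "ing" plus acceptance and categories.
ending = Unit(accept=True, categories=frozenset({"word"}))
shared = chain("ing", ending)

# Two different beginnings both continue into the same shared units.
root = Unit()
root.next["w"] = chain("alk", shared)
root.next["t"] = chain("alk", shared)
```

Here `"walking"` and `"talking"` end at the very same object, so the acceptance and category information is stored once for both strings. The space saving grows with the number of strings that share an ending.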
22. A system comprising:
a processor;
memory;
the processor being connected for accessing the memory; and
string data stored by the memory;
the string data comprising two or more data units, each of which can be accessed by the processor;
the data units including, for each of a set of two or more acceptable strings of characters, a respective sequence of data units that the processor can access using character data indicating the string's characters;
the set of acceptable strings including a first string that is in a first subset of two or more categories, each category in the first subset being one of a set of two or more categories;
the first string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the first string's sequence and use to obtain first string ending data indicating that the first string is one of the acceptable strings and indicating the first subset of categories;
the first string's ending subsequence including:
acceptance information indicating that a string at the end of whose respective sequence the processor can access the ending subsequence is one of the set of acceptable strings; and
category set information indicating the first subset of categories.
23. A system comprising:
a processor;
memory;
the processor being connected for accessing the memory;
instruction data stored by the memory, the instruction data indicating instructions the processor can execute; and
string data stored by the memory;
the string data comprising two or more data units, each of which can be accessed by the processor;
the data units including, for each of a set of two or more acceptable strings of characters, a respective sequence of data units that the processor can access using character data indicating the string's characters;
the set of acceptable strings including a first string that is in a first subset of categories, each category in the first subset being one of a set of two or more categories;
the first string's sequence of data units including a respective ending subsequence of data units that the processor can access at the end of the first string's sequence;
the first string's ending subsequence including:
acceptance information indicating that a string at the end of whose respective sequence the processor can access the shared ending subsequence is one of the set of acceptable strings; and
category set information indicating the first subset of categories;
the first subset of categories including at least one of the set of categories;
the processor, in executing the instructions:
using the first string's characters to access the first string's sequence of data units; and
accessing the first string's ending subsequence at the end of the first string's sequence of data units and using the first string's ending subsequence to obtain first string ending data;
the first string ending data indicating that the first string is one of the acceptable strings and indicating the first subset of categories.
Specification