Method and apparatus for mapping multiple-byte characters to unique strings of ASCII characters for use in text retrieval
First Claim
1. A method for preparing language text to be used by a text processing system, where said language comprises more than 256 characters, said method comprising the computer implemented steps of:
a) capturing an input stream of characters which represent said language;
b) separating said input stream of characters into strings of characters which represent words;
said separation into words being accomplished by using rules of grammar of said language to determine where to insert word separators into said stream of characters to delimit groups of said characters which constitute words in said language, and whereby no special characters in said input stream of characters is required to distinguish one word or character set from another;
c) mapping said character strings into unique sets of single byte ASCII characters; and
d) transferring said unique sets of single byte ASCII characters which represent words to said text processing system for further processing.
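The four steps of this claim can be sketched in Python. This is only an illustration under stated assumptions, not the patented implementation: the claim text does not give a concrete segmentation rule, so the sketch substitutes a greedy longest-match lookup against a tiny hypothetical lexicon (`LEXICON`) for the grammar-based word separation, and it assumes the hexadecimal UTF-16 code units of each word as the unique ASCII mapping. The names `segment`, `to_ascii`, and `prepare` are hypothetical.

```python
# Hypothetical lexicon standing in for the grammar rules of the language.
LEXICON = {"日本", "日本語", "検索", "文書"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def segment(stream):
    """Step b: split an unspaced character stream into words.

    Greedy longest match is an assumed stand-in for the grammar analysis;
    a character that starts no known word becomes a one-character word.
    """
    words = []
    i = 0
    while i < len(stream):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(MAX_WORD_LEN, len(stream) - i), 0, -1):
            candidate = stream[i:i + size]
            if size == 1 or candidate in LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def to_ascii(word):
    """Step c: map a word to a string of single-byte ASCII hex digits.

    UTF-16 encoding is reversible, so distinct words yield distinct strings.
    """
    return word.encode("utf-16-be").hex().upper()


def prepare(stream):
    """Steps a to d: return space-delimited ASCII tokens for the text system."""
    return " ".join(to_ascii(w) for w in segment(stream))
```

The output of `prepare` contains only hexadecimal digits and spaces, so an ASCII-based indexer can treat each token as an ordinary word even though the input contained no word delimiters.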
Abstract
An apparatus and method for converting a non-English language document text or search and retrieval argument into a form which can be processed by an existing ASCII based automated text processing system, even though the non-English language may have thousands of characters in it, thereby allowing the use of existing text processing systems and existing text data bases without the need to convert these text processing systems to handle multi-byte character languages.
102 Citations
13 Claims
6. An apparatus for preparing language text to be used by a text processing system, where said language comprises more than 256 characters, said apparatus comprising:
a) a filter device for capturing an input stream of characters which represent said language comprising; a word filter for separating said input stream of characters into strings of characters which represent words, wherein said word filter comprises a grammar analyzer to facilitate the separation of said characters into strings of said characters which represent words, wherein no special characters in said input stream of characters is required to distinguish one word or character set from another, and wherein character strings comprising more than two bytes of data are called compound words;
b) a mapping device coupled to said word filter, for mapping said strings of characters which represent words into unique strings of single-byte ASCII characters; and
c) an output device coupled to said mapping device, for passing said unique strings of single-byte ASCII characters which represent words to said text processor.
12. In a computer system comprising an ASCII based text processing system, a method for preparing language text to be used by said ASCII based text processing system, where said language comprises more than 256 characters, said method comprising the step of:
a) translating said language text into hexadecimal character strings;
b) mapping said hexadecimal character strings into unique sets of single byte ASCII characters;
c) inserting word separators in said hexadecimal character strings to delimit groups of said hexadecimal characters which constitute words in said language wherein the rules of grammar of said language are used to determine where to insert said word separators into said hexadecimal character strings, and wherein no special character in said input stream of characters is required to distinguish one word or character set from another; and
d) transferring said unique sets of single byte ASCII characters to said ASCII based text processing system for further processing.
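Claim 12 orders the work differently: the text is first turned into hexadecimal strings, and word separators are then inserted into the hexadecimal form. A minimal Python sketch of that ordering follows. It assumes two-byte (UTF-16, Basic Multilingual Plane) characters, so each character occupies four hexadecimal digits, and it assumes the word lengths are supplied by a grammar analyzer that is not shown. The names `to_hex`, `insert_separators`, and `from_hex` are hypothetical.

```python
def to_hex(text):
    """Steps a and b: translate the text into a string of ASCII hex digits."""
    return text.encode("utf-16-be").hex().upper()


def insert_separators(hex_string, word_lengths, sep=" "):
    """Step c: insert separators into the hex string at word boundaries.

    word_lengths gives the number of characters in each word, as an assumed
    grammar analyzer would report it. Each character is four hex digits
    under the two-byte assumption stated above.
    """
    tokens = []
    pos = 0
    for length in word_lengths:
        width = 4 * length
        tokens.append(hex_string[pos:pos + width])
        pos += width
    if pos != len(hex_string):
        raise ValueError("word lengths do not cover the whole input")
    return sep.join(tokens)


def from_hex(token):
    """Inverse mapping, showing that each ASCII token identifies one word."""
    return bytes.fromhex(token).decode("utf-16-be")
```

Because `from_hex` recovers the original word from its token, the mapping is one-to-one, which is the property the claim refers to as "unique sets of single byte ASCII characters".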
Specification