Method and system for processing text
Abstract
The present invention provides a method and system for text processing. The method comprises determining at least a part of characters in a text; dividing the text into a plurality of text segments by using the at least a part of characters as separators; and decoding the plurality of text segments respectively.
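As a minimal Python sketch of the idea in the abstract, the function below splits raw bytes at punctuation and decodes every segment on its own, so an undecodable byte in one segment cannot disturb its neighbours. The GBK encoding, the ASCII separator set, and the name `decode_segments` are illustrative assumptions; the abstract names no encoding and no API.

```python
from __future__ import annotations

import re

# Assumption: GBK stands in for "double-byte coded characters".
ENCODING = "gbk"

# Assumption: these ASCII punctuation bytes are all below 0x40, so they can
# never be the second byte of a GBK character and are safe byte-level separators.
SEPARATORS = re.compile(rb"[,.;:!?]")


def decode_segments(raw: bytes) -> list[str]:
    """Split at punctuation, then decode each segment independently."""
    return [
        segment.decode(ENCODING, errors="replace")  # bad bytes become U+FFFD
        for segment in SEPARATORS.split(raw)
    ]


# Simulated damage: the last byte of the first segment has been lost.
raw = "你好".encode(ENCODING)[:-1] + b"," + "世界".encode(ENCODING)
segments = decode_segments(raw)
print(segments)
```

Only the first segment carries a replacement character; the second segment decodes exactly as it would in undamaged text.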
18 Claims
1. A method for text processing, comprising:
determining a plurality of characters in a text, wherein the text comprises double-byte coded characters;
determining whether a number of bytes included in each text segment is even or odd;
detecting which of the plurality of characters represent punctuations;
dividing the text into a plurality of different text segments using the detected punctuations as separators between the different text segments; and
performing a plurality of discrete decoding operations, one for each of the plurality of different text segments, wherein one or more of the plurality of different text segments comprise at least one occurrence of unrecognizable codes that are unable to be successfully decoded as comprehensible characters without inferences being made, wherein decoding operations on text segments lacking unrecognizable codes are unaffected by other decoding operations on text segments including unrecognizable codes; and
when performing the plurality of discrete decoding operations and only when the number of word segments included in one of the text segments is odd, decoding from a head of the text segment rearward, as a first decoding result of the text segment, and decoding from a tail of the text segment frontward, as a second decoding result of the text segment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
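The head-first and tail-first decoding of claim 1 can be sketched as follows. This is one possible reading, not the patented implementation: it assumes GBK as the double-byte encoding, assumes a segment holds only two-byte characters, and uses the segment's byte count as the odd/even test. The function name `decode_both_ways` is hypothetical.

```python
from __future__ import annotations

ENCODING = "gbk"        # assumption: the claim names no specific encoding
REPLACEMENT = "\ufffd"  # marks the single byte that has no partner


def decode_both_ways(segment: bytes) -> tuple[str, str | None]:
    """Decode one segment; an odd byte count yields two candidate readings."""
    if len(segment) % 2 == 0:
        # Even length: one ordinary decoding, no second candidate.
        return segment.decode(ENCODING, errors="replace"), None
    # Decode from the head: the stray byte is assumed to be the last one.
    head_first = segment[:-1].decode(ENCODING, errors="replace") + REPLACEMENT
    # Decode from the tail: the stray byte is assumed to be the first one.
    tail_first = REPLACEMENT + segment[1:].decode(ENCODING, errors="replace")
    return head_first, tail_first


intact = "世界和平".encode(ENCODING)
lost_first_byte = intact[1:]   # simulated damage at the head of the segment
lost_last_byte = intact[:-1]   # simulated damage at the tail of the segment

print(decode_both_ways(lost_first_byte))
print(decode_both_ways(lost_last_byte))
```

When the first byte is lost, only the tail-first reading recovers the remaining characters; when the last byte is lost, the head-first reading does. Keeping both results lets a later step choose between them.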
10. A system for text processing, comprising:
a character determining module for determining a plurality of characters in a text and for detecting which of the plurality of characters represent punctuations, wherein the text comprises at least one of double-byte coded characters and multi-byte coded characters;
a text segment dividing module for dividing the text into a plurality of different text segments using the punctuations detected by the character determination module as separators between the different text segments, wherein the text segment dividing module is further configured to:
determine a first part and a second part of punctuations in the text;
divide the text into a first segment based on the first part of the punctuations and a second segment based on the second part of punctuations; and
a decoding module for performing a plurality of discrete decoding operations on the text, one for each of the plurality of different text segments, wherein one or more of the plurality of different text segments comprises at least one occurrence of unrecognizable codes that are unable to be successfully decoded as comprehensible characters without inferences being made, wherein decoding operations on text segments lacking unrecognizable codes are unaffected by other decoding operations on text segments including unrecognizable codes, wherein the decoding module is further configured to:
decode the first segment to obtain a first decoding result of the text,
decode the second segment to obtain a second decoding result; and
compare the first decoding result to the second decoding result to determine a decoding difference.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A method for text processing, comprising:
determining a plurality of characters in a text, wherein the text comprises at least one of double-byte coded characters and multi-byte coded characters;
detecting which of the plurality of characters represent punctuations;
determining a first part and a second part of punctuations in the text;
dividing the text into a first segment based on the first part of the punctuations and a second segment based on the second part of punctuations;
dividing the text into a plurality of different text segments using the detected punctuations as separators between the different text segments; and
performing a plurality of discrete decoding operations, one for each of the plurality of different text segments, wherein one or more of the plurality of different text segments comprise at least one occurrence of unrecognizable codes that are unable to be successfully decoded as comprehensible characters without inferences being made, wherein decoding operations on text segments lacking unrecognizable codes are unaffected by other decoding operations on text segments including unrecognizable codes, wherein the performing of the discrete coding operations comprises:
decoding the first segment to obtain a first decoding result of the text;
decoding the second segment to obtain a second decoding result; and
comparing the first decoding result to the second decoding result to determine a decoding difference.
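The comparison step shared by claims 10 and 18 might look like the Python sketch below. It is a hedged interpretation: the "first part" and "second part" of the punctuation are taken to be two different full-width marks, GBK is an assumed encoding, and `decode_fixed_width` is a deliberately naive two-bytes-per-step decoder used only to model a decoder that cannot resynchronise inside a segment. All function names are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

ENCODING = "gbk"  # assumption: the claims name no specific encoding


def decode_fixed_width(segment: bytes) -> str:
    """Naive model decoder: consume exactly two bytes per step."""
    return "".join(
        segment[i:i + 2].decode(ENCODING, errors="replace")
        for i in range(0, len(segment), 2)
    )


def decode_by(raw: bytes, separator: str) -> str:
    """Split on one punctuation mark, decode each segment alone, rejoin."""
    parts = raw.split(separator.encode(ENCODING))
    return separator.join(decode_fixed_width(part) for part in parts)


def decoding_difference(first: str, second: str) -> list[tuple[str, str]]:
    """Return the pairs of substrings where the two results disagree."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    return [
        (first[i1:i2], second[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


clean = "今天下雨，明天晴。后天多云".encode(ENCODING)
damaged = clean[:12] + clean[13:]   # simulated loss of one byte mid-text

first = decode_by(damaged, "，")    # segmentation by the first part
second = decode_by(damaged, "。")   # segmentation by the second part
print(decoding_difference(first, second))
```

Here the second segmentation confines the damage to the text before the full stop, so the final clause survives only in the second result; the reported difference shows where the two decodings disagree.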
Specification