Method, system, and program for generating a deterministic table to determine boundaries between characters
First Claim
Patent Images
1. A method for generating a table for use by a computer in determining a location of boundaries in text, comprising:
- initializing the table by defining columns in the table;
processing at least one regular expression;
processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row determining whether one input character would cause a transition to multiple states; and
adding additional states to the table to transform the transition to multiple states to a deterministic transition.
3 Assignments
0 Petitions
Accused Products
Abstract
Disclosed is a system, method, and program for generating a data structure for use by a computer in determining a location of boundaries in text. The data structure is initialized and at least one regular expression is processed. Input characters in the at least one regular expression are then processed to determine at least one transition to at least one state. A determination is then made as to whether one input character would cause a non-deterministic transition. Additional states are added to the data structure to transform the non-deterministic transition to a deterministic transition.
-
Citations
36 Claims
-
1. A method for generating a table for use by a computer in determining a location of boundaries in text, comprising:
-
initializing the table by defining columns in the table;
processing at least one regular expression;
processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row determining whether one input character would cause a transition to multiple states; and
adding additional states to the table to transform the transition to multiple states to a deterministic transition. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
using data structures to indicate states capable of transitioning to multiple states;
updating one state having transitions to multiple states to point to a new state providing deterministic transitions to the multiple states.
-
-
4. The method of claim 1, wherein defining the table having columns further comprises processing the regular expressions to determine categories of characters that each form one of the columns in the table.
-
5. The method of claim 4, wherein processing the regular expressions comprises forming categories of characters where no subset of the categories intersects any other category of characters.
-
6. The method of claim 1, wherein the new row is a source row and a target cell comprises a target cell in the decision point row in the input column, wherein determining whether one input character would cause transition to multiple states comprises determining whether the target cell is empty, wherein the target cell is set to point to a number of the added new row if the target cell is not empty;
- and wherein adding additional states comprises executing a merge routine to add additional states to have the state represented by the target cell point to the state represented by the row pointed to by the target cell and the state represented by the source in a deterministic manner.
-
7. The method of claim 6, wherein execution of the merge routine comprises:
-
adding an additional new row to the table if the target cell is not empty; and
merging the content of the row pointed to by the target cell and the source row into the additional new row if the target cell is not empty.
-
-
8. The method of claim 7, wherein merging the content into the added new row further comprises:
-
copying the content of the row pointed to by the target cell to the additional new row to form a new target row;
copying, for each column, the content of the source row into the column in the new target row if the column in the new target row is empty; and
recursively performing the merge routine if one column in the new target row is not empty.
-
-
9. The method of claim 1, further comprising performing, upon determining that a character following the input character is a repeat character:
-
indicating duplication of a pre-repeat decision point row pointing to a character before the character that is before the repeat character; and
setting a value in the column corresponding to the character before the repeat in the pre-repeat decision point row to point to a row that has a value in the column corresponding to a character following the repeat character and in the column corresponding to the character before the repeat character.
-
-
10. The method of claim 1, further comprising performing, upon determining that the input character indicates the beginning of an optional expression:
-
indicating duplication of a pre-optional expression decision point row for a character before a beginning of the optional expression; and
setting a value in the column corresponding to the character following the optional expression in the pre-optional expression decision point row to point to a row that has a value in the column corresponding to a character following the optional expression and in the column corresponding to a first character within the optional expression.
-
-
11. A method for generating a table for use by a computer in determining a location of boundaries in text, comprising:
-
initializing the table by defining columns in the table;
processing at least one regular expression;
processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row;
determining whether one input character would cause a transition to multiple states; and
adding additional states to the table to transform the transition to multiple states to a deterministic transition;
determining whether the input character is a special character; and
indicating duplicates of the decision point row to allow for multiple sequences of characters from one decision point.
-
-
12. A method for generating a table for use by a computer in determining a location of boundaries in text, comprising:
-
initializing the table by defining columns in the table;
processing at least one regular expression;
processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row;
determining whether one input character would cause a transition to multiple states; and
adding additional states to the table to transform the transition to multiple states to a deterministic transition; and
performing, upon determining that the input character indicates the beginning of an alternative expression, indicating duplicates of the decision point row to allow for alternative sequences of characters from one decision point.
-
-
13. A computer system for generating a table for determining a location of boundaries in text, comprising:
-
means for initializing the table by defining columns in the table;
means for processing at least one regular expression means for processing input characters in the at least one regular expression to determine at least one transition to at least one state;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row means for determining whether one input character would cause a transition to multiple states; and
means for adding additional states to the table to transform the transition to multiple states to a deterministic transition. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
means for using data structures to indicate states capable of transitioning to multiple states;
means for updating one state having transitions to multiple states to point to a new state providing deterministic transitions to the multiple states.
-
-
16. The computer system of claim 13, wherein the means for defining the table having columns further processes the regular expressions to determine categories of characters that each form one of the columns in the table.
-
17. The computer system of claim 16, wherein the means for processing the regular expressions forms categories of characters where no subset of the categories intersects any other category of characters.
-
18. The computer system of claim 13, wherein the new row is a source row and a target cell comprises a target cell in the decision point row in the input column, wherein the means for determining whether one input character would cause a transition to multiple states determines whether the target cell is empty, wherein the target cell is set to point to a number of the added new row if the target cell is not empty;
- and wherein the means for adding additional states executes a merge routine to add additional states to have the state represented by the target cell point to the state represented by the row pointed to by the target cell and the state represented by the source in a deterministic manner.
-
19. The computer system of claim 18, wherein execution of the merge routine comprises:
-
adding an additional new row to the table if the target cell is not empty; and
merging the content of the row pointed to by the target cell and the source row into the additional new row if the target cell is not empty.
-
-
20. The computer system of claim 19, wherein merging the content into the added new row further comprises:
-
copying the content of the row pointed to by the target cell to the additional new row to form a new target row;
copying, for each column, the content of the source row into the column in the new target row if the column in the new target row is empty; and
recursively performing the merge routine if one column in the new target row is not empty.
-
-
21. The computer system of claim 13, further comprising means for performing, upon determining that a character following the input character is a repeat character:
-
indicating duplication of a pre-repeat decision point row pointing to a character before the character that is before the repeat character; and
setting a value in the column corresponding to the character before the repeat in the pre-repeat decision point row to point to a row that has a value in the column corresponding to a character following the repeat character and in the column corresponding to the character before the repeat character.
-
-
22. The computer system of claim 13, further comprising mean for performing, upon determining that the input character indicates the beginning of an optional expression:
-
indicating duplication of a pre-optional expression decision point row for a character before a beginning of the optional expression; and
setting a value in the column corresponding to the character following the optional expression in the pre-optional expression decision point row to point to a row flat has a value in the column corresponding to a character following the optional expression and in the column corresponding to a first character within the optional expression.
-
-
23. A computer system for generating a table for determining a location of boundaries in text, comprising:
-
means for initializing the table by defining columns in the table;
means for processing at least one regular expression;
means for processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row;
means for determining whether one input character would cause a transition to multiple states;
means for adding additional states to the table to transform the transition to multiple states to a deterministic transition;
means for determining whether the input character is a special character; and
means for indicating duplicates of the decision point row to allow for multiple sequences of characters from one decision point.
-
-
24. A computer system for generating a table for determining a location of boundaries in text, comprising:
-
means for initializing the table by defining columns in the table;
means for processing at least one regular expression;
means for processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row;
means for determining whether one input character would cause a transition to multiple states;
means for adding additional states to the table to transform the transition to multiple states to a deterministic transition;
means for determining whether the input character is a special character;
means for indicating duplicates of the decision point row to allow for multiple sequences of characters from one decision point; and
means for performing, upon determining that the input character indicates the beginning of an alternative expression, indicating duplicates of the decision point row to allow for alternative sequences of characters from one decision point.
-
-
25. An article of manufacture for generating a table for use by a computer in determining a location of boundaries in text, the article of manufacture comprising a computer usable medium including at least one computer program that causes the computer to perform:
-
initializing the table by defining columns in the table;
processing at least one regular expression;
processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row determining whether one input character would cause a transition to multiple states; and
adding additional states to the table to transform the transition to multiple states to a deterministic transition. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34)
using data structures to indicate states capable of transitioning to multiple states;
updating one state having transitions to multiple states to point to a new state providing deterministic transitions to the multiple states.
-
-
28. The article of manufacture of claim 25, wherein defining the table having columns further comprises processing the regular expressions to determine categories of characters that each form one of the columns in the table.
-
29. The article of manufacture of claim 28, wherein processing the regular expressions comprises forming categories of characters where no subset of the categories intersects any other category of characters.
-
30. The article of manufacture of claim 25, wherein the new row is a source row and a target cell comprises a target cell in the decision point row in the input column, wherein determining whether one input character would cause transition to multiple states comprises determining whether the target cell is empty, wherein the target cell is set to point to a number of the added new row if the target cell is not empty;
- and wherein adding additional states comprises executing a merge routine to add additional states to have the state represented by the target cell point to the state represented by the row pointed to by the target cell and the state represented by the source in a deterministic manner.
-
31. The article of manufacture of claim 30, wherein execution of the merge routine performs;
-
adding an additional new row to the table if the target cell is not empty; and
merging the content of the row pointed to by the target cell and the source row into the additional new row if the target cell is not empty.
-
-
32. The article of manufacture of claim 31, wherein merging the content into the added new row further comprises:
-
copying the content of the row pointed to by the target cell to the additional new row to form a new target row;
copying, for each column, the content of the source row into the column in the new target row if the column in the new target row is empty; and
recursively performing the merge routine if one column in the new target row is not empty.
-
-
33. The article of manufacture of claim 25, further comprising performing, upon determining that a character following the input character is a repeat character:
-
indicating duplication of a pre-repeat decision point row pointing to a character before the character that is before the repeat character; and
setting a value in the column corresponding to the character before the repeat in the pre-repeat decision point row to point to a row that has a value in the column corresponding to a character following the repeat character and in the column corresponding to the character before the repeat character.
-
-
34. The article of manufacture of claim 25, further comprising performing, upon determining that the input character indicates the beginning of an optional expression:
-
indicating duplication of a pre-optional expression decision point row for a character before a beginning of the optional expression; and
setting a value in the column corresponding to the character following the optional expression in the pre-optional expression decision point row to point to a row that has a value in the column corresponding to a character following the optional expression and in the column corresponding to a first character within the optional expression.
-
-
35. An article of manufacture for generating a table for use by a computer in determining a location of boundaries in text, the article of manufacture comprising a computer usable medium including at least one computer program that causes the computer to perform:
-
initializing the table by defining columns in the table;
processing at least one regular expression;
processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row;
determining whether one input character would cause a transition to multiple states;
adding additional states to the table to transform the transition to multiple states to a deterministic transition;
determining whether the input character is a special character; and
indicating duplicates of the decision point row to allow for multiple sequences of characters from one decision point.
-
-
36. An article of manufacture for generating a table for use by a computer in determining a location of boundaries in text, the article of manufacture comprising a computer usable medium including at least one computer program that causes the computer to perform:
-
initializing the table by defining columns in the table;
processing at least one regular expression;
processing input characters in the at least one regular expression to determine at least one transition to at least one state by;
(i) indicating one row as a decision point;
(ii) receiving an input character;
(iii) adding a new row to the table for the input character; and
(iv) setting an input column corresponding to the input character in at least one decision point row to point to a row number of the added new row;
determining whether one input character would cause a transition to multiple states;
adding additional states to tie table to transform the transition to multiple states to a deterministic transition; and
performing, upon determining that the input character indicates the beginning of an alternative expression, indicating duplicates of the decision point row to allow for alternative sequences of characters from one decision point.
-
Specification