System and method for tokenizing documents
Abstract
A system and method for tokenizing a document, such as an XML document, are described. In the system, a classifier is configured to assign at least one character of the document to at least one of a plurality of character classes. Each of a plurality of token logic units is configured to concurrently perform a comparison specified by an instruction; a comparison may comprise comparing the at least one character class to an operand. An execution unit is configured to select an action from the instruction in response to the comparisons and to perform that action. The method includes assigning at least one character from a document to at least one of a plurality of character classes and concurrently performing a plurality of comparisons, at least one of which comprises comparing the assigned character class to an operand of the instruction. At least one action is then selected based on at least one result of the comparisons, and the selected action is performed.
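The classify, compare, and select sequence summarized in the abstract can be illustrated with a minimal software sketch. Everything named here (the `CharClass` members, the `classify` rules, the action names, and the layout of an instruction as a list of operand and action pairs) is a hypothetical choice for illustration and is not taken from the patent. The comparisons that the token logic units would perform concurrently in hardware are simply evaluated one after another.

```python
from enum import IntFlag


class CharClass(IntFlag):
    """Hypothetical character classes; one character may belong to several."""
    NONE = 0
    LT = 1
    GT = 2
    WHITESPACE = 4
    NAME_START = 8
    NAME_CHAR = 16


def classify(ch: str) -> CharClass:
    """Assign a single character to zero or more character classes."""
    classes = CharClass.NONE
    if ch == "<":
        classes |= CharClass.LT
    elif ch == ">":
        classes |= CharClass.GT
    elif ch.isspace():
        classes |= CharClass.WHITESPACE
    if ch.isalpha() or ch in "_:":
        classes |= CharClass.NAME_START | CharClass.NAME_CHAR
    elif ch.isdigit() or ch in "-.":
        classes |= CharClass.NAME_CHAR
    return classes


# One hypothetical instruction: each entry pairs an operand (a class mask)
# with the action to select when that comparison succeeds.
INSTRUCTION = [
    (CharClass.LT, "begin_tag"),
    (CharClass.WHITESPACE, "skip"),
    (CharClass.NAME_START, "begin_name"),
]


def select_action(ch: str, instruction, default: str = "error") -> str:
    """Run every comparison of the instruction, then pick the first match."""
    classes = classify(ch)
    # All comparisons are computed before any result is inspected, standing
    # in for token logic units that would operate at the same time.
    results = [bool(classes & operand) for operand, _ in instruction]
    for matched, (_, action) in zip(results, instruction):
        if matched:
            return action
    return default
```

With these assumed rules, `select_action("<", INSTRUCTION)` selects `"begin_tag"`, a letter selects `"begin_name"`, and a digit falls through to the default because none of the three operands covers it.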
21 Claims
1. A method of tokenizing a document, the method comprising:
receiving at least a portion of a document, the portion comprising at least one character;
assigning the at least one character to at least one of a plurality of character classes; and
concurrently performing a plurality of comparisons defined by at least one instruction wherein performing at least one of the plurality of comparisons comprises comparing the assigned character classes to an operand of the instruction;
selecting at least one of a plurality of executable actions based on at least one result of performing the plurality of comparisons;
executing the at least one of a plurality of executable actions; and
storing tokenizing state information to a memory based on executing the action.
Dependent claims: 2, 3, 4, 5
6. A system for tokenizing a document, the system comprising:
a memory configured to store at least a portion of the document, the portion comprising at least one character, wherein the memory is further configured to store a plurality of instructions wherein each of the plurality of instructions defines a plurality of comparisons and a plurality of actions and wherein each comparison comprises an operand and wherein the memory is further configured to store tokenizing state information;
an instruction pointer configured to identify one of the plurality of instructions;
a classifier configured to assign the at least one character to one of a plurality of character classes;
a plurality of token logic units configured to operate, at least in part, concurrently, wherein each of the plurality of token logic units is configured to perform one of the plurality of comparisons of the identified instruction so as to produce an output and wherein at least one of the plurality of token logic units is configured to perform the respective one of the plurality of comparisons by comparing the one of the plurality of character classes to the operand of the respective one of the plurality of comparisons; and
an execution unit configured to select an action from the plurality of actions in response to the output of one of the plurality of token logic units and wherein the execution unit is further configured to execute the selected action and wherein the execution unit is configured to store tokenizing state information to the memory based on the selected action.
Dependent claims: 7, 8, 9, 10, 11
12. An integrated circuit comprising a computer readable medium having stored thereon a software defining a process that when being executed causes a logic associated therewith to perform the acts of:
receiving at least one character from a document;
assigning the at least one character to at least one of a plurality of character classes;
loading an instruction wherein the instruction comprises a plurality of comparisons and a plurality of actions and wherein each of the plurality of comparisons comprises at least one operand;
concurrently performing at least some of the plurality of comparisons wherein performing said at least one of the plurality of comparisons comprises comparing the at least one of the plurality of character classes with the at least one operand of the instruction;
selecting at least one of the plurality of actions to perform based on at least one result of the comparing; and
executing the at least one of the plurality of actions; and
storing tokenizing state information to a memory in response to executing the action.
Dependent claims: 13, 14, 15, 16
17. A computer implemented system for tokenizing a document, the system comprising:
means for receiving at least a portion of a document, the portion comprising at least one character;
means for assigning the at least one character to at least one of a plurality of character classes;
means for identifying at least one instruction;
means for concurrently comparing at least two of the plurality of character classes to an operand defined by the instruction;
means for selecting at least one action from a plurality of actions defined by the instruction to perform in response to the means for comparing; and
means for executing the at least one action, wherein the executing means is configured to store tokenizing state information to a memory based on executing the at least one action.
Dependent claims: 18, 19, 20, 21
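As a rough illustration of how the elements recited in claim 6 (instruction memory, instruction pointer, classifier, token logic units, and execution unit) might interact, the following is a minimal software sketch. The two-instruction `PROGRAM`, the class bits `LT`, `GT`, and `OTHER`, the action names, and the `State` record are all assumptions made for this example. None of them comes from the patent, and the comparisons that hardware would perform concurrently are evaluated sequentially here.

```python
from dataclasses import dataclass, field

# Hypothetical character-class bits.
LT, GT, OTHER = 1, 2, 4


def classify(ch: str) -> int:
    """Classifier: map one character to a single class bit."""
    return LT if ch == "<" else GT if ch == ">" else OTHER


# Hypothetical instruction memory. Each instruction lists comparisons as
# (operand, action, next instruction pointer).
PROGRAM = {
    0: [(LT, "emit_text", 1), (GT | OTHER, "append", 0)],  # outside a tag
    1: [(GT, "emit_tag", 0), (LT | OTHER, "append", 1)],   # inside a tag
}


@dataclass
class State:
    """Tokenizing state information kept in memory between characters."""
    ip: int = 0
    buffer: str = ""
    tokens: list = field(default_factory=list)


def execute(state: State, action: str, ch: str) -> None:
    """Execution unit: perform the selected action and update the state."""
    if action == "append":
        state.buffer += ch
    else:  # "emit_text" or "emit_tag"
        if state.buffer:
            # Strip the "emit_" prefix to label the token "text" or "tag".
            state.tokens.append((action[5:], state.buffer))
        state.buffer = ""


def tokenize(document: str) -> list:
    state = State()
    for ch in document:
        char_class = classify(ch)
        instruction = PROGRAM[state.ip]  # the instruction pointer selects it
        # One result per comparison, standing in for the token logic units.
        outputs = [bool(char_class & operand) for operand, _, _ in instruction]
        # Every instruction above covers all three classes, so one matches.
        _, action, next_ip = instruction[outputs.index(True)]
        execute(state, action, ch)
        state.ip = next_ip
    # Text left in the buffer at end of input is not flushed in this sketch.
    return state.tokens
```

Under these assumptions, `tokenize("<a>hi</a>")` yields `[("tag", "a"), ("text", "hi"), ("tag", "/a")]`. Keeping the next instruction pointer alongside each action is a design choice of the sketch; it lets the instruction memory double as a state transition table.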
Specification