Method and apparatus for processing markup language information
First Claim
1. A method for processing an information stream of a syntactical based representation of information comprising:
parsing the information stream according to a predetermined set of syntactical rules, the syntactical rules operable to indicate a hierarchical structure of the information;
extracting tokens from the information stream, the tokens corresponding to data items and having a particular type;
processing the tokens to generate an output representation of the data items included in the information stream, determining the output representation further comprising;
computing an initial configuration indicative of a particular output representation;
comparing tokens in the information stream to a set of predetermined rules, the predetermined rules indicative of a policy for selecting the output representation; and
dynamically applying the rules to augment the output representation according to the policy;
the output representation determinable according to the particular type of the data items and a set of policy rules, and operable to preserve the hierarchical structure for further processing by a recipient application, the policy rules dynamically responsive to the parsed information stream and operable to modify the output representation in response thereto, further comprising selectively determining the output representation based on the processing capabilities of the recipient application;
wherein the output representation further comprises an enumeration of the type of data item corresponding to a token, and an indication of the location of data item corresponding of the token; and
wherein the information stream further comprises a sequence of discontiguous portions, the discontiguous portions comprising nonconsecutive memory locations, the discontiguous portions apportioned according to an external protocol specifying transmission subdivisions of the discontiguous portions, further comprising concatenating the discontiguous portions into a continuous input stream.
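The concatenating, token-extracting and output steps recited above can be pictured with a small sketch. It is illustrative only and is not the claimed apparatus: the names `TokenType`, `Token`, `concatenate` and `tokenize` are hypothetical, the regular expression handles only bare tags without attributes, and the offset/length/depth fields are one assumed way to record the "enumeration of the type" and "indication of the location" of each data item.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    START_TAG = auto()
    END_TAG = auto()
    TEXT = auto()


@dataclass
class Token:
    type: TokenType  # enumeration of the kind of data item
    offset: int      # location of the item in the continuous stream
    length: int
    depth: int       # nesting level, so the hierarchy can be rebuilt later


def concatenate(portions):
    """Join discontiguous portions into one continuous input stream."""
    return "".join(portions)


# Assumption: bare tags only (no attributes, comments, or CDATA).
TAG_OR_TEXT = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


def tokenize(stream):
    """Extract typed tokens, recording where each one sits in the stream."""
    tokens, depth = [], 0
    for m in TAG_OR_TEXT.finditer(stream):
        closing, _name, text = m.groups()
        span = m.end() - m.start()
        if text is not None:
            if text.strip():  # skip whitespace-only text
                tokens.append(Token(TokenType.TEXT, m.start(), span, depth))
        elif closing:
            depth -= 1
            tokens.append(Token(TokenType.END_TAG, m.start(), span, depth))
        else:
            tokens.append(Token(TokenType.START_TAG, m.start(), span, depth))
            depth += 1
    return tokens


# Portions split at arbitrary points, as a transport protocol might deliver them.
tokens = tokenize(concatenate(["<a><b>h", "i</b></", "a>"]))
```

Because each token stores only a type and a location, the recipient can read the underlying data from the continuous stream on demand instead of receiving a copy of it.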
Abstract
Information represented in text-based markup languages, such as XML, is often a large, highly nested structure corresponding to complex patterns of metadata and/or data. Parsing such data streams with conventional software mechanisms exhibits rapidly degrading performance as the size, or volume, of the data increases. Further, such mechanisms do not dynamically modify the output in response to feedback based on the data being parsed. An adaptive XML processing hardware apparatus processes an XML document in a manner suited to the invoking application, and converts the incoming XML into an optimal structure based on the type of data and a set of rules relating the type of the data to the output format. It also dynamically augments the output information stream based on the data, at the option of the invoking system. The generated output may take a tree form, adaptable for efficient traversal of the hierarchical structure represented by the input XML; an attribute form, in which the XML takes the form of a stream of fixed-length cells containing optimized representations of the input data; or a combination of the two, based on the configuration and the XML input stream.
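The choice between a tree output and a stream of fixed-length cells can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Token` fields, the numeric kind codes, the `Rule` type, the 11-byte cell layout, and the "deep nesting selects the tree form" policy are hypothetical and are not taken from the specification.

```python
import struct
from typing import Callable, NamedTuple


class Token(NamedTuple):
    kind: int    # hypothetical codes: 0 = start tag, 1 = end tag, 2 = text
    offset: int  # location of the data item in the input stream
    length: int
    depth: int   # nesting level of the item


class Rule(NamedTuple):
    matches: Callable[[Token], bool]  # predicate evaluated against each token
    output_format: str                # "tree" or "cells"


# Assumed cell layout: kind, offset, length, depth packed into 11 bytes.
CELL = struct.Struct("<BIIH")


def select_format(tokens, initial, rules):
    """Start from the initial configuration; the latest matching rule wins."""
    chosen = initial
    for tok in tokens:
        for rule in rules:
            if rule.matches(tok):
                chosen = rule.output_format
    return chosen


def to_cells(tokens):
    """Encode tokens as a stream of fixed-length cells."""
    return b"".join(CELL.pack(*tok) for tok in tokens)


# Example policy: prefer cells unless the document nests three or more levels.
rules = [Rule(lambda t: t.depth >= 3, "tree")]
flat = [Token(0, 0, 3, 0), Token(2, 3, 2, 1), Token(1, 5, 4, 0)]
if select_format(flat, "cells", rules) == "cells":
    encoded = to_cells(flat)
```

A fixed cell size lets a consumer index directly to the n-th token, whereas a tree favors traversal of deeply nested input; the rule set above is only one conceivable way to trade between the two.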
8 Claims
2. The method of claim 1 further comprising:
identifying fragments of a token corresponding to a plurality of discontiguous portions; and
marking the input stream with an indication of a forthcoming portion of data corresponding to the token.
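One way to picture the marking step is to flag each portion that ends in the middle of a token, so a consumer knows more data for that token is forthcoming. The sketch below is a simplification under stated assumptions: the function name `mark_split_tags` is hypothetical, only tags are treated as tokens, and a `>` inside an attribute value, comment, or CDATA section would confuse it.

```python
def mark_split_tags(portions):
    """Pair each portion with a flag that is True when it ends inside a tag.

    The flag plays the role of the "indication of a forthcoming portion":
    the token begun in this portion continues in the next one.
    """
    marked, in_tag = [], False
    for portion in portions:
        for ch in portion:
            if ch == "<":
                in_tag = True
            elif ch == ">":
                in_tag = False
        # in_tag carries over, so a portion lying wholly inside a tag is flagged too.
        marked.append((portion, in_tag))
    return marked


marked = mark_split_tags(["<a><b", "ar>x</ba", "r></a>"])
```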
3. The method of claim 2 further comprising:
receiving a plurality of information streams, each of the information streams represented by a sequence of discontiguous portions;
identifying, for each discontiguous portion, a corresponding input stream; and
identifying the discontiguous portions for each of the input streams, the identification independent of the order of receipt of the discontiguous portions.
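Handling several interleaved streams regardless of arrival order might look like the following sketch. It assumes each portion arrives tagged with a stream identifier and a sequence number; both tags, and the function name `reassemble`, are hypothetical and stand in for whatever the external transport protocol actually provides.

```python
from collections import defaultdict


def reassemble(portions):
    """Group portions by stream and join them in sequence order.

    `portions` is an iterable of (stream_id, sequence_number, data) tuples
    that may arrive in any order.
    """
    by_stream = defaultdict(dict)
    for stream_id, seq, data in portions:
        by_stream[stream_id][seq] = data
    return {
        stream_id: "".join(parts[seq] for seq in sorted(parts))
        for stream_id, parts in by_stream.items()
    }


# Two streams whose portions arrive interleaved and out of order.
received = [("B", 1, "</y>"), ("A", 0, "<x>"), ("B", 0, "<y>"), ("A", 1, "</x>")]
streams = reassemble(received)
```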
4. A content sensitive document processor comprising:
a parser operable to parse the information stream according to a predetermined set of syntactical rules, the syntactical rules operable to indicate a hierarchical structure of the information, the parser further operable to extract tokens from the information stream, the tokens corresponding to data items and having a particular type;
a generator receptive to the parser and operable to process the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items, and operable to preserve the hierarchical structure for further processing by a recipient application;
wherein the output representation further comprises an enumeration of the type of data item corresponding to a token, and an indication of the location of data item corresponding of the token; and
wherein the information stream further comprises a sequence of discontiguous portions, the discontiguous portions comprising nonconsecutive memory locations, the discontiguous portions apportioned according to an external protocol specifying transmission subdivisions of the discontiguous portions, the parser further operable to concatenate the discontiguous portions into a continuous input stream.
identify fragments of a token corresponding to a plurality of discontiguous portions; and mark the input stream with an indication of a forthcoming portion of data corresponding to the token.
6. The document processor of claim 5 wherein the generator is further operable to:
receive a plurality of information streams, each of the information streams represented by a sequence of discontiguous portions;
identify, for each discontiguous portion, a corresponding input stream; and
reassemble the discontiguous portions for each of the input streams, the reassembling independent of the order of receipt of the discontiguous portions.
7. An encoded set of processor based instructions having program code embodied in a computer readable medium for processing an information stream of a syntactical based representation of information comprising:
program code for computing an initial configuration indicative of a particular output representation;
program code for comparing tokens in the information stream to a set of predetermined rules, the predetermined rules indicative of a policy for selecting the output representation;
program code for dynamically applying the rules to augment the output representation according to the policy, including performing markup processing operations on the stream of encoded items, the rules further dynamically responsive to the markup processing operations, the markup processing operations operable to compute an output format determiner indicative of an optimal type of output, further comprising;
program code for parsing the information stream according to a predetermined syntax, the syntax operable to indicate a hierarchical structure of the information;
program code for extracting tokens from the information stream, the tokens corresponding to data items and having a particular type; and
program code for processing the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items and a set of rules, and operable to preserve the hierarchical structure for further processing by a recipient application, the rules dynamically responsive to the parsed information stream and operable to modify the output representation in response thereto processing including switching between output formats, switching further comprising demarcating a transition in the parsed tokens between processed tokens corresponding to dissimilar output formats;
program code for returning the parsed tokens and corresponding values to the invoking application in the determined optimal format, the values being attributes indicative of a discontiguous data value, the data value accessible via indirect address computation;
wherein the output representation further comprises an enumeration of the type of data item corresponding to a token, and an indication of the location of data item corresponding of the token; and
wherein the information stream further comprises a sequence of discontiguous portions, the discontiguous portions comprising nonconsecutive memory locations, the discontiguous portions apportioned according to an external protocol specifying transmission subdivisions of the discontiguous portions, further comprising program code for concatenating the discontiguous portions into a continuous input stream.
8. A content sensitive document processor comprising a memory for processing an information stream of a syntactical based representation of information comprising:
means for computing an initial configuration indicative of a particular output representation;
means for comparing tokens in the information stream to a set of predetermined rules, the predetermined rules indicative of a policy for selecting the output representation; and
means for dynamically applying the rules to augment the output representation according to the policy, including performing markup processing operations on the stream of encoded items, the rules further dynamically responsive to the markup processing operations, the markup processing operations operable to compute an output format determiner indicative of an optimal type of output, further comprising;
means for parsing the information stream according to a predetermined syntax, the syntax operable to indicate a hierarchical structure of the information;
means for extracting tokens from the information stream, the tokens corresponding to data items and having a particular type; and
means for processing the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items and a set of rules, and operable to preserve the hierarchical structure for further processing by a recipient application, processing including switching between output formats, switching further comprising demarcating a transition in the parsed tokens between processed tokens corresponding to dissimilar output formats;
means for returning the parsed tokens and corresponding values to the invoking application in the determined optimal format, the values being attributes indicative of a discontiguous data value, the data value accessible via indirect address computation;
wherein the output representation further comprises an enumeration of the type of data item corresponding to a token, and an indication of the location of data item corresponding of the token; and
wherein the information stream further comprises a sequence of discontiguous portions, the discontiguous portions comprising nonconsecutive memory locations, the discontiguous portions apportioned according to an external protocol specifying transmission subdivisions of the discontiguous portions, further comprising means for concatenating the discontiguous portions into a continuous input stream.
Specification