Method and apparatus for processing markup language information
Abstract
Information represented in text-based markup languages, such as XML, is often a large, highly nested structure corresponding to complex patterns of metadata and/or data. Parsing such data streams with conventional software mechanisms exhibits rapidly degrading performance as the size, or volume, of data increases. Further, such mechanisms do not dynamically modify the output in response to feedback based on the data being parsed. An adaptive XML processing hardware apparatus processes an XML document in a manner suited to the invoking application, and processes the incoming XML into an optimal structure based on the type of data and a set of rules relating the type of the data to the output format. It also dynamically augments the output information stream based on the data, at the option of the invoking system. The generated output may take a tree form, adaptable for efficient traversal of the hierarchical structure represented by the input XML; or may follow an attribute approach, in which the XML takes the form of a stream of fixed-length cells containing optimized representations of input data; or may combine the two approaches, based on the configuration and the XML input stream.
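As an illustration of the two output forms named in the abstract, the following Python sketch holds a document as a tree and flattens it into a stream of fixed-length cells. The 16-byte cell layout, the `Node` class, the kind codes, and `to_cells` are assumptions made for this example only; the abstract does not specify a cell format.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical cell layout (an assumption, not taken from the patent):
# 1-byte kind, 1-byte depth, 14-byte payload = 16 bytes per cell.
CELL = struct.Struct("BB14s")
KIND_ELEMENT, KIND_ATTRIBUTE, KIND_TEXT = 1, 2, 3


@dataclass
class Node:
    """Tree form: one node per element, children kept in document order."""
    name: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def to_cells(node, depth=0):
    """Flatten a tree depth-first into a list of 16-byte cells.

    struct pads short payloads with NUL bytes and truncates long ones,
    so every cell has the same length. Truncation may cut a multi-byte
    UTF-8 character; a real design would need an overflow mechanism.
    Depth must fit in one byte (0..255) in this toy layout.
    """
    cells = [CELL.pack(KIND_ELEMENT, depth, node.name.encode())]
    for key, value in node.attrs.items():
        cells.append(CELL.pack(KIND_ATTRIBUTE, depth, f"{key}={value}".encode()))
    if node.text:
        cells.append(CELL.pack(KIND_TEXT, depth, node.text.encode()))
    for child in node.children:
        cells.extend(to_cells(child, depth + 1))
    return cells
```

The depth byte is what lets a consumer of the cell stream recover the hierarchy without the tree, which is one plausible reading of how the two forms could carry the same information.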
53 Claims
1. A method for processing a markup document comprising:
scanning an input stream indicative of the markup document to identify parseable tokens having boundaries;
checking the parseable tokens in the input stream to be well-formed by verifying conformance to a predetermined set of syntax rules, formatting a stream of encoded items corresponding to the input stream, the encoded items indicative of at least one parseable token;
computing, based on a set of rules, an output format, the set of rules dynamically responsive to the stream of encoded items, the computing resulting in an output format determiner indicative of an optimal type of output;
receiving the encoded item stream by at least one generator operable to generate an output stream, the output stream having an output format, the generator corresponding to the computed output format; and
producing an output structure according to the computed output format from the output stream, the output structure indicative of the structure and content of the markup document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
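The steps recited in claim 1 can be read as a pipeline: scan, check well-formedness, emit encoded items, compute an output format from rules applied to those items, and hand the items to the matching generator. The Python sketch below follows that order for a deliberately tiny markup subset (tags without attributes, plus text). All names, the item encoding, and the depth-based rule in `choose_format` are assumptions for illustration and are not drawn from the patent.

```python
import re

# One token per match: a tag (optionally closing or self-closing) or a text run.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")


def scan(stream):
    """Find token boundaries and yield encoded items as (kind, value)."""
    pos = 0
    while pos < len(stream):
        m = TOKEN.match(stream, pos)
        if m is None:
            raise ValueError(f"unparseable input at offset {pos}")
        closing, name, selfclosing, text = m.groups()
        if text is not None:
            if text.strip():
                yield ("text", text.strip())
        elif closing and selfclosing:
            raise ValueError(f"malformed tag at offset {pos}")
        elif closing:
            yield ("end", name)
        else:
            yield ("start", name)
            if selfclosing:
                yield ("end", name)
        pos = m.end()


def check_well_formed(items):
    """Pass items through while verifying that tags nest and balance."""
    stack = []
    for kind, value in items:
        if kind == "start":
            stack.append(value)
        elif kind == "end" and (not stack or stack.pop() != value):
            raise ValueError(f"mismatched end tag: {value}")
        yield kind, value
    if stack:
        raise ValueError(f"unclosed element: {stack[-1]}")


def choose_format(items):
    """Assumed rule: deeply nested input gets a tree, shallow input gets rows."""
    depth = max_depth = 0
    for kind, _ in items:
        if kind == "start":
            depth += 1
            max_depth = max(max_depth, depth)
        elif kind == "end":
            depth -= 1
    return "tree" if max_depth > 2 else "flat"


def generate_tree(items):
    root = {"name": None, "text": "", "children": []}
    stack = [root]
    for kind, value in items:
        if kind == "start":
            node = {"name": value, "text": "", "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "end":
            stack.pop()
        else:
            stack[-1]["text"] += value
    return root["children"][0]


def generate_flat(items):
    rows, path = [], []
    for kind, value in items:
        if kind == "start":
            path.append(value)
        elif kind == "end":
            path.pop()
        else:
            rows.append(("/".join(path), value))
    return rows


GENERATORS = {"tree": generate_tree, "flat": generate_flat}


def process(document):
    """Run the whole pipeline and return (format_name, output_structure)."""
    items = list(check_well_formed(scan(document)))
    fmt = choose_format(items)
    return fmt, GENERATORS[fmt](items)
```

Under these assumptions, `process("<a><b>1</b><c>2</c></a>")` selects the flat generator and `process("<a><b><c>x</c></b></a>")` selects the tree generator. The sketch does not reject multiple root elements or text outside the root, which a complete well-formedness check would.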
13. A markup device for markup document processing comprising:
at least one character processor operable to process an incoming markup data stream, the character processor further operable to:
scan the incoming markup data stream to determine boundaries of markup elements, attributes and tokens;
check the incoming markup data stream to verify well-formed markup according to a syntax of the markup; and
produce a stream of encoded items corresponding to the incoming markup data stream; and
at least one output generator responsive to the character processor, the output generator operable to:
receive the encoded items from the character processor;
perform markup processing operations on the encoded items; and
generate an output stream having an output format, the output stream indicative of the structure and content of the input data stream and the output format dynamically selectable depending on the received encoded items.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22.
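Claim 13 describes a hardware character processor; purely to illustrate what determining the boundaries of markup elements, attributes, and tokens can mean, the Python sketch below reports character offsets for each span. The regular expressions cover only a small subset (double-quoted attribute values, no comments or CDATA), and the names `TAG`, `ATTR`, and `boundaries` are hypothetical. It covers the scanning step only, not the well-formedness check.

```python
import re

# A tag with zero or more name="value" attributes (simplified subset).
TAG = re.compile(
    r'<(/?)([A-Za-z_][\w.-]*)((?:\s+[A-Za-z_][\w.-]*\s*=\s*"[^"<]*")*)\s*(/?)>'
)
ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"<]*)"')


def boundaries(stream):
    """Yield (kind, start, end) offsets for tags, attributes, and text runs."""
    pos = 0
    for m in TAG.finditer(stream):
        if m.start() > pos:
            yield ("text", pos, m.start())
        kind = "end_tag" if m.group(1) else "start_tag"
        yield (kind, m.start(), m.end())
        # Search for attributes only inside this tag's attribute region.
        for a in ATTR.finditer(stream, m.start(3), m.end(3)):
            yield ("attribute", a.start(), a.end())
        pos = m.end()
    if pos < len(stream):
        yield ("text", pos, len(stream))
```

Reporting offsets rather than copied strings keeps the scanner independent of how later stages choose to encode each item.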
23. A method for processing an information stream of a syntactical based representation of information comprising:
parsing the information stream according to a predetermined set of syntactical rules, the syntactical rules operable to indicate a hierarchical structure of the information;
extracting tokens from the information stream, the tokens corresponding to data items and having a particular type; and
processing the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items and a set of policy rules, and operable to preserve the hierarchical structure for further processing by a recipient application, the policy rules dynamically responsive to the parsed information stream and operable to modify the output representation in response thereto.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31.
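One way to picture typed tokens and policy rules that respond to the parsed stream is the Python sketch below. Each token is classified by type, and a single assumed rule changes the output representation once the stream has shown a floating-point value: from then on, integers are emitted as floats. The `classify` function, the `Policy` class, and this widening rule are illustrative assumptions; the claim does not name specific types or rules.

```python
from datetime import date


def classify(raw):
    """Return (type_name, typed_value) for a raw text token."""
    try:
        return "int", int(raw)
    except ValueError:
        pass
    try:
        return "float", float(raw)
    except ValueError:
        pass
    try:
        return "date", date.fromisoformat(raw)
    except ValueError:
        return "string", raw


class Policy:
    """Holds rule state that is updated by feedback from the stream."""

    def __init__(self):
        self.widen_ints = False

    def observe(self, type_name):
        # Assumed rule: after the first float, later ints are widened to float.
        if type_name == "float":
            self.widen_ints = True

    def represent(self, type_name, value):
        if type_name == "int" and self.widen_ints:
            return "float", float(value)
        return type_name, value


def process(tokens):
    """tokens: iterable of (path, raw_text); the path preserves the hierarchy."""
    policy = Policy()
    out = []
    for path, raw in tokens:
        type_name, value = classify(raw)
        policy.observe(type_name)
        out.append((path, *policy.represent(type_name, value)))
    return out
```

Carrying the element path with every output item is the sketch's stand-in for preserving the hierarchical structure for a recipient application.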
32. A method for processing a document comprising:
parsing tokens via an input stream from the document, the document arranged according to a predetermined format;
determining an output format indicative of a particular representation, the particular representation depending on an invoking application and a corresponding manner of processing;
identifying relations between the parsed tokens, the relations derived from the predetermined format;
processing each of the parsed tokens according to the determined output format, the processing transforming the tokens into the output format;
maintaining the relations and content corresponding to the parsed tokens during the processing; and
returning the parsed tokens and corresponding values to the invoking application in the determined optimal format.
Dependent claims: 33, 34, 35, 36, 37, 38, 39, 40, 41.
42. A content sensitive document processor comprising:
a parser operable to parse the information stream according to a predetermined set of syntactical rules, the syntactical rules operable to indicate a hierarchical structure of the information, the parser further operable to extract tokens from the information stream, the tokens corresponding to data items and having a particular type; and
a generator receptive to the parser and operable to process the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items, and operable to preserve the hierarchical structure for further processing by a recipient application.
Dependent claims: 43, 44, 45, 46, 47, 48, 49, 50.
51. A computer program product having a computer readable medium operable to store computer program logic embodied in computer program code encoded thereon for processing an information stream of a syntactical based representation of information comprising:
computer program code for parsing the information stream according to a predetermined syntax, the syntax operable to indicate a hierarchical structure of the information;
computer program code for extracting tokens from the information stream, the tokens corresponding to data items and having a particular type; and
computer program code for processing the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items and a set of rules, and operable to preserve the hierarchical structure for further processing by a recipient application, the rules dynamically responsive to the parsed information stream and operable to modify the output representation in response thereto.
52. A computer data signal having program code for processing an information stream of a syntactical based representation of information comprising:
program code for parsing the information stream according to a predetermined syntax, the syntax operable to indicate a hierarchical structure of the information;
program code for extracting tokens from the information stream, the tokens corresponding to data items and having a particular type; and
program code for processing the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items and a set of rules, and operable to preserve the hierarchical structure for further processing by a recipient application, the rules dynamically responsive to the parsed information stream and operable to modify the output representation in response thereto.
53. A content sensitive document processor for processing an information stream of a syntactical based representation of information comprising:
means for parsing the information stream according to a predetermined syntax, the syntax operable to indicate a hierarchical structure of the information;
means for extracting tokens from the information stream, the tokens corresponding to data items and having a particular type; and
means for processing the tokens to generate an output representation of the data items included in the information stream, the output representation determinable according to the particular type of the data items and a set of rules, and operable to preserve the hierarchical structure for further processing by a recipient application.
Specification