System and method of data cleansing using rule based formatting
Abstract
In one embodiment the present invention includes a computer-implemented method for data cleansing using rule based formatting. The method includes tokenizing and parsing a first input data and a second input data. The method further includes including a first token in a first output data if a first formatting rule component in a formatting rule is a first valid index to said first tokenized input data. The method further includes including a second token in a second output data if said first formatting rule component in the formatting rule is a second valid index to said second tokenized input data. The method further includes formatting said first output data and said second output data according to the formatting rule.
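The index-or-literal behavior described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `tokenize` and `apply_format`, the zero-based indices, the representation of a formatting rule as a list of integers and strings, and the choice to skip an invalid index silently. The tokenizer is a plain whitespace/comma splitter and does not model the data dictionary.

```python
from typing import List, Union

# Hypothetical representation: a formatting rule component is either an
# int (an index into the tokenized input) or a str (a string literal).
RuleComponent = Union[int, str]


def tokenize(raw: str) -> List[str]:
    """Stand-in tokenizer: split on whitespace and commas.

    A real system would tokenize according to a data dictionary;
    that lookup is not modeled here.
    """
    return [t for t in raw.replace(",", " ").split() if t]


def apply_format(tokens: List[str], rule: List[RuleComponent]) -> str:
    """Build output data by walking the formatting rule left to right."""
    out = []
    for component in rule:
        if isinstance(component, str):
            # String literal: copied into the output as-is.
            out.append(component)
        elif 0 <= component < len(tokens):
            # Valid index: include the token associated with it.
            out.append(tokens[component])
        # Assumption: an index with no matching token contributes nothing.
    return "".join(out)


# One formatting rule applied to input data from two different sources.
rule = [1, ", ", 0]
first_output = apply_format(tokenize("John Smith"), rule)
second_output = apply_format(tokenize("Jane  Doe"), rule)
print(first_output)   # Smith, John
print(second_output)  # Doe, Jane
```

The same rule produces consistently formatted output for both inputs, which is the point of applying one formatting rule across data sources.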
19 Claims
1. A computer-implemented method for data cleansing using rule based formatting, comprising:
obtaining a first input data from a first data source and a second input data from a second data source, wherein said first input data is tokenized according to a data dictionary, wherein said second input data is tokenized according to said data dictionary;
parsing, by a rule-based parsing module implemented by a hardware server, said first input data and said second input data using a predefined parsing rule including an option operator, wherein the option operator indicates that a particular index defined in the predefined parsing rule is optional;
obtaining a formatting rule, wherein said formatting rule includes one or more formatting rule components including at least one conditional format operator, wherein the at least one conditional format operator indicates whether to include a particular string literal in an output data based on the existence of a particular token;
including a first token in a first output data if a first formatting rule component in the formatting rule is a first valid index to said first tokenized input data, wherein said first token is associated with said first valid index, and including a first string literal in said first output data if said first formatting rule component in the formatting rule is a string literal;
including a second token in a second output data if said first formatting rule component in the formatting rule is a second valid index to said second tokenized input data, wherein said second token is associated with said second valid index and including a second string literal in said second output data if said first formatting rule component in the formatting rule is the string literal;
formatting, by a formatting module implemented by the hardware server, said first output data and said second output data according to the formatting rule; and
outputting said first output data and said second output data having been formatted.
Dependent claims: 2, 3, 4, 5, 6

7. A non-transitory computer-readable medium containing instructions for controlling a computer system to perform a method for data cleansing using rule based formatting, the method comprising:
obtaining a first input data and a second input data, wherein said first input data is tokenized according to a data dictionary, wherein said second input data is tokenized according to the data dictionary;
parsing said first input data and said second input data using a predefined parsing rule including an option operator, wherein the option operator indicates that a particular index defined in the predefined parsing rule is optional;
obtaining a formatting rule, wherein said formatting rule includes one or more formatting rule components including at least one conditional format operator, wherein the at least one conditional format operator indicates whether to include a particular string literal in an output data based on the existence of a particular token;
including a first token in a first output data if a first formatting rule component in the formatting rule is a first valid index to said first tokenized input data, wherein said first token is associated with said first valid index, and including a first string literal in said first output data if said first formatting rule component in the formatting rule is a string literal;
including a second token in a second output data if said first formatting rule component in the formatting rule is a second valid index to said second tokenized input data, wherein said second token is associated with said second valid index and including a second string literal in said second output data if said first formatting rule component in the formatting rule is the string literal; and
formatting said first output data and said second output data according to the formatting rule.
Dependent claims: 8, 9, 10, 11, 12, 13, 14

15. A system for data cleansing using rule based formatting, comprising:
a hardware server that implements the system;
a tokenizing module, implemented by the hardware server, that tokenizes a first input data according to a data dictionary, and tokenizes a second input data according to said data dictionary;
a rule-based parsing module, implemented by the hardware server, that parses said first input data and said second tokenized input data using a predefined parsing rule including an option operator, wherein the option operator indicates that a particular index defined in the predefined parsing rule is optional;
a formatting module, implemented by the hardware server, that receives the said first tokenized input data and said second tokenized input data, wherein a first token is included in a first output data if a first formatting rule component in a formatting rule is a first valid index to said first tokenized input data, wherein said first token is associated with said first valid index, wherein a first string literal is included in said first output data if said first formatting rule component in the formatting rule is a string literal, and wherein said formatting rule includes an immediate at least one conditional format operator, wherein the at least one conditional format operator indicates whether to include a particular string literal in an output data based on the existence of a particular token, wherein a second token is included in a second output data if said first formatting rule component in the formatting rule is a second valid index to said second tokenized input data, wherein said second token is associated with said second valid index, wherein a second string literal is included in said second output data if said first formatting rule component in the formatting rule is a string literal, and wherein said first output data and said second output data are formatted according to the formatting rule;
a first data source that stores said first input data;
a second data source that stores said second input data; and
a third data source that stores said first output data and said second output data.
Dependent claims: 16, 17, 18

19. A computer-implemented method for data cleansing using rule based formatting, comprising:
obtaining a first input data from a first data source and a second input data from a second data source, wherein said first input data is tokenized according to a data dictionary, wherein said second input data is tokenized according to said data dictionary;
parsing, by a rule-based parsing module implemented by a hardware server, said first input data and said second input data using a predefined parsing rule;
obtaining a formatting rule, wherein said formatting rule includes one or more formatting rule components;
including a first token in a first output data if a first formatting rule component in the formatting rule is a first valid index to said first tokenized input data, wherein said first token is associated with said first valid index, and including a first string literal in said first output data if said first formatting rule component in the formatting rule is a string literal;
including a second token in a second output data if said first formatting rule component in the formatting rule is a second valid index to said second tokenized input data, wherein said second token is associated with said second valid index and including a second string literal in said second output data if said first formatting rule component in the formatting rule is the string literal;
formatting, by a formatting module implemented by the hardware server, said first output data and said second output data according to the formatting rule; and
outputting said first output data and said second output data having been formatted, wherein said first string literal is included in said first output data if a first token associated with a second formatting rule component to the immediate left of said first formatting rule component and a second token associated with a third formatting rule component to the immediate right of said first formatting rule component both exist, and wherein said second formatting rule component corresponds to a left-facing conditional format operator, and wherein said third formatting rule component corresponds to a right-facing conditional format operator.
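
The option operator recited in claims 1, 7 and 15 marks an index in the parsing rule as optional. The sketch below shows one way such a rule could behave. The concrete syntax is not given in the claims, so the `?` suffix, the token type names, the tiny `DATA_DICTIONARY`, and the greedy left-to-right matching without backtracking are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data dictionary: maps a normalized word to a token type.
DATA_DICTIONARY = {"mr": "TITLE", "ms": "TITLE", "dr": "TITLE"}


def classify(word: str) -> str:
    """Look a word up in the dictionary; unknown words default to NAME."""
    return DATA_DICTIONARY.get(word.lower().rstrip("."), "NAME")


def compile_rule(rule: str) -> List[Tuple[str, bool]]:
    """Turn 'TITLE? NAME NAME' into (token_type, is_optional) slots.

    The '?' suffix is a made-up spelling of the option operator.
    """
    return [(slot.rstrip("?"), slot.endswith("?")) for slot in rule.split()]


def parse(words: List[str], slots: List[Tuple[str, bool]]) -> Optional[Dict[int, str]]:
    """Assign words to rule indices; return None if the rule does not match."""
    parsed: Dict[int, str] = {}
    pos = 0
    for index, (token_type, optional) in enumerate(slots):
        if pos < len(words) and classify(words[pos]) == token_type:
            parsed[index] = words[pos]
            pos += 1
        elif not optional:
            return None  # a required index could not be filled
        # An optional index with no matching word is simply left unset.
    return parsed if pos == len(words) else None


slots = compile_rule("TITLE? NAME NAME")
print(parse(["Dr.", "Ada", "Lovelace"], slots))  # {0: 'Dr.', 1: 'Ada', 2: 'Lovelace'}
print(parse(["Alan", "Turing"], slots))          # {1: 'Alan', 2: 'Turing'}
```

Both inputs match the same parsing rule, and the indices stay stable: index 1 is always the first name, whether or not a title was present. A later formatting rule can rely on that.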
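
The conditional format operators in the claims tie a string literal to the existence of neighboring tokens. In claim 19 the literal is emitted only when the tokens for the components immediately to its left and right both exist. The following sketch models that under stated assumptions: the `Ref` class, its `guards_neighbor` flag (standing in for a left-facing or right-facing conditional format operator attached to an index), and the dictionary of tokens keyed by index are hypothetical and are not the patent's notation.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Ref:
    """An index component; guards_neighbor stands in for a conditional
    format operator facing an adjacent string literal (hypothetical)."""
    index: int
    guards_neighbor: bool = False


Component = Union[Ref, str]


def format_conditional(tokens: Dict[int, str], rule: List[Component]) -> str:
    out = []
    for i, component in enumerate(rule):
        if isinstance(component, Ref):
            token = tokens.get(component.index)
            if token is not None:
                out.append(token)
            continue
        # String literal: find adjacent index components that guard it.
        neighbors = [rule[j] for j in (i - 1, i + 1) if 0 <= j < len(rule)]
        guards = [n for n in neighbors if isinstance(n, Ref) and n.guards_neighbor]
        # Emit the literal only if every guarding neighbor has a token.
        if all(g.index in tokens for g in guards):
            out.append(component)
    return "".join(out)


# Literal guarded on both sides, as in claim 19.
rule = [Ref(1, guards_neighbor=True), ", ", Ref(3, guards_neighbor=True)]
print(format_conditional({1: "Springfield", 3: "IL"}, rule))  # Springfield, IL
print(format_conditional({1: "Springfield"}, rule))           # Springfield
```

Without the guard, the second call would produce a dangling ", " after the city name. Suppressing separators when a neighboring token is missing is what keeps the formatted output clean.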
Specification