Simplifying complex data stream problems involving feature extraction from noisy data

  • US 7,805,445 B2
  • Filed: 07/11/2008
  • Issued: 09/28/2010
  • Est. Priority Date: 07/10/2008
  • Status: Expired due to Fees
First Claim

1. A method for processing a data stream from noisy data received in a computer, the method comprising:

  • applying multiple operators to the data stream, at least one of the operators taking as input a plurality of sets of input parameters, each set of input parameters including an input data stream pointer pointing to a data stream source and information about how to process the data stream, and each operator producing as output at least one set of output parameters, each set of output parameters including an output data stream pointer pointing to a data stream source and information about how to process the data stream, wherein the information about how to process the data stream include a fixed number of characters of the data stream to be processed, a fixed number of characters to be processed iteratively until a condition is met and a fixed number of characters to be processed indefinitely, wherein each set of input parameters and each set of output parameters includes a fixed positive window size and a non-negative amount to slide the window from a last position, the window and slide parameters defining a next chunk of data to be processed from a respective data stream, wherein an operation by each of the multiple operators includes:

    retrieving the next chunk for each of set of input parameters;

    performing digital processing operations on a respective next chunk;

    producing sets of output parameters; and

    adding data to one or more internal data stores, each internal data store acting as a data stream source, wherein there is one original and one final operator, the original operator having a fixed set of input parameters with the pointer pointing to an original data stream source, and the final operator having only one set of output parameters and the output data stream pointer pointing to an internal data store to which the final operator adds data, the output data stream including a reporting of features of the data stream as annotation tokens that follow immediately after a string in which the annotation tokens annotate, wherein the internal data store of the final operator holds a stream of annotations corresponding to the original data stream, wherein the data stream is processed in at least one of a single threaded mode and a multi-threaded mode, wherein in the single threaded mode, each data stream is an object that responds to a READ request by immediately either returning a “data source empty” condition or removing from the object and returning to a requester a next token, wherein the next token can be at least one of a specified number of characters and a next specified fixed number of strings of characters terminated by a white space, wherein, in the single threaded mode, the read token performs at least one of returning immediately with a condition that a data source is empty, and removing one token from the data source and returning with the removed token;

    wherein in the multi-threaded mode, the data stream responds to the READ request by waiting until the data stream has a next token and then removing the next token from the data stream and returning the next token to the requester wherein the multiple operators in the multi-threaded mode make calls via signals and synchronize via special tokens independent of the data stream and via the READ request that waits indefinitely for the data stream, wherein, in the multithreaded mode, an EOS character, not present in the data stream, is added to the end of a string in the data stream and tokens are read until a character in the string is EOS (End Of String), wherein, in the multithreaded mode, if the token is present in a dictionary, an EOC (End Of Character) character, not present in the data stream, is added at an end of each character in the string, and characters are read while each character in the string is not EOC, and wherein the EOC is passed to a next character, and if the token is not present in the dictionary, the token is passed to a next operator without the EOC, wherein the EOC is continuously passed to a next token in response to the token being present in the dictionary.
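The window and slide parameters and the two READ behaviours recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation. Every name in it (`next_chunk`, `SingleThreadedStream`, `MultiThreadedStream`, `read_until_eos`, `EMPTY`, `EOS`) is hypothetical, the choice of `"\x00"` as the EOS marker is an assumption, and treating EOS as a standalone token rather than a character appended to a string is a simplification. Operators, annotation tokens, and EOC handling are left out.

```python
import queue
import threading
from collections import deque

EMPTY = object()  # hypothetical stand-in for the "data source empty" condition
EOS = "\x00"      # hypothetical end-of-string marker, assumed absent from the data


def next_chunk(data, last_pos, window, slide):
    """Slide the window `slide` characters from `last_pos` and return
    the next chunk together with its start position."""
    if window <= 0 or slide < 0:
        raise ValueError("window must be positive and slide non-negative")
    start = last_pos + slide
    return data[start:start + window], start


class SingleThreadedStream:
    """READ returns immediately: either EMPTY or the removed next token."""

    def __init__(self, tokens=()):
        self._tokens = deque(tokens)

    def add(self, token):
        self._tokens.append(token)

    def read(self):
        return self._tokens.popleft() if self._tokens else EMPTY


class MultiThreadedStream:
    """READ waits until a token is available, then removes and returns it."""

    def __init__(self):
        self._tokens = queue.Queue()

    def add(self, token):
        self._tokens.put(token)

    def read(self):
        return self._tokens.get()  # blocks indefinitely while the stream is empty


def read_until_eos(stream):
    """Collect tokens from a blocking stream until the EOS marker is read."""
    tokens = []
    while (token := stream.read()) != EOS:
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    chunk, pos = next_chunk("abcdefgh", 0, 4, 2)
    print(chunk, pos)

    stream = MultiThreadedStream()
    producer = threading.Thread(
        target=lambda: [stream.add(t) for t in ("noisy", "data", EOS)]
    )
    producer.start()
    print(read_until_eos(stream))
    producer.join()
```

In this sketch a slide of zero returns the same chunk again, which is one possible reading of the "non-negative amount to slide the window" language. The blocking `read` relies on `queue.Queue.get`, which waits without a timeout by default, to mirror the READ request that "waits indefinitely" in the multi-threaded mode.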
