Efficient method for information extraction

US 20020165717A1
Filed: 04/08/2002
Published: 11/07/2002
Est. Priority Date: 04/06/2001
Status: Abandoned Application

Abstract

The invention provides a method and system for extracting information from text documents. A document intake module receives and stores a plurality of text documents for processing, an input format conversion module converts each document into a standard format for processing, an extraction module identifies and extracts desired information from each text document, and an output format conversion module converts the information extracted from each document into a standard output format. These modules operate simultaneously on multiple documents in a pipeline fashion so as to maximize the speed and efficiency of extracting information from the plurality of documents.
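The pipeline arrangement described in the abstract can be pictured with a minimal sketch, assuming one worker thread per module and FIFO queues between them. The stage functions (`to_standard`, `extract`, `to_output`) are hypothetical stand-ins for illustration only and are not taken from the application.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the document stream


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(documents, funcs):
    """Push documents through funcs, one thread per stage, preserving order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for doc in documents:
        queues[0].put(doc)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the conversion, extraction and output modules.
def to_standard(doc):
    return doc.strip().lower()


def extract(doc):
    return [word for word in doc.split() if word.isdigit()]


def to_output(fields):
    return ",".join(fields)


print(run_pipeline(["  Order 42 of 7 ", "No numbers"],
                   [to_standard, extract, to_output]))
# prints ['42,7', '']
```

Because each stage blocks only on its own inbox, a later document can be in input conversion while an earlier one is still in extraction, which is the overlap the abstract refers to.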

135 Citations

60 Claims

1. A system for extracting information from text documents, comprising:
- an input module for receiving a plurality of text documents for information extraction, wherein said plurality of documents may be formatted in accordance with any one of a plurality of formats;
  
  an input conversion module for converting said plurality of text documents into a single format for processing;
  
  a tokenizer module for generating and assigning tokens to symbols contained in said plurality of text documents;
  
  an extraction module for receiving said tokens from said tokenizer module and extracting desired information from each of said plurality of text documents;
  
  an output conversion module for converting said extracted information into a single output format; and
  
  an output module for outputting said converted extracted information, wherein each of the above modules operate simultaneously and independently of one another so as to process said plurality of text documents in a pipeline fashion.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48)
- - 2. The system of claim 1 wherein said extraction module finds a best path sequence of states in a HMM, wherein said HMM is trained using a plurality of training documents each having a sequence of tagged states, and wherein said information is extracted from said plurality of text documents based on a best path sequence of states provided by said HMM for each of said plurality of text documents.
  - 3. The system of claim 2 wherein said extraction module calculates a confidence score for information extracted from at least one of said plurality of text documents, wherein said confidence score is based on a measure of similarity between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 4. The system of claim 3 wherein said measure of similarity is based in part on an edit distance between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 5. The system of claim 3 wherein said HMM is a hierarchical HMM (HHMM) comprising at least one subsequence of states within at least one of said states in said best path sequence of states and said confidence score is calculated using values of edit distance between said best path sequence of states, including said at least one subsequence of states, and said at least one sequence of tagged states, wherein said edit distance value associated with said at least one subsequence of states is scaled by a specified cost factor.
  - 6. The system of claim 2 wherein said HMM comprises at least one merged state formed by V-merging, at least one merged state formed by H-merging, and at least one merged sequence of states formed by ESS-merging.
  - 7. The system of claim 2 wherein said HMM states are modeled with non-exponential length distributions and said extraction module further dynamically changes probability length distributions of said HMM states during information extraction, wherein if a first state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.
  - 8. The system of claim 1 further comprising:
    - a process monitor for monitoring the processes of each of said modules recited in claim 1 and detecting if one or more of said modules ceases to function;
      
      a startup module for re-queuing data for reprocessing by one or more of said modules, in accordance with the status of said one or more modules prior to when it ceased functioning, and restarting said one or more modules to reprocess said re-queued data; and
      
      a data storage unit for storing data control files and said data.
  - 9. The system of claim 1 wherein said input module comprises:
    - an input data storage unit for storing said plurality of text documents and at least one control file associated with said plurality of text documents; and
      
      a file detection and validation module for processing said at least one control file so as to validate its control file structure and check for at least one referenced data file containing data from at least one of said plurality of text documents, wherein said file detection and validation module further copies said at least one data file to a second data storage unit, creates at least one processing control file and, thereafter, deletes said plurality of text documents and said at least one control file from said input data storage unit.
  - 10. The system of claim 9 wherein said input conversion module comprises a filter and converter module for detecting a file type for said at least one data file, initiating appropriate conversion routines for said at least one data file depending on said detected file type so as to convert said at least one data file into a standard format, and creating said at least one processing control file and at least one new data file, in accordance with said standard format, for further processing by said system.
  - 11. The system of claim 1 wherein said output conversion module comprises:
    - an output normalizer module for converting said extracted information to a XDR-compliant data format; and
      
      an output transform module for converting said XDR-compliant data to a desired end-user-compliant format.
  - 13. The method of claim 12 wherein said act of extracting comprises finding a best path sequence of states in a HMM, where said HMM is trained using a plurality of training documents each having a sequence of tagged states, and wherein said information is extracted from said plurality of text documents based on said best path sequence of states provided by said HMM for each of said plurality of text documents.
  - 14. The method of claim 13 wherein said act of extracting further comprises calculating a confidence score for information extracted from at least one of said plurality of text documents, wherein said confidence score is based on a measure of similarity between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 15. The method of claim 14 wherein said measure of similarity is based in part on an edit distance between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 16. The method of claim 14 wherein said HMM is a hierarchical HMM (HHMM) comprising at least one subsequence of states within at least one of said states in said best path sequence of states and said confidence score is calculated using values of edit distance between said best path sequence of states, including said at least one subsequence of states, and said at least one sequence of tagged states, wherein said edit distance value associated with said at least one subsequence of states is scaled by a specified cost factor.
  - 17. The method of claim 13 wherein said HMM comprises at least one merged state formed by V-merging, at least one merged state formed by H-merging, and at least one merged sequence of states formed by ESS-merging.
  - 18. The method of claim 13 wherein said HMM states are modeled with non-exponential length distributions and said act of extracting further comprises dynamically changing probability length distributions for said HMM states during information extraction, wherein if a first state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.
  - 19. The method of claim 12 further comprising:
    - monitoring the performance of each of said acts recited in claim 12 and detecting if one or more of said acts ceases to perform prematurely;
      
      re-queuing data for reprocessing by one or more of said acts, in accordance with the status of said one or more acts prior to when it ceased performing its intended functions; and
      
      restarting said one or more acts to reprocess said re-queued data.
  - 20. The method of claim 12 wherein said act of receiving comprises:
    - storing said plurality of text documents and at least one control file associated with said plurality of text documents in an input data storage unit;
      
      processing said at least one control file so as to validate its control file structure and check for at least one referenced data file containing data from at least one of said plurality of text documents;
      
      copying said at least one data file to a second data storage unit;
      
      creating at least one processing control file; and
      
      thereafter, deleting said plurality of text documents and said at least one control file from said input data storage unit.
  - 21. The method of claim 20 wherein said act of converting said plurality of text documents comprises detecting a file type for said at least one data file, initiating appropriate conversion routines for said at least one data file depending on said detected file type so as to convert said at least one data file into a standard format, and creating said at least one processing control file and at least one new data file, in accordance with said standard format, for further processing.
  - 22. The method of claim 12 wherein said act of converting said extracted information comprises:
    - converting said extracted information to a XDR-compliant data format; and
      
      converting said XDR-compliant data to a desired end-user-compliant format.
  - 24. The system of claim 23 wherein said means for extracting comprises means for finding a best path sequence of states in a HMM, wherein said HMM is trained using a plurality of training documents each having a sequence of tagged states, and wherein said information is extracted from said plurality of text documents based on said best path sequence of states provided by said HMM for each of said plurality of text documents.
  - 25. The system of claim 24 wherein said means for extracting further comprises means for calculating a confidence score for information extracted from at least one of said plurality of text documents, wherein said confidence score is based on a measure of similarity between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 26. The system of claim 25 wherein said measure of similarity is based in part on an edit distance between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 27. The system of claim 25 wherein said HMM is a hierarchical HMM (HHMM) comprising at least one subsequence of states within at least one of said states in said best path sequence of states and said means for calculating a confidence score comprises means for calculating values of edit distance between said best path sequence of states, including said at least one subsequence of states, and said at least one sequence of tagged states, wherein said means for calculating edit distance values comprises means for scaling an edit distance value associated with said at least one subsequence of states by a specified cost factor.
  - 28. The system of claim 24 wherein said HMM comprises at least one merged state formed by V-merging, at least one merged state formed by H-merging, and at least one merged sequence of states formed by ESS-merging.
  - 29. The system of claim 24 wherein said HMM states are modeled with non-exponential length distributions, and wherein said system further comprises means for dynamically adjusting a probability length distribution for each of said states during information extraction, wherein if a first state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.
  - 30. The system of claim 23 further comprising:
    - means for monitoring the performance of each of said means recited in claim 23 and detecting if one or more of said means recited in claim 23 ceases to operate prematurely;
      
      means for re-queuing data for reprocessing by one or more of said means recited in claim 23, in accordance with the status of said one or more means recited in claim 23 prior to when it ceased operating prematurely; and
      
      means for restarting said one or more means recited in claim 23 to reprocess said re-queued data.
  - 31. The system of claim 23 wherein said means for receiving comprises:
    - means for storing said plurality of text documents and at least one control file associated with said plurality of text documents in an input data storage unit;
      
      means for processing said at least one control file so as to validate its control file structure and check for at least one referenced data file containing data from at least one of said plurality of text documents;
      
      means for copying said at least one data file to a second data storage unit;
      
      means for creating at least one processing control file; and
      
      means for deleting said plurality of text documents and said at least one control file from said input data storage unit.
  - 32. The system of claim 31 wherein said means for converting said plurality of text documents comprises:
    - means for detecting a file type for said at least one data file;
      
      means for initiating an appropriate conversion routine for said at least one data file depending on said detected file type so as to convert said at least one data file into a standard format; and
      
      means for creating said at least one processing control file and at least one new data file, in accordance with said standard format, for further processing.
  - 33. The system of claim 23 wherein said means for converting said extracted information comprises:
    - means for converting said extracted information to a XDR-compliant data format; and
      
      means for converting said XDR-compliant data to a desired end-user-compliant format.
  - 35. The computer-readable medium of claim 34 wherein said act of extracting comprises finding a best path sequence of states in a HMM, wherein said HMM is trained using a plurality of training documents each having a sequence of tagged states, and wherein said information is extracted from said plurality of text documents based on a best path sequence of states provided by said HMM for each of said plurality of text documents.
  - 36. The computer-readable medium of claim 35 wherein said act of extracting further comprises calculating a confidence score for information extracted from at least one of said plurality of text documents, wherein said confidence score is based on a measure of similarity between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 37. The computer-readable medium of claim 36 wherein said measure of similarity is based in part on an edit distance between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 38. The computer-readable medium of claim 36 wherein said HMM is a hierarchical HMM (HHMM) comprising at least one subsequence of states within at least one of said states in said best path sequence of states and said confidence score is calculated using values of edit distance between said best path sequence of states, including said at least one subsequence of states, and said at least one sequence of tagged states, wherein said edit distance value associated with said at least one subsequence of states is scaled by a specified cost factor.
  - 39. The computer-readable medium of claim 35 wherein said HMM comprises at least one merged state formed by V-merging, at least one merged state formed by H-merging, and at least one merged sequence of states formed by ESS-merging.
  - 40. The computer-readable medium of claim 35 wherein said HMM states are modeled with non-exponential length distributions and said act of extracting further comprises dynamically changing probability length distributions of said HMM states during information extraction, wherein if a first state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.
  - 41. The computer-readable medium of claim 34 wherein said method further comprises:
    - monitoring the performance of each of said acts recited in claim 34 and detecting if one or more of said acts recited in claim 34 ceases to perform prematurely;
      
      re-queuing data for reprocessing by one or more of said acts, in accordance with the status of said one or more acts prior to when it ceased performing its intended functions; and
      
      restarting said one or more acts to reprocess said re-queued data.
  - 42. The computer-readable medium of claim 34 wherein said act of receiving comprises:
    - storing said plurality of text documents and at least one control file associated with said plurality of text documents in an input data storage unit;
      
      processing said at least one control file so as to validate its control file structure and check for at least one referenced data file containing data from at least one of said plurality of text documents;
      
      copying said at least one data file to a second data storage unit;
      
      creating at least one processing control file; and
      
      thereafter, deleting said plurality of text documents and said at least one control file from said input data storage unit.
  - 43. The computer-readable medium of claim 42 wherein said act of converting said plurality of text documents comprises detecting a file type for said at least one data file, initiating appropriate conversion routines for said at least one data file depending on said detected file type so as to convert said at least one data file into a standard format, and creating said at least one processing control file and at least one new data file, in accordance with said standard format, for further processing.
  - 44. The computer-readable medium of claim 34 wherein said act of converting said extracted information comprises:
    - converting said extracted information to a XDR-compliant data format; and
      
      converting said XDR-compliant data to a desired end-user-compliant format.
  - 46. The method of claim 45 wherein said measure of similarity is based in part on an edit distance between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 47. The method of claim 45 wherein said HMM comprises at least one merged state formed by V-merging, at least one merged state formed by H-merging, and at least one merged sequence of states formed by ESS-merging.
  - 48. The method of claim 45 wherein said HMM is a hierarchical HMM (HHMM) comprising at least one subsequence of states within at least one of said states in said best path sequence of states and said confidence score is calculated using values of edit distance between said best path sequence of states, including said at least one subsequence of states, and said at least one sequence of tagged states, wherein said edit distance value associated with said at least one subsequence of states is scaled by a specified cost factor.

12. A method of extracting information from a plurality of text documents, comprising the acts of:
- receiving a plurality of text documents for information extraction, wherein said plurality of documents may be formatted in accordance with any one of a plurality of formats;
  
  converting said plurality of text documents into a single format for processing;
  
  generating and assigning tokens to symbols contained in said plurality of text documents;
  
  extracting desired information from each of said plurality of text documents based in part on said token assignments;
  
  converting said extracted information into a single output format; and
  
  outputting the converted information, wherein each of the above acts are performed simultaneously and independently of one another so as to process said plurality of text documents in a pipeline fashion.
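
The act of "generating and assigning tokens to symbols" can be illustrated with a small regex-based sketch. The token classes (`NUMBER`, `WORD`, `PUNCT`) and the `tokenize` helper are assumptions made for illustration; the claims do not enumerate any token set.

```python
import re

# Hypothetical token classes; the claims do not name any.
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z]+"),
    ("PUNCT", r"[^\w\s]"),
]
_TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(text):
    """Return (token_class, symbol) pairs for each symbol found in text."""
    return [(m.lastgroup, m.group()) for m in _TOKEN_RE.finditer(text)]


print(tokenize("Pay $12.50 now."))
# prints [('WORD', 'Pay'), ('PUNCT', '$'), ('NUMBER', '12.50'), ('WORD', 'now'), ('PUNCT', '.')]
```

An extraction step would then work on the token classes rather than the raw symbols, which keeps its vocabulary small.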

23. A system for extracting information from a plurality of text documents, comprising:
- means for receiving a plurality of text documents for information extraction, wherein said plurality of documents may be formatted in accordance with any one of a plurality of formats;
  
  means for converting said plurality of text documents into a single format for processing;
  
  means for generating and assigning tokens to symbols contained in said plurality of text documents;
  
  means for extracting desired information from each of said plurality of text documents based in part on said token assignments;
  
  means for converting said extracted information into a single output format; and
  
  means for outputting the converted information, wherein each of the above means operate simultaneously and independently of one another so as to process said plurality of text documents in a pipeline fashion.

34. A computer-readable medium having computer executable instructions for performing a method of extracting information from a plurality of text documents, the method comprising:
- receiving a plurality of text documents for information extraction, wherein said plurality of documents may be formatted in accordance with any one of a plurality of formats;
  
  converting said plurality of text documents into a single format for processing;
  
  generating and assigning tokens to symbols contained in said plurality of text documents;
  
  extracting desired information from each of said plurality of text documents based in part on said token assignments;
  
  converting said extracted information into a single output format; and
  
  outputting the converted information, wherein each of the above acts are performed simultaneously and independently of one another so as to process said plurality of text documents in a pipeline fashion.

45. A method of extracting information from a text document, comprising:
- finding a best path sequence of states in a HMM, wherein said HMM is trained using a plurality of training documents each having a sequence of tagged states;
  
  extracting information from said text document based on said best path sequence of states; and
  
  calculating a confidence score for said extracted information, wherein said confidence score is based on a measure of similarity between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
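
The three acts of claim 45 can be sketched as follows, assuming a standard log-space Viterbi search for the best path and a Levenshtein edit distance as the similarity measure. The normalization of the distance into a score between 0 and 1, and the toy model parameters, are assumptions chosen for illustration; the claim does not specify them.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs (log-space Viterbi)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    score = {s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0)) for s in states}
    back = []
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r].get(s, 0)))
            score[s] = (prev[best] + log(trans_p[best].get(s, 0))
                        + log(emit_p[s].get(symbol, 0)))
            pointers[s] = best
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    row = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        diag, row[0] = row[0], i
        for j, y in enumerate(b, 1):
            diag, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, diag + (x != y))
    return row[-1]


def confidence(path, training_paths):
    """Similarity in [0, 1] to the closest tagged training sequence (assumed form)."""
    def similarity(ref):
        return 1 - edit_distance(path, ref) / max(len(path), len(ref), 1)
    return max(similarity(ref) for ref in training_paths)


# Toy model with made-up probabilities.
states = ["name", "price"]
start_p = {"name": 0.9, "price": 0.1}
trans_p = {"name": {"name": 0.5, "price": 0.5},
           "price": {"name": 0.1, "price": 0.9}}
emit_p = {"name": {"WORD": 0.9, "NUMBER": 0.1},
          "price": {"WORD": 0.1, "NUMBER": 0.9}}

best = viterbi(["WORD", "WORD", "NUMBER"], states, start_p, trans_p, emit_p)
print(best)  # prints ['name', 'name', 'price']
print(confidence(best, [["name", "price"], ["name", "name", "name", "price"]]))
# prints 0.75
```

A best path that closely resembles some tagged training sequence scores near 1, while a path unlike anything seen in training scores near 0.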

49. A method of extracting information from a text document, comprising:
- finding a best path sequence of states in a HMM, wherein said HMM is trained using a plurality of training documents each having a sequence of tagged states and said HMM states are modeled with non-exponential length distributions so as to allow their probability length distributions to be changed dynamically during information extraction; and
  
  extracting information from said text document based on said best path sequence of states, wherein if a first state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.
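
The transition adjustment recited in claim 49 can be written out directly. The sketch below assumes the outgoing transitions to other states are supplied as probabilities that sum to 1 before scaling; the helper name `adjust_transitions`, the handling of the case cdf(t) = 1, and the uniform example distribution are illustrative assumptions.

```python
def adjust_transitions(cdf, t, exit_probs):
    """Return (self_prob, scaled_exit_probs) after the state has emitted t symbols.

    cdf(n) is P(length <= n) for the state's length distribution. exit_probs
    maps each other state to its transition probability; these are assumed
    here to sum to 1 before scaling.
    """
    remaining = 1.0 - cdf(t)
    if remaining <= 0.0:
        # Assumption: no probability mass is left, so the state must exit.
        return 0.0, dict(exit_probs)
    self_prob = (1.0 - cdf(t + 1)) / remaining
    exit_scale = (cdf(t + 1) - cdf(t)) / remaining
    return self_prob, {s: p * exit_scale for s, p in exit_probs.items()}


def uniform_cdf(n, longest=4):
    """Example length distribution: uniform over lengths 1..longest."""
    return min(max(n, 0), longest) / longest


print(adjust_transitions(uniform_cdf, 0, {"next": 1.0}))  # prints (0.75, {'next': 0.25})
print(adjust_transitions(uniform_cdf, 2, {"next": 1.0}))  # prints (0.5, {'next': 0.5})
print(adjust_transitions(uniform_cdf, 3, {"next": 1.0}))  # prints (0.0, {'next': 1.0})
```

With t = 0 the function returns the "original value" (1 − cdf(1))/(1 − cdf(0)) named in the claim. Under the stated assumption, the self-transition and scaled exit probabilities always sum to 1, since (1 − cdf(t+1)) + (cdf(t+1) − cdf(t)) = 1 − cdf(t).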

50. A computer-readable medium having computer executable instructions for performing a method of extracting information from a text document, said method comprising:
- finding a best path sequence of states in a HMM, wherein said HMM is trained using a plurality of training documents each having a sequence of tagged states;
  
  extracting information from said text document based on said best path sequence of states; and
  
  calculating a confidence score for said extracted information, wherein said confidence score is based on a measure of similarity between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
- View Dependent Claims (51, 52, 53)
- - 51. The computer-readable medium of claim 50 wherein said measure of similarity is based in part on an edit distance between said best path sequence of states and at least one of said sequence of tagged states from at least one of said plurality of training documents.
  - 52. The computer-readable medium of claim 50 wherein said HMM comprises at least one merged state formed by V-merging, at least one merged state formed by H-merging, and at least one merged sequence of states formed by ESS-merging.
  - 53. The computer-readable medium of claim 50 wherein said HMM is a hierarchical HMM (HHMM) comprising at least one subsequence of states within at least one of said states in said best path sequence of states and said confidence score is calculated using values of edit distance between said best path sequence of states, including said at least one subsequence of states, and said at least one sequence of tagged states, wherein said edit distance value associated with said at least one subsequence of states is scaled by a specified cost factor.
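
One possible reading of the scaled edit distance in claim 53 is sketched below. It assumes each top-level state is represented as a `(name, substates)` pair, that mismatched top-level states cost 1, and that differences confined to the nested subsequence are charged at `cost_factor` times their own edit distance. This representation and the default factor of 0.5 are assumptions; the claim states only that the subsequence distance is scaled by a specified cost factor.

```python
def edit_distance(a, b, sub_cost=lambda x, y: float(x != y)):
    """Edit distance with unit insert/delete cost and a pluggable substitution cost."""
    row = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        diag, row[0] = row[0], float(i)
        for j, y in enumerate(b, 1):
            diag, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1,
                                       diag + sub_cost(x, y))
    return row[-1]


def hierarchical_distance(path, reference, cost_factor=0.5):
    """Distance between two sequences of (state, substates) pairs."""
    def sub_cost(x, y):
        if x[0] != y[0]:
            return 1.0  # different top-level states: full substitution cost
        # Same top-level state: compare nested sequences at a reduced cost.
        return cost_factor * edit_distance(x[1], y[1])
    return edit_distance(path, reference, sub_cost)


best_path = [("header", []), ("address", ["street", "city"]), ("total", [])]
tagged = [("header", []), ("address", ["street", "city", "zip"]), ("total", [])]
print(hierarchical_distance(best_path, tagged))  # prints 0.5
```

Here the two sequences agree at the top level and differ by one nested state, so the distance is 0.5 rather than 1, reflecting the lower weight given to differences inside a sub-model.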

54. A computer-readable medium having computer executable instructions for performing a method of extracting information from a text document, said method comprising:
- finding a best path sequence of states in a HMM, wherein said HMM is trained using a plurality of training documents each having a sequence of tagged states and said HMM states are modeled with non-exponential length distributions so as to allow their probability length distributions to be changed dynamically during information extraction; and
  
  extracting information from said text document based on said best path sequence of states, wherein if a first HMM state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first HMM state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first HMM state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.

55. A method of extracting information from a text document, comprising:
- creating a HMM using a plurality of training documents of a known type, wherein said training documents comprise tagged sequences of states;
  
  generalizing said HMM by merging repeating sequences of states;
  
  finding a best path through said HMM representative of said text document, wherein information is extracted from said text document based on said best path.
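
The first act of claim 55, creating an HMM from tagged training sequences, is commonly done by counting transitions. The sketch below shows only that step, using maximum-likelihood estimates; it does not attempt the merging step, which the claims do not define in detail. The state names in the example are made up.

```python
from collections import Counter, defaultdict


def estimate_transitions(tagged_sequences):
    """Maximum-likelihood transition probabilities from tagged state sequences."""
    counts = defaultdict(Counter)
    for seq in tagged_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


training = [
    ["header", "item", "item", "total"],
    ["header", "item", "total"],
]
model = estimate_transitions(training)
print(model["header"])  # prints {'item': 1.0}
```

In this toy data the `item` state is followed by itself once and by `total` twice, so its estimated self-transition probability is 1/3.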

56. A method of extracting information from a text document, comprising:
- creating a HMM using a plurality of training documents of a known type, wherein said training documents comprise tagged sequences of states and said HMM comprises HMM states that are modeled with non-exponential length distributions so as to allow their probability length distributions to be changed dynamically during information extraction;
  
  finding a best path through said HMM representative of said text document, wherein information is extracted from said text document based on said best path, and wherein if a first HMM state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first HMM state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first HMM state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.

57. A computer-readable medium having computer executable instructions for performing a method of extracting information from a text document, said method comprising:
- creating a HMM using a plurality of training documents of a known type, wherein said training documents comprise tagged sequences of states;
  
  generalizing said HMM by merging repeating sequences of states;
  
  finding a best path through said HMM representative of said text document, wherein information is extracted from said text document based on said best path.
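
The claims do not name the algorithm used to find the best path. A common choice for HMMs is the Viterbi algorithm, sketched below in Python as an assumption rather than as the applicants' method. The function name `viterbi`, the dictionary-based model layout, and the toy states and tokens are hypothetical. The sketch uses fixed transition probabilities and does not include the length-dependent adjustment recited in claims 54, 56 and 58.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs (log-space Viterbi)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any path that ends in state s
    score = {s: log(start_p.get(s, 0)) + log(emit_p[s].get(obs[0], 0))
             for s in states}
    back = []  # one back-pointer table per symbol after the first
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r].get(s, 0)))
            score[s] = (prev[best] + log(trans_p[best].get(s, 0))
                        + log(emit_p[s].get(symbol, 0)))
            pointers[s] = best
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy model: tokens tagged as either a field label or a field value.
states = ["label", "value"]
start_p = {"label": 1.0}
trans_p = {"label": {"label": 0.5, "value": 0.5}, "value": {"value": 1.0}}
emit_p = {"label": {"Total": 0.6, ":": 0.4}, "value": {"42": 1.0}}
print(viterbi(["Total", ":", "42"], states, start_p, trans_p, emit_p))
# ['label', 'label', 'value']
```

Information extraction then amounts to reading off the tokens aligned with the states of interest, here the token aligned with `value`.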

58. A computer-readable medium having computer executable instructions for performing a method of extracting information from a text document, said method comprising:
- creating a HMM using a plurality of training documents of a known type, wherein said training documents comprise tagged sequences of states and said HMM comprises HMM states that are modeled with non-exponential length distributions so as to allow their probability length distributions to be changed dynamically during information extraction;
  
  finding a best path through said HMM representative of said text document, wherein information is extracted from said text document based on said best path, and wherein if a first HMM state's best transition was from itself, its self-transition probability is adjusted to (1 − cdf(t+1))/(1 − cdf(t)) and all other outgoing transitions from said first HMM state are scaled by (cdf(t+1) − cdf(t))/(1 − cdf(t)), and if said first HMM state is transitioned to by another state, its self-transition probability is reset to its original value of (1 − cdf(1))/(1 − cdf(0)), where cdf is the cumulative probability distribution function for said first state's length distribution, and t is the number of symbols emitted by said first state in said best path.

59. A computer readable storage medium encoded with information comprising a HMM data structure including a plurality of states in which at least one sequence of states in said HMM data structure is created by merging a repeated sequence of states.

60. A computer readable storage medium encoded with information comprising a HMM data structure including a plurality of states in which at least one sequence of more than two states in said HMM data structure includes a transition from a last state in the at least one sequence to the first state in the sequence.

Current Assignee
Kofax Image Products Incorporated (Kofax Ltd.)
Original Assignee
Kofax Image Products Incorporated (Kofax Ltd.)
Inventors
Schmidtler, Mauritius A.R., Solmer, Robert P., Dolter, James W., Harris, Christopher K.

Application Number

US10/118,968
Publication Number

US 20020165717A1
Field of Search
US Class Current

704/256
CPC Class Codes

G06F 40/289   Phrasal analysis, e.g. fini...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/197   Probabilistic grammars, e.g...
