
Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof

  • US 10,282,413 B2
  • Filed: 08/27/2014
  • Issued: 05/07/2019
  • Est. Priority Date: 10/02/2013
  • Status: Active Grant
First Claim

1. A device for analyzing a morpheme in natural language information processing, comprising:

  • a knowledge database including an aligned corpus for storing a plurality of knowledge information sets used for a per-language morpheme analysis, and storing a morpheme dictionary for storing morpheme information corresponding to a normal expression and normal expression information corresponding to a destructive expression, wherein the destructive expression represents an expression that is erroneous in orthography or is not normalized and standardized, and includes an orthographic error; and

    an analyzer for, by a processor, performing a morpheme analysis on an input separate word by use of the knowledge database and outputting an analysis result, and when a morpheme on the input separate word is not provided in the morpheme dictionary, finding the normal expression corresponding to the destructive expression by use of the aligned corpus regarding the destructive expression included in the input separate word and performing a morpheme analysis,

    wherein the aligned corpus is generated by performing an unsupervised-learning-based alignment on a parallel corpus storing pairs of a destructive sentence including the destructive expression and a normal sentence corresponding to the destructive sentence, and

    wherein the parallel corpus is built by collecting a plurality of destructive sentences through a network, performing retrieval through the network with the destructive expression included in the collected destructive sentence as a query to determine universality on the corresponding destructive sentence, generating the normal sentence corresponding to the destructive sentence when the collected destructive sentence is determined to have universality, and forming the generated normal sentence and the corresponding destructive sentence into one pair to build the parallel corpus,

    wherein the knowledge database further includes an analyzed dictionary for storing per-morpheme access information, and the analyzer includes;

    a morpheme divider for dividing the morphemes forming the input separate word by use of the morpheme dictionary, and when the morpheme forming the input separate word is not provided in the morpheme dictionary, performing a morpheme division by using the corresponding normal expression by use of the aligned corpus;

    an access information checker for extracting morphemes that are capable of being combined with the morphemes divided by the morpheme divider by use of the analyzed dictionary; and

    an original form restoring unit for performing an original form restoration on the morphemes extracted by the access information checker and outputting it as a morpheme analysis result.
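
The flow recited in the claim (morpheme divider, access information checker, original form restoring unit, with a fallback to the aligned corpus) can be illustrated with a small Python sketch. This is not the patented implementation. Every name in it (`KnowledgeDB`, `divide`, `analyze`, the toy English entries) is hypothetical, the greedy longest-match segmentation is an assumed stand-in for whatever division method the patent uses, and the aligned corpus is reduced to a plain mapping from destructive expressions to normal expressions rather than the output of an unsupervised alignment.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeDB:
    """Toy stand-in for the claimed knowledge database (all fields hypothetical)."""
    morpheme_dict: dict   # morpheme surface form -> tag
    aligned_corpus: dict  # destructive expression -> normal expression
    analyzed_dict: dict   # tag -> set of tags allowed to follow it
    base_forms: dict      # surface form -> original (base) form


def divide(word, db):
    """Morpheme divider: greedy longest-match segmentation (an assumption).

    Returns a list of morphemes, or None if some part of the word
    is not covered by the morpheme dictionary.
    """
    morphemes, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in db.morpheme_dict:
                morphemes.append(word[i:j])
                i = j
                break
        else:
            return None  # no dictionary entry starts at position i
    return morphemes


def analyze(word, db):
    """Return a list of (original form, tag) pairs, or None if analysis fails."""
    morphemes = divide(word, db)
    if morphemes is None:
        # Fallback: replace a destructive expression found in the word
        # with its normal expression, then divide again.
        for bad, good in db.aligned_corpus.items():
            if bad in word:
                morphemes = divide(word.replace(bad, good), db)
                if morphemes is not None:
                    break
    if morphemes is None:
        return None

    # Access information check: each adjacent pair of tags must be combinable.
    tags = [db.morpheme_dict[m] for m in morphemes]
    for left, right in zip(tags, tags[1:]):
        if right not in db.analyzed_dict.get(left, set()):
            return None

    # Original form restoration: map each surface form to its base form.
    return [(db.base_forms.get(m, m), t) for m, t in zip(morphemes, tags)]


# Toy data, invented for illustration only.
db = KnowledgeDB(
    morpheme_dict={"walk": "VERB", "ed": "PAST", "ran": "VERB"},
    aligned_corpus={"walkd": "walked"},
    analyzed_dict={"VERB": {"PAST"}},
    base_forms={"ran": "run"},
)

print(analyze("walked", db))  # dictionary path
print(analyze("walkd", db))   # aligned-corpus fallback path
print(analyze("ran", db))     # original form restoration
```

In this sketch the misspelled input `walkd` cannot be divided with the morpheme dictionary alone, so it is mapped to `walked` through the toy aligned corpus and then analyzed exactly like a normal expression, which mirrors the fallback described in the claim.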
