×

Language modelling system and a fast parsing method

  • US 7,603,651 B2
  • Filed: 05/28/2003
  • Issued: 10/13/2009
  • Est. Priority Date: 11/21/2000
  • Status: Expired due to Fees
First Claim
Patent Images

1. A language processing system taking as input a grammar model for a language, formulated as an RTN and, secondly, one or more partially overlapping sequences at a time, of concatenable terminals, or more specifically language elements, said set of input sequences together formulated as a sequence-chart of such elements, said system delivering as output a parse result for each incoming sequence-chart, obtained by applying a parse operation on said sequence-chart, characterised in that said system comprises:

  • memory access to store and consult a set of phonological, morphological, syntactical and/or semantic rules for a language, formulated as an RTN data-structure;

    memory access to store and consult a plurality of n-Swings, in which n represents an integer number being at least 2, organised in lists of finite length, each list labelled with an n-fold of said language elements, and each n-Swing being defined as a unique path segment that is a contiguous part of at least one partial parse pathway (passing over transitions and jumps for FSM calling and returning), thus a Link sequence complying with the RTN, and that comprises one terminal transition at the start of the segment and one terminal transition at the end of the segment, and includes n-2 terminal transitions in between;

    derivation means to derive said lists of n-Swings for a given n from the RTN, while limiting the length of the n-Swings to a predetermined number of transitions to prevent infinite lengths, by applying the following algorithms for deriving n-Swings in three phases,the first phase being the generation of all possible Links in every FSM of said RTN,the second phase being the generation of all possible Bi-Swings in the RTN, the Bi-Swings being limited by a fixed limit on the length of its sequence of Links and/or by a limit of the call level span;

    the optional third phase being the generation of all possible n-Swings for a predetermined integer n>

    2, which are calculated by induction in recursive incremental phases, starting by calculating 3-Swings, by either left or right attachment of matching Bi-Swings to (n−

    1)-Swings,parsing means provided to derive a parse result from said supplied token sequence-chart of concatenable language elements by using said lists of n-Swings, said parsing means being provided to build all possible parses, by building a list of every n-Swings that matches a full sequence of N language elements that is derivable from said input sequence-chart;

    an optional means to derive a parse result from said supplied token sequence-chart of concatenable language elements, by seeding the set of possible parses, each regarded as Link sequences, from the set of n-Swings of the n-Swing list associated with an n-fold of n subsequent language elements, as occurs at the start or the end of a sequence comprised in the sequence-chart, and by trying to extend each seed of a parse by repeatedly attaching matching subsequent n-Swings at the begin or the end of the open ended sides of its Link sequences, while taking the subsequent n-Swings from lists that are associated with a sequence of n language elements as they occur directly at the one, respectively the other side of those language elements in the supplied sequence as derived from the sequence chart, that have not yet been associated with a set of n-Swings, said parsing means being also provided to multiply the built-up Link sequences each time multiple possibilities exist for extension, such that every extension possibility is generated and every possibility is unique and for rejecting extensions that contain non-matching call correspondences at any call level and for rejecting extensions that contain below zero calling levels, and for rejecting extensions that would contain call level differences exceeding a predetermined threshold value and for rejecting Link sequences that can'"'"'t be extended, said parsing means being further provided to form a list of parse result contributions, being a list of only those Link sequences that span the longest length of the supplied sequence and for organising each parse result contribution as a tree structure, said parsing means provided to further add all parse trees for each possible element sequence possible according to the supplied input sequence-chart and finally summarising the entire parse result in one parse forest, containing all parse trees or no parse tree if no full parse was found, or in, equivalently, a parse tree chart.

View all claims
  • 0 Assignments
Timeline View
Assignment View
    ×
    ×