Method and apparatus for recognizing multiword expressions
Abstract
Words of an input string are morphologically analyzed to identify their alternative base forms and parts of speech. The analyzed words of the input string are used to compile the input string into a first finite-state network. The first finite-state network is matched with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network. Each matching pair of a subpath of the first finite-state network and a path of the second finite-state network identifies a multiword expression in the input string. The morphological analysis is performed without disambiguating words in the input string and without segmenting the input string into sentences, so that the first finite-state network is compiled with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
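The construction of the first finite-state network from undisambiguated analyses can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the toy `LEXICON` dictionary stands in for a real morphological analyzer, whitespace splitting stands in for tokenization, and the (base form, part of speech) tuple labels and the names `analyze` and `compile_input` are hypothetical. Each word occupies one step of a linear network, and every alternative analysis becomes a parallel arc, so an ambiguous word such as "account" (noun or verb) contributes two paths instead of being resolved.

```python
from typing import Dict, List, Tuple

Analysis = Tuple[str, str]  # hypothetical label format: (base form, part of speech)
# Network as an adjacency list: state -> list of (label, next_state)
Network = Dict[int, List[Tuple[Analysis, int]]]

# Hypothetical toy lexicon standing in for a real morphological analyzer.
LEXICON: Dict[str, List[Analysis]] = {
    "they": [("they", "Pron")],
    "took": [("take", "Verb")],
    "into": [("into", "Prep")],
    "account": [("account", "Noun"), ("account", "Verb")],
}


def analyze(word: str) -> List[Analysis]:
    """Return every analysis of a word; no disambiguation is performed."""
    # Unknown words keep their lowercased surface form with a placeholder tag.
    return LEXICON.get(word.lower(), [(word.lower(), "Unknown")])


def compile_input(text: str) -> Tuple[Network, int]:
    """Compile a string into a linear network: word i spans states i -> i+1,
    with one parallel arc per alternative analysis of that word."""
    words = text.split()  # assumption: whitespace tokenization
    net: Network = {i: [] for i in range(len(words) + 1)}
    for i, word in enumerate(words):
        for analysis in analyze(word):
            net[i].append((analysis, i + 1))
    return net, len(words)  # the last state is the final state


net, final_state = compile_input("They took into account")
for state in sorted(net):
    for (base, pos), target in net[state]:
        print(f"{state} -[{base}+{pos}]-> {target}")
```

Because no disambiguation takes place, the network for "They took into account" keeps both readings of "account" as parallel arcs between states 3 and 4. Consistent with the abstract, the sketch compiles the whole string as one network rather than splitting it into sentences first.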
20 Claims
1. A method for identifying multiword expressions in an input string, comprising:
morphologically analyzing words of the input string to replace words identified in the input string with their alternative base forms and parts of speech;
using the analyzed words of the input string to compile the input string into a first finite-state network;
matching the first finite-state network with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network;
each matching subpath of the first finite-state network and complete path of the second finite-state network identifying a multiword expression in the input string;
wherein the analyzing is performed without disambiguating words in the input string to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string; and
wherein said matching comprises:
i) generating a set of states comprising one state from the first finite-state network and one state from the second finite-state network;
ii) pushing at least the set of states onto a stack, in order to record start states of potentially matching subnetworks of the first and second finite-state networks; and
iii) recording start states of potentially matching subnetworks of the first and second finite-state networks.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
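The matching steps of claim 1 can be illustrated with a stack-based traversal of state pairs. The sketch below is one possible reading of the claim, not the patent's own code: `InputNet`, `MweNet`, and `find_mwes` are hypothetical names, the input network is assumed to be acyclic (as a network compiled from a finite text is), and the multiword-expression network is assumed to be deterministic. Each stack entry pairs one state from each network and also carries the input state at which the potential match started, so that whenever a final state of the second network is reached, the start and end of the matching subpath can be recorded.

```python
from typing import Dict, List, Set, Tuple

Label = Tuple[str, str]  # hypothetical label format: (base form, part of speech)
InputNet = Dict[int, List[Tuple[Label, int]]]  # acyclic, possibly ambiguous
MweNet = Dict[int, Dict[Label, int]]  # deterministic transition table


def find_mwes(inp: InputNet, mwe: MweNet, mwe_start: int,
              mwe_finals: Set[int]) -> Set[Tuple[int, int, int]]:
    """Return (start, end, mwe_final) triples: each input subpath from `start`
    to `end` that spells a complete path of the MWE network ending in `mwe_final`."""
    matches: Set[Tuple[int, int, int]] = set()
    for start in inp:  # try every input state as a potential match start
        # Entry = (input state, MWE state, input state where this match began).
        stack: List[Tuple[int, int, int]] = [(start, mwe_start, start)]
        while stack:
            in_state, mwe_state, origin = stack.pop()
            if mwe_state in mwe_finals:
                matches.add((origin, in_state, mwe_state))  # complete MWE path
            for label, in_next in inp.get(in_state, []):
                mwe_next = mwe.get(mwe_state, {}).get(label)
                if mwe_next is not None:  # both networks can take this arc
                    stack.append((in_next, mwe_next, origin))
    return matches


# "They took into account" with "account" left ambiguous (noun or verb).
inp: InputNet = {
    0: [(("they", "Pron"), 1)],
    1: [(("take", "Verb"), 2)],
    2: [(("into", "Prep"), 3)],
    3: [(("account", "Noun"), 4), (("account", "Verb"), 4)],
    4: [],
}
# One expression, "take into account": take+Verb into+Prep account+Noun.
mwe: MweNet = {
    0: {("take", "Verb"): 1},
    1: {("into", "Prep"): 2},
    2: {("account", "Noun"): 3},
}
print(find_mwes(inp, mwe, 0, {3}))  # → {(1, 4, 3)}
```

The subpath from input state 1 to state 4 ("took into account") matches the complete path of the expression network; the unresolved verb reading of "account" is never pushed because the second network has no matching arc. Starting a traversal at every input state is what lets the sketch report all matching subpaths, including overlapping ones.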
11. An apparatus for identifying multiword expressions in an input string, comprising:
a morphological analyzer for morphologically analyzing words of the input string to replace words identified in the input string with their alternative base forms and parts of speech;
a finite-state compiler for using the analyzed words of the input string to compile the input string into a first finite-state network;
an expression identifier for matching the first finite-state network with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network;
each matching subpath of the first finite-state network and path of the second finite-state network identifying a multiword expression in the input string;
wherein said morphological analyzer performs morphological analysis without disambiguating words in the input string to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string; and
wherein said matching comprises:
i) generating a set of states comprising one state from the first finite-state network and one state from the second finite-state network;
ii) pushing at least the set of states onto a stack, in order to record start states of potentially matching subnetworks of the first and second finite-state networks; and
iii) recording start states of potentially matching subnetworks of the first and second finite-state networks.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19.
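The second finite-state network, against which the expression identifier matches, must be built from a list of multiword expressions. The following minimal sketch assumes each expression is given as a sequence of (base form, part of speech) labels and builds a trie-shaped deterministic network; `compile_mwe_network` and the sample entries are hypothetical. A production system would more likely use a finite-state compiler that also minimizes the network (merging shared suffixes as well as prefixes), so this illustrates the structure of the network, not the patented compiler.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, str]  # hypothetical label format: (base form, part of speech)
Transitions = Dict[int, Dict[Label, int]]


def compile_mwe_network(
        entries: Dict[str, List[Label]]) -> Tuple[Transitions, int, Dict[int, str]]:
    """Build a deterministic, acyclic (trie-shaped) network with one path per
    expression and shared prefixes merged. Returns the transition table, the
    start state, and a map from each final state to its expression."""
    transitions: Transitions = {0: {}}
    finals: Dict[int, str] = {}
    for name, labels in entries.items():
        state = 0
        for label in labels:
            if label not in transitions[state]:
                new_state = len(transitions)  # next unused state number
                transitions[state][label] = new_state
                transitions[new_state] = {}
            state = transitions[state][label]
        finals[state] = name  # an identical label sequence overwrites an earlier name
    return transitions, 0, finals


mwes = {
    "take into account": [("take", "Verb"), ("into", "Prep"), ("account", "Noun")],
    "take into custody": [("take", "Verb"), ("into", "Prep"), ("custody", "Noun")],
    "by and large": [("by", "Prep"), ("and", "Conj"), ("large", "Adj")],
}
transitions, start, finals = compile_mwe_network(mwes)
print(len(transitions), finals)
```

Expressions that share a prefix, such as the two "take into ..." entries, share states, and each final state records which expression its path spells out. If two entries have identical label sequences, the later one replaces the earlier one in `finals`; a fuller version might keep a list of names per final state.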
20. An article of manufacture for use in a machine comprising:
a) a memory; and
b) instructions stored in the memory for identifying multiword expressions in an input string, the instructions adapted to perform a method comprising:
morphologically analyzing words of the input string to replace words identified in the input string with their alternative base forms and parts of speech;
using the analyzed words of the input string to compile the input string into a first finite-state network;
matching the first finite-state network with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network;
each matching subpath of the first finite-state network and path of the second finite-state network identifying a multiword expression in the input string;
wherein said morphological analysis is performed without disambiguating words in the input string to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string; and
wherein said matching comprises:
i) generating a set of states comprising one state from the first finite-state network and one state from the second finite-state network;
ii) pushing at least the set of states onto a stack, in order to record start states of potentially matching subnetworks of the first and second finite-state networks; and
iii) recording start states of potentially matching subnetworks of the first and second finite-state networks.
Specification