METHOD AND APPARATUS FOR RECOGNIZING MULTIWORD EXPRESSIONS
Abstract
Words of an input string are morphologically analyzed to identify their alternative base forms and parts of speech. The analyzed words of the input string are used to compile the input string into a first finite-state network. The first finite-state network is matched with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network. Each matching subpath of the first finite-state network, together with the matching path of the second finite-state network, identifies a multiword expression in the input string. The morphological analysis is performed without disambiguating words in the input string and without segmenting the input string into sentences, so that the first finite-state network is compiled with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
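The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy lexicon, the names `analyze`, `compile_input`, `compile_expressions`, and `match`, and the choice to represent the first network as a linear lattice and the second as a trie over (base form, part of speech) pairs are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Reading = Tuple[str, str]  # (base form, part of speech)

# Hypothetical toy lexicon: surface form -> every reading, with no disambiguation.
LEXICON: Dict[str, Set[Reading]] = {
    "he": {("he", "PRON")},
    "takes": {("take", "VERB"), ("take", "NOUN")},
    "into": {("into", "PREP")},
    "account": {("account", "NOUN"), ("account", "VERB")},
    "costs": {("cost", "NOUN"), ("cost", "VERB")},
}

# Hypothetical multiword expressions, written as sequences of readings.
EXPRESSIONS: Dict[str, List[Reading]] = {
    "take into account": [("take", "VERB"), ("into", "PREP"), ("account", "NOUN")],
}


class Node:
    """One state of the second network (a trie of multiword expressions)."""

    def __init__(self) -> None:
        self.arcs: Dict[Reading, "Node"] = {}
        self.names: List[str] = []  # expressions whose complete path ends here


def analyze(word: str) -> Set[Reading]:
    """Return all readings of a word; unknown words get a single UNK reading."""
    return LEXICON.get(word.lower(), {(word.lower(), "UNK")})


def compile_input(words: List[str]) -> List[Set[Reading]]:
    """First network: position i carries one arc per reading of word i."""
    return [analyze(w) for w in words]


def compile_expressions(expressions: Dict[str, List[Reading]]) -> Node:
    """Second network: one complete path per multiword expression."""
    root = Node()
    for name, pattern in expressions.items():
        node = root
        for reading in pattern:
            node = node.arcs.setdefault(reading, Node())
        node.names.append(name)
    return root


def match(lattice: List[Set[Reading]], root: Node) -> List[Tuple[int, int, str]]:
    """Find every subpath of the lattice that spells a complete trie path."""
    hits = []
    for start in range(len(lattice)):
        frontier = [root]
        for end in range(start, len(lattice)):
            # Follow every reading of the current word from every live state.
            frontier = [n.arcs[r] for n in frontier for r in lattice[end] if r in n.arcs]
            if not frontier:
                break
            for node in frontier:
                hits.extend((start, end + 1, name) for name in node.names)
    return sorted(hits)


words = "He takes into account costs".split()
print(match(compile_input(words), compile_expressions(EXPRESSIONS)))
# [(1, 4, 'take into account')]
```

Because matching runs over base forms, the inflected form "takes" still matches the expression, and because every reading is kept, the noun reading of "takes" does not block the match.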
20 Claims
1. A method for identifying multiword expressions in an input string, comprising:
morphologically analyzing words of the input string to identify their alternative base forms and parts of speech;
using the analyzed words of the input string to compile the input string into a first finite-state network;
matching the first finite-state network with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network;
each matching subpath of the first finite-state network and path of the second finite-state network identifying a multiword expression in the input string;
wherein said morphological analysis is performed without disambiguating words in the input string to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
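To make the wherein clause concrete, the following Python sketch shows how an undisambiguated network keeps alternative base forms and parts of speech side by side. The lattice contents and the helper names `count_paths` and `enumerate_paths` are hypothetical; the claim does not prescribe this representation.

```python
from itertools import product
from math import prod

# Hypothetical analysis of "he saw the light": one set of readings per word.
# "saw" keeps two base forms (saw, see) as well as two parts of speech.
LATTICE = [
    {("he", "PRON")},
    {("saw", "NOUN"), ("saw", "VERB"), ("see", "VERB")},
    {("the", "DET")},
    {("light", "NOUN"), ("light", "ADJ"), ("light", "VERB")},
]


def count_paths(lattice):
    """Number of complete paths: the product of the readings per word."""
    return prod(len(readings) for readings in lattice)


def enumerate_paths(lattice):
    """Yield each complete path as a tuple of (base form, part of speech)."""
    return product(*(sorted(readings) for readings in lattice))


print(count_paths(LATTICE))  # 9
```

The lattice stores 8 arcs yet encodes 9 complete paths, so the matching step can consider every reading of every word without expanding the paths one by one.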
11. An apparatus for identifying multiword expressions in an input string, comprising:
a morphological analyzer for morphologically analyzing words of the input string to identify their alternative base forms and parts of speech;
a finite-state compiler for using the analyzed words of the input string to compile the input string into a first finite-state network;
an expression identifier for matching the first finite-state network with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network;
each matching subpath of the first finite-state network and path of the second finite-state network identifying a multiword expression in the input string;
wherein said morphological analyzer performs morphological analysis without disambiguating words in the input string to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. An article of manufacture for use in a machine comprising:
a) a memory;
b) instructions stored in the memory for identifying multiword expressions in an input string, the instructions adapted to perform a method comprising:
morphologically analyzing words of the input string to identify their alternative base forms and parts of speech;
using the analyzed words of the input string to compile the input string into a first finite-state network;
matching the first finite-state network with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network;
each matching subpath of the first finite-state network and path of the second finite-state network identifying a multiword expression in the input string;
wherein said morphological analysis is performed without disambiguating words in the input string to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
Specification