Part-of-speech tagging using latent analogy
Abstract
Methods and apparatuses to assign part-of-speech tags to words are described. An input sequence of words is received. A global fabric of a corpus having training sequences of words may be analyzed in a vector space. A global semantic information associated with the input sequence of words may be extracted based on the analyzing. A part-of-speech tag may be assigned to a word of the input sequence based on POS tags from pertinent words in relevant training sequences identified using the global semantic information. The input sequence may be mapped into a vector space. A neighborhood associated with the input sequence may be formed in the vector space wherein the neighborhood represents one or more training sequences that are globally relevant to the input sequence.
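The mapping step described in the abstract can be illustrated with a minimal sketch. It assumes a plain word-count representation as a stand-in for the semantic vector space; the abstract does not say how the space is built, and a latent semantic reduction of these counts would be a natural refinement. The names `to_vector` and `build_space` and the toy corpus are hypothetical.

```python
from collections import Counter


def to_vector(sequence, vocab):
    """Map a sequence of words to a count vector over a fixed vocabulary.

    Words outside the vocabulary are ignored (an assumption of this sketch).
    """
    counts = Counter(word for word in sequence if word in vocab)
    vector = [0.0] * len(vocab)
    for word, n in counts.items():
        vector[vocab[word]] = float(n)
    return vector


def build_space(training_sequences):
    """Index the corpus vocabulary and represent each training sequence as a vector."""
    vocab = {}
    for sequence in training_sequences:
        for word in sequence:
            vocab.setdefault(word, len(vocab))
    vectors = [to_vector(sequence, vocab) for sequence in training_sequences]
    return vocab, vectors


# Hypothetical toy corpus of training sequences.
corpus = [
    ["the", "dog", "barks"],
    ["time", "flies", "fast"],
    ["fruit", "flies", "like", "bananas"],
]
vocab, train_vectors = build_space(corpus)

# The input sequence is mapped into the same space as the training sequences.
input_vector = to_vector(["flies", "like", "honey"], vocab)
```

Representing the input sequence and the training sequences in one shared space is what allows closeness between them to be measured afterwards.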
681 Citations
26 Claims
1. A method, comprising:
analyzing a corpus having first training sequences of words in a semantic vector space;
extracting a global semantic information associated with an input sequence of words from the semantic vector space;
selecting second training sequences of words having part-of-speech tags in the semantic vector space based on the global semantic information and the first training sequences; and
assigning a part-of-speech tag to at least one word of the input sequence based on the part-of-speech tags of the second training sequences, wherein at least one of the analyzing, extracting, selecting, and assigning is performed by a processor.
Dependent claims: 2, 3, 4, 5
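The assigning step of claim 1 can be sketched as follows. The claim only requires that the tag be "based on" the tags of the second training sequences; the majority vote used here is an assumption of this sketch, not the claimed rule. The function name `assign_tag`, the example sequences, and the Penn Treebank style tag labels are hypothetical.

```python
from collections import Counter


def assign_tag(word, selected_sequences):
    """Pick the tag that `word` carries most often in the selected sequences.

    Each selected sequence is a list of (token, tag) pairs. Returns None when
    the word does not occur in any selected sequence.
    """
    votes = Counter(
        tag
        for sequence in selected_sequences
        for token, tag in sequence
        if token == word
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical tagged training sequences already selected as relevant.
selected = [
    [("fruit", "NN"), ("flies", "NNS"), ("like", "VBP"), ("bananas", "NNS")],
    [("he", "PRP"), ("swats", "VBZ"), ("flies", "NNS")],
    [("time", "NN"), ("flies", "VBZ"), ("fast", "RB")],
]
tag = assign_tag("flies", selected)
```

Because only the selected sequences vote, the outcome depends on which training sequences were judged relevant to the input sequence.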
6. A method to assign part-of-speech tags to words, comprising:
receiving an input sequence of words;
mapping the input sequence into a semantic vector space, wherein the semantic vector space includes representations of a first plurality of training sequences of words; and
forming a neighborhood associated with the input sequence in the semantic vector space to obtain a part-of-speech tag for at least one word of the input sequence, wherein the neighborhood represents one or more second training sequences having part-of-speech tags selected from the first plurality of training sequences that are globally semantically relevant to the input sequence in the semantic vector space, wherein at least one of the receiving, mapping, and forming is performed by a processor.
Dependent claims: 7, 8, 9, 10, 11, 12
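The neighborhood of claim 6 can be illustrated with a short sketch. It assumes that relevance is measured by cosine similarity between vectors and that the neighborhood is limited by a size and a threshold; the claim itself does not specify the closeness measure or these parameters. The names `cosine` and `neighborhood` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighborhood(input_vector, training_vectors, size=2, threshold=0.0):
    """Return indices of the closest training sequences, nearest first.

    Only sequences whose similarity exceeds `threshold` are kept, and at most
    `size` of them are returned.
    """
    scored = [
        (cosine(input_vector, vector), index)
        for index, vector in enumerate(training_vectors)
    ]
    scored = [(score, index) for score, index in scored if score > threshold]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:size]]


# Hypothetical vectors for three training sequences and one input sequence.
training_vectors = [
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
]
neighbors = neighborhood([0.0, 1.0, 1.0], training_vectors)
```

The third training vector shares no component with the input, so its similarity is zero and it is left out of the neighborhood.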
13. An article of manufacture comprising:
a non-transitory machine-accessible medium including data that, when accessed by a machine, cause the machine to perform operations comprising:
analyzing a corpus having first training sequences of words in a semantic vector space;
extracting a global semantic information associated with an input sequence of words from the semantic vector space;
selecting second training sequences of words having part-of-speech tags in the semantic vector space based on the global semantic information and the first training sequences; and
assigning a part-of-speech tag to at least one word of the input sequence based on the part-of-speech tags of the second training sequences.
Dependent claims: 14, 15, 16, 17
18. An article of manufacture comprising:
a non-transitory machine-accessible medium including data that, when accessed by a machine, cause the machine to perform operations to assign part-of-speech tags to words, comprising:
receiving an input sequence of words;
mapping the input sequence into a semantic vector space, wherein the semantic vector space includes representations of a first plurality of training sequences of words; and
forming a neighborhood associated with the input sequence in the semantic vector space to obtain a part-of-speech tag for at least one word of the input sequence, wherein the neighborhood represents one or more second training sequences having part-of-speech tags selected from the first plurality of training sequences that are globally semantically relevant to the input sequence in the semantic vector space.
Dependent claims: 19, 20, 21, 22, 23, 24
25. A data processing system, comprising:
means for analyzing a corpus having first training sequences of words in a semantic vector space;
means for extracting a global semantic information associated with an input sequence of words from the semantic vector space;
means for selecting second training sequences of words having part-of-speech tags in the semantic vector space based on the global semantic information and the first training sequences; and
means for assigning a part-of-speech tag to at least one word of the input sequence based on the part-of-speech tags of the second training sequences.
26. A data processing system, comprising:
means for receiving an input sequence of words;
means for mapping the input sequence into a semantic vector space, wherein the semantic vector space includes representations of a first plurality of training sequences of words; and
means for forming a neighborhood associated with the input sequence in the semantic vector space to obtain a part-of-speech tag for at least one word of the input sequence, wherein the neighborhood represents one or more second training sequences having part-of-speech tags selected from the first plurality of training sequences that are globally semantically relevant to the input sequence in the semantic vector space.
Specification