WORD VECTOR PROCESSING FOR FOREIGN LANGUAGES
Abstract
A word vector processing method is provided. Word segmentation is performed on a corpus to obtain words, and n-gram strokes corresponding to the words are determined. Each n-gram stroke represents n successive strokes of a corresponding word. Word vectors of the words and stroke vectors of the n-gram strokes corresponding to the words are initialized. After the word segmentation is performed, the n-gram strokes are determined, and the word vectors and stroke vectors are initialized, the word vectors and the stroke vectors are trained.
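As an illustration of what "n successive strokes" could mean in practice, the following minimal sketch slides a window of length n over a word's stroke sequence. It assumes, hypothetically, that each stroke has already been encoded as a single character (for example a digit per stroke type); that encoding and the function name `ngram_strokes` are assumptions made for the example and are not taken from the text above.

```python
def ngram_strokes(stroke_seq, n):
    """Return every run of n successive strokes in stroke_seq.

    stroke_seq is assumed to be a string with one character per stroke
    (a hypothetical encoding chosen for this example).
    """
    # A sequence shorter than n has no n-gram strokes, so the range is empty.
    return [stroke_seq[i:i + n] for i in range(len(stroke_seq) - n + 1)]


# A word whose strokes are encoded as "13434" has three 3-gram strokes.
ngram_strokes("13434", 3)  # ["134", "343", "434"]
```

Under this encoding, a word with fewer than n strokes simply contributes no n-gram strokes for that value of n.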
20 Claims
1. A word vector processing method, comprising:
performing word segmentation on a corpus to obtain words;
determining n-gram strokes corresponding to the words, the n-gram stroke representing n successive strokes of a corresponding word;
initializing word vectors of the words and stroke vectors of the n-gram strokes corresponding to the words; and
after performing the word segmentation, determining the n-gram strokes, and initializing the word vectors and stroke vectors, training the word vectors and the stroke vectors.
Dependent claims: 2-11
12. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising:
performing word segmentation on a corpus to obtain words;
determining n-gram strokes corresponding to the words, the n-gram stroke representing n successive strokes of a corresponding word;
initializing word vectors of the words and stroke vectors of the n-gram strokes corresponding to the words; and
after performing the word segmentation, determining the n-gram strokes, and initializing the word vectors and stroke vectors, training the word vectors and the stroke vectors.
Dependent claims: 13-16
17. A computer-implemented system, comprising:
one or more computers; and
one or more computer memory devices interoperably coupled with the one or more computers and having tangible, non-transitory, machine-readable media storing one or more instructions that, when executed by the one or more computers, perform one or more operations comprising:
performing word segmentation on a corpus to obtain words;
determining n-gram strokes corresponding to the words, the n-gram stroke representing n successive strokes of a corresponding word;
initializing word vectors of the words and stroke vectors of the n-gram strokes corresponding to the words; and
after performing the word segmentation, determining the n-gram strokes, and initializing the word vectors and stroke vectors, training the word vectors and the stroke vectors.
Dependent claims: 18-20
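The four operations shared by the independent claims (segment, determine n-gram strokes, initialize, train) can be sketched end to end as follows. This is only an illustration under several stated assumptions: the claims above do not specify a segmenter, a stroke encoding, or a training objective, so the sketch uses whitespace splitting as a stand-in segmenter, a hypothetical `STROKES` table with illustrative stroke codes, and a skip-gram-style update with negative sampling in which a word is represented by the sum of its n-gram stroke vectors. All names and hyperparameters are invented for the example.

```python
import math
import random

# Hypothetical stroke table: one digit per stroke. The codes are illustrative
# assumptions for this sketch, not data taken from the claims.
STROKES = {"大": "134", "人": "34", "小": "234", "天": "1134"}


def segment(corpus):
    # Stand-in segmenter: assumes words are already separated by whitespace.
    return corpus.split()


def ngram_strokes(word, n_values=(3, 4)):
    seq = "".join(STROKES[ch] for ch in word)
    return [seq[i:i + n] for n in n_values for i in range(len(seq) - n + 1)]


def init_vector(dim, rng):
    return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(corpus, dim=8, window=1, epochs=20, lr=0.05, negatives=2, seed=0):
    rng = random.Random(seed)
    words = segment(corpus)                                # segmentation
    vocab = sorted(set(words))
    grams = {w: ngram_strokes(w) for w in vocab}           # n-gram strokes
    word_vecs = {w: init_vector(dim, rng) for w in vocab}  # initialization
    stroke_vecs = {g: init_vector(dim, rng) for w in vocab for g in grams[w]}
    for _ in range(epochs):                                # training
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # One observed context word (label 1) plus random negatives (label 0).
                pairs = [(words[j], 1.0)]
                pairs += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                for ctx, label in pairs:
                    # Represent the current word by the sum of its stroke vectors.
                    h = [sum(stroke_vecs[g][k] for g in grams[w]) for k in range(dim)]
                    c = word_vecs[ctx]
                    step = lr * (label - sigmoid(sum(a * b for a, b in zip(h, c))))
                    old_c = list(c)
                    for k in range(dim):
                        c[k] += step * h[k]
                    for g in grams[w]:
                        for k in range(dim):
                            stroke_vecs[g][k] += step * old_c[k]
    return word_vecs, stroke_vecs


word_vecs, stroke_vecs = train("大人 小天 大天 小人")
```

Here the word vectors act as context vectors and the stroke vectors carry the representation of the current word; this split is a design choice of the sketch, and other objectives would satisfy the same claim language.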
Specification