
Word vector processing for foreign languages

  • US 10,430,518 B2
  • Filed: 01/18/2018
  • Issued: 10/01/2019
  • Est. Priority Date: 01/22/2017
  • Status: Active Grant
First Claim

1. A word vector processing method, comprising:

  • performing word segmentation on a corpus to obtain words;

    determining n-gram strokes corresponding to the words, the n-gram stroke representing n successive strokes of a corresponding word;

    initializing word vectors of the words and stroke vectors of the n-gram strokes corresponding to the words; and

    after performing the word segmentation, determining the n-gram strokes, and initializing the word vectors and stroke vectors, training the word vectors and the stroke vectors, wherein the training the word vectors and the stroke vectors comprises:

    determining a designated word in the corpus, and one or more context words of the designated word in the corpus;

    determining a degree of similarity between the designated word and the context word according to stroke vectors of n-gram strokes corresponding to the designated word as well as a word vector of the context word;

    selecting one or more words from the words as a negative sample word;

    determining a degree of similarity between the designated word and each negative sample word;

    determining a loss characterization value corresponding to the designated word according to a designated loss function, the degree of similarity between the designated word and the context word, and the degree of similarity between the designated word and each negative sample word; and

    updating the word vector of the context word and the stroke vectors of the n-gram strokes corresponding to the designated word according to the loss characterization value.
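
The training steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim leaves the similarity measure and the "designated loss function" open, so the sketch assumes a summed dot product for similarity and a negative-sampling logistic loss. The stroke IDs, the toy vocabulary (`w1` to `w4`), the n-gram range of 3 to 4, the learning rate, and all function names are hypothetical, and the corpus is assumed to be already segmented.

```python
import math
import random

def ngram_strokes(stroke_seq, n_min=3, n_max=5):
    """All runs of n successive strokes of one word, for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(stroke_seq) - n + 1):
            grams.append(tuple(stroke_seq[i:i + n]))
    return grams

def init_vector(dim, rng):
    """Small random initial vector (an assumed initialization scheme)."""
    return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def similarity(grams, stroke_vecs, word_vec):
    """Assumed similarity: sum of dot products between the designated word's
    n-gram stroke vectors and the other word's word vector."""
    return sum(dot(stroke_vecs[g], word_vec) for g in grams)

def train_step(designated, context, negatives, word_strokes,
               word_vecs, stroke_vecs, lr=0.05):
    """One update for a (designated word, context word) pair.
    Returns the loss value computed before the update."""
    grams = word_strokes[designated]
    ctx_vec = word_vecs[context]
    dim = len(ctx_vec)

    sim_ctx = similarity(grams, stroke_vecs, ctx_vec)
    sims_neg = [similarity(grams, stroke_vecs, word_vecs[w]) for w in negatives]

    # Assumed loss: negative-sampling logistic loss.
    loss = -log_sigmoid(sim_ctx) - sum(log_sigmoid(-s) for s in sims_neg)

    # Derivatives of the loss with respect to each similarity score.
    c_ctx = sigmoid(sim_ctx) - 1.0
    c_negs = [sigmoid(s) for s in sims_neg]

    # Gradient with respect to every stroke vector of the designated word.
    stroke_grad = [c_ctx * ctx_vec[k] for k in range(dim)]
    for c, w in zip(c_negs, negatives):
        for k in range(dim):
            stroke_grad[k] += c * word_vecs[w][k]

    # Gradient with respect to the context word vector.
    stroke_sum = [sum(stroke_vecs[g][k] for g in grams) for k in range(dim)]
    ctx_grad = [c_ctx * stroke_sum[k] for k in range(dim)]

    # Following the claim wording, only the context word vector and the
    # designated word's stroke vectors are updated here.
    for g in grams:
        for k in range(dim):
            stroke_vecs[g][k] -= lr * stroke_grad[k]
    for k in range(dim):
        ctx_vec[k] -= lr * ctx_grad[k]
    return loss

def run_demo(steps=200, dim=8, seed=0):
    rng = random.Random(seed)
    # Hypothetical toy data: each word maps to a made-up stroke ID sequence.
    strokes = {"w1": [1, 2, 3, 4], "w2": [2, 5, 1],
               "w3": [3, 3, 4, 1], "w4": [5, 1, 2]}
    word_strokes = {w: ngram_strokes(s, 3, 4) for w, s in strokes.items()}
    word_vecs = {w: init_vector(dim, rng) for w in strokes}
    stroke_vecs = {g: init_vector(dim, rng)
                   for gs in word_strokes.values() for g in gs}
    designated, context = "w1", "w2"
    # Negative sample words are drawn from the remaining vocabulary.
    candidates = [w for w in strokes if w not in (designated, context)]
    negatives = rng.sample(candidates, 2)
    return [train_step(designated, context, negatives,
                       word_strokes, word_vecs, stroke_vecs)
            for _ in range(steps)]

if __name__ == "__main__":
    losses = run_demo()
    print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Updating only the context word vector and the designated word's stroke vectors mirrors the last step of the claim as written. Whether the negative sample words' vectors are also updated is not stated in claim 1, so the sketch leaves them fixed.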

  • 3 Assignments