
Computing numeric representations of words in a high-dimensional space

  • US 9,740,680 B1
  • Filed: 05/18/2015
  • Issued: 08/22/2017
  • Est. Priority Date: 01/15/2013
  • Status: Active Grant
First Claim

1. One or more non-transitory computer storage media encoded with a data set, the data set associating each word in a vocabulary of words with a respective numeric representation of the word in a high-dimensional space, wherein the data set indicates, for each word of a plurality of the words in the vocabulary and by the position of the numeric representation of the word in the high-dimensional space, a semantic meaning of the word, wherein the data set indicates, for each of a plurality of pairs of words in the vocabulary and by the relative positions of the numeric representations of the words in the high-dimensional space, a degree of semantic relationship, syntactic relationship, or both between the words in the pair of words, whereby the non-transitory computer storage media, when encoded with the data set, provides the function of representing in a quantitative way semantic and syntactic relationships between and among words in the vocabulary, and wherein the one or more non-transitory computer storage media are encoded with the data set by a process comprising the steps of:

  • obtaining a set of training data, wherein the set of training data comprises sequences of words;

  • training a plurality of classifiers and an embedding function on the set of training data, wherein the embedding function receives an input word and maps the input word to a numeric representation in the high-dimensional space in accordance with a set of embedding function parameters, wherein each of the classifiers corresponds to a respective position surrounding the input word in a sequence of words, and wherein each of the classifiers processes the numeric representation of the input word to generate a respective word score for each word in a pre-determined set of words, wherein each of the respective word scores represents a predicted likelihood that the corresponding word will be found in the corresponding position relative to the input word, and wherein training the embedding function comprises determining trained values of the embedding function parameters;

  • processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numeric representation of each word in the vocabulary;

  • generating the data set by associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space; and

  • storing the data set on the one or more non-transitory computer storage media.
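The training process recited in the claim can be illustrated with a small, self-contained sketch. It assumes a full-softmax classifier for each surrounding position, plain stochastic gradient descent, and a toy four-sentence corpus; the function names `train_embeddings` and `cosine`, the hyperparameters, and the corpus are hypothetical and are not taken from the patent. The claim does not say how the relative positions of the vectors are measured, so cosine similarity is used here as one common choice. This illustrates the claimed steps; it is not the patentee's implementation.

```python
import math
import random


def train_embeddings(sentences, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Hypothetical sketch: an embedding table plus one softmax classifier per
    relative position (-window..-1, 1..window) around the input word."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})  # also the "pre-determined set of words"
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Embedding function parameters: one dim-sized row per vocabulary word.
    emb = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(V)]
    offsets = [o for o in range(-window, window + 1) if o != 0]
    # One classifier (a V x dim output weight matrix) per surrounding position.
    clf = {o: [[0.0] * dim for _ in range(V)] for o in offsets}

    for _ in range(epochs):
        for s in sentences:
            for t, word in enumerate(s):
                x = emb[index[word]]  # numeric representation of the input word
                grad_x = [0.0] * dim
                for o in offsets:
                    if not 0 <= t + o < len(s):
                        continue  # position falls outside the sequence
                    target = index[s[t + o]]
                    W = clf[o]
                    # Word scores: softmax over every word, giving predicted likelihoods.
                    logits = [sum(wk * xk for wk, xk in zip(W[j], x)) for j in range(V)]
                    m = max(logits)
                    exps = [math.exp(z - m) for z in logits]
                    total = sum(exps)
                    for j in range(V):
                        # Gradient of cross-entropy loss with respect to logit j.
                        g = exps[j] / total - (1.0 if j == target else 0.0)
                        for k in range(dim):
                            grad_x[k] += g * W[j][k]  # uses W before its update
                            W[j][k] -= lr * g * x[k]
                # Update the embedding parameters once all positions are processed.
                for k in range(dim):
                    x[k] -= lr * grad_x[k]
    # Data set: associate each vocabulary word with its trained vector.
    return {w: list(emb[index[w]]) for w in vocab}


def cosine(u, v):
    """Cosine similarity, one way to read the relative positions of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased a mouse".split(),
    "a dog chased a ball".split(),
]
data_set = train_embeddings(corpus)
print(round(cosine(data_set["cat"], data_set["dog"]), 3))
```

The per-position classifiers follow the claim's wording (one classifier per surrounding position). Because a full softmax costs time proportional to the vocabulary size for every prediction, practical systems usually replace it with cheaper approximations such as hierarchical softmax or negative sampling.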

  • 2 Assignments