Rank-reduced token representation for automatic speech recognition
First Claim
1. An electronic device, comprising:
a display;
a microphone;
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving speech input via the microphone;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input on the display.
Abstract
The present disclosure generally relates to processing speech or text using rank-reduced token representation. In one example process, speech input is received. A sequence of candidate words corresponding to the speech input is determined. The sequence of candidate words includes a current word and one or more previous words. A vector representation of the current word is determined from a set of trained parameters. A number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word. Using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words is determined. A text representation of the speech input is displayed based on the determined probability.
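The abstract describes word vectors whose underlying parameter count varies with a linguistic characteristic of the word, while every word still maps to a vector of one common dimension. A minimal sketch of one way this could work, assuming the linguistic characteristic is word frequency and using illustrative vocabularies and sizes (none of these names or dimensions come from the patent):

```python
# Hypothetical sketch of a rank-reduced token representation: frequent words
# get a full-rank embedding, rare words get a low-rank embedding projected up
# to the same output dimension, so rare words cost fewer trained parameters.
import numpy as np

D = 64  # common output dimension shared by all word vectors
R = 16  # reduced rank used for rare words

rng = np.random.default_rng(0)

# Frequent words: a full D-dimensional embedding (D parameters per word).
vocab_freq = {"the": 0, "of": 1, "and": 2}
E_freq = rng.standard_normal((len(vocab_freq), D))

# Rare words: a rank-R embedding (R parameters per word) plus one shared
# R x D projection, standing in for the "second set of trained parameters".
vocab_rare = {"palimpsest": 0, "quixotic": 1}
E_rare = rng.standard_normal((len(vocab_rare), R))
P = rng.standard_normal((R, D))

def vector_representation(word: str) -> np.ndarray:
    """Return a D-dimensional vector; the number of trained parameters
    behind it depends on the word's frequency class."""
    if word in vocab_freq:
        return E_freq[vocab_freq[word]]       # full-rank lookup
    return E_rare[vocab_rare[word]] @ P       # rank-reduced lookup + projection

# Both classes yield vectors of equal dimension, as the claims require.
assert vector_representation("the").shape == vector_representation("quixotic").shape
```

The parameter counts differ (D per frequent word versus R per rare word plus one shared projection), yet the output dimensions match, which is the property the independent claims recite.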
41 Claims
1. An electronic device, comprising:
a display;
a microphone;
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving speech input via the microphone;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input on the display.
Dependent claims: 2-19.
20. A method for performing automatic speech recognition using rank-reduced token representation, the method comprising:
at an electronic device having one or more processors and memory:
receiving speech input;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input.
Dependent claims: 21-30.
31. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device with a display, the one or more programs including instructions for:
receiving speech input;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input.
Dependent claims: 32-41.
Specification