Rank-reduced token representation for automatic speech recognition
First Claim
1. An electronic device, comprising:
a display;
a microphone;
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving speech input via the microphone;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input on the display.
Abstract
The present disclosure generally relates to processing speech or text using rank-reduced token representation. In one example process, speech input is received. A sequence of candidate words corresponding to the speech input is determined. The sequence of candidate words includes a current word and one or more previous words. A vector representation of the current word is determined from a set of trained parameters. A number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word. Using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words is determined. A text representation of the speech input is displayed based on the determined probability.
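The abstract describes word vectors whose underlying parameter count varies with a linguistic characteristic of the word, while every word still maps to a vector of one common dimension. A minimal sketch of one way this could work, assuming the linguistic characteristic is word frequency and using illustrative vocabularies and sizes (none of these names or dimensions come from the patent):

```python
# Hypothetical sketch of a rank-reduced token representation: frequent words
# get a full-rank embedding, rare words get a low-rank embedding projected up
# to the same output dimension, so rare words cost fewer trained parameters.
import numpy as np

D = 64  # common output dimension shared by all word vectors
R = 16  # reduced rank used for rare words

rng = np.random.default_rng(0)

# Frequent words: a full D-dimensional embedding (D parameters per word).
vocab_freq = {"the": 0, "of": 1, "and": 2}
E_freq = rng.standard_normal((len(vocab_freq), D))

# Rare words: a rank-R embedding (R parameters per word) plus one shared
# R x D projection, standing in for the "second set of trained parameters".
vocab_rare = {"palimpsest": 0, "quixotic": 1}
E_rare = rng.standard_normal((len(vocab_rare), R))
P = rng.standard_normal((R, D))

def vector_representation(word: str) -> np.ndarray:
    """Return a D-dimensional vector; the number of trained parameters
    behind it depends on the word's frequency class."""
    if word in vocab_freq:
        return E_freq[vocab_freq[word]]       # full-rank lookup
    return E_rare[vocab_rare[word]] @ P       # rank-reduced lookup + projection

# Both classes yield vectors of equal dimension, as the claims require.
assert vector_representation("the").shape == vector_representation("quixotic").shape
```

The parameter counts differ (D per frequent word versus R per rare word plus one shared projection), yet the output dimensions match, which is the property the independent claims recite.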
41 Claims
1. An electronic device, comprising:
a display;
a microphone;
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving speech input via the microphone;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input on the display.
Dependent claims: 2-19.
20. A method for performing automatic speech recognition using rank-reduced token representation, the method comprising:
at an electronic device having one or more processors and memory:
receiving speech input;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input.
Dependent claims: 21-30.
31. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device with a display, the one or more programs including instructions for:
receiving speech input;
determining a sequence of candidate words corresponding to the speech input, the sequence of candidate words including a current word and one or more previous words;
determining, from a set of trained parameters, a vector representation of the current word, wherein a number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word, wherein a second vector representation of a previous word of the one or more previous words is determined from a second set of trained parameters, wherein one or more linguistic characteristics of the previous word are different from the one or more linguistic characteristics of the current word, wherein a number of parameters in the second set of trained parameters is different from the number of parameters in the set of trained parameters, and wherein a dimension of the second vector representation of the previous word is equal to a dimension of the vector representation of the current word;
determining, using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words; and
displaying, based on the determined probability, a text representation of the speech input.
Dependent claims: 32-41.
Specification