Handwriting and speech recognizer using neural network with separate start and continuation output scores
Abstract
A method and system for recognizing user input information, including cursive handwriting and spoken words. A time-delayed neural network having an improved architecture is trained at the word level with an improved method, which, together with preprocessing improvements, yields a recognizer with greater recognition accuracy. Preprocessing of the input data may include, for example, resampling the data with sample points placed according to the second derivative, which focuses the recognizer on the areas of the input data where the change in slope per unit time is greatest. The input data is segmented, featurized and fed to the time-delayed neural network, which outputs a matrix of character scores per segment. The neural network architecture outputs separate scores for the start and for the continuation of a character. A dynamic time warp (DTW) is run against each dictionary word to find the most probable path through the output matrix for that word, and each word is assigned a score based on the least costly path that can be traversed through the output matrix. The word or words with the lowest overall scores are returned. A DTW is similarly used in training, so the sample ink need only be labeled at the word level.
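The abstract does not spell out how sample points are derived from the second derivative. One possible reading, sketched here as an assumption, treats a pen stroke as a list of (x, y) points captured at a uniform rate, uses the magnitude of the discrete second difference as a sampling density, and redistributes a fixed number of points along the stroke by inverse-CDF interpolation. The function name `resample_by_curvature`, the density floor and the linear interpolation are illustrative choices, not the patented preprocessing.

```python
import bisect
import math


def resample_by_curvature(points, n_out):
    """Redistribute n_out samples along a stroke, denser where it bends.

    Hypothetical illustration: the discrete second difference of the
    (uniformly timed) points is used as a sampling density. This is an
    assumed reading of "sample points based on the second derivative".
    """
    if len(points) < 3 or n_out < 2:
        return list(points)

    # |second difference| at each interior point approximates how fast the
    # slope is changing per time step there.
    accel = [0.0] * len(points)
    for i in range(1, len(points) - 1):
        ax = points[i - 1][0] - 2 * points[i][0] + points[i + 1][0]
        ay = points[i - 1][1] - 2 * points[i][1] + points[i + 1][1]
        accel[i] = math.hypot(ax, ay)

    # Assumed floor so straight stretches still receive a few samples.
    floor = 1e-3 + 0.05 * max(accel)

    # Cumulative density over the stroke's segments (a discrete CDF).
    cum = [0.0]
    for i in range(len(points) - 1):
        cum.append(cum[-1] + floor + 0.5 * (accel[i] + accel[i + 1]))
    total = cum[-1]

    # Inverse-CDF sampling: equally spaced density targets land more
    # densely where the density (bending) is high.
    out = []
    for j in range(n_out):
        target = total * j / (n_out - 1)
        k = min(bisect.bisect_right(cum, target) - 1, len(points) - 2)
        frac = (target - cum[k]) / (cum[k + 1] - cum[k])
        (x0, y0), (x1, y1) = points[k], points[k + 1]
        out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return out
```

On an L-shaped stroke of 21 unit-spaced points with its corner at (10, 0), asking for 21 samples places 11 of them within one unit of the corner, whereas uniform arc-length resampling would place only 3 there.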
36 Claims
1. A method for recognizing user input data, comprising:
segmenting the user input data into a plurality of segments;
for each segment, inputting segment information corresponding to the user input data into a neural network, the neural network outputting, for each character of a plurality of possible characters, a first score corresponding to a probability that the segment information represents the start of the character and a second score corresponding to a probability that the segment information represents the continuation of the character from a previous segment;
generating an output matrix from first and second sets of scores for the plurality of segments; and
using the first and second sets of scores in the matrix for the segments to determine paths therethrough corresponding to words and scores for the words, and returning at least one word based on a determined score thereof.
Dependent claims: 2 to 23.
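Claim 1 leaves open how the paths and word scores are computed; the abstract says a DTW finds the least costly path through the output matrix for each dictionary word. A minimal sketch, assuming the matrix is stored as two lists `start[t][k]` and `cont[t][k]` of probabilities for segment `t` and character `alphabet[k]`, that an entry's cost is its negative log probability, and that every segment belongs to exactly one character of the word: the path opens with the start score of the first character, and at each later segment it either continues the current character or starts the next one. The names `word_cost`, `recognize` and `neg_log` are hypothetical.

```python
import math

INF = float("inf")


def neg_log(p):
    """Cost of a score: negative log probability, clipped to avoid log(0)."""
    return -math.log(max(p, 1e-12))


def word_cost(start, cont, alphabet, word):
    """Least-cost DTW path for `word` through a start/continuation matrix.

    start[t][k] and cont[t][k] are the (assumed) probabilities that segment t
    starts, or continues, character alphabet[k].
    """
    col = {ch: k for k, ch in enumerate(alphabet)}
    n_seg, n_chr = len(start), len(word)
    if n_chr == 0 or n_seg < n_chr:
        return INF  # every character needs at least one segment
    ks = [col[ch] for ch in word]

    # cost[i]: best path cost with the current segment assigned to word[i].
    cost = [INF] * n_chr
    cost[0] = neg_log(start[0][ks[0]])  # the first segment must start word[0]
    for t in range(1, n_seg):
        new = [INF] * n_chr
        for i in range(n_chr):
            stay = cost[i] + neg_log(cont[t][ks[i]])  # continue word[i]
            enter = cost[i - 1] + neg_log(start[t][ks[i]]) if i > 0 else INF  # start word[i]
            new[i] = min(stay, enter)
        cost = new
    return cost[-1]  # the last segment must belong to the last character


def recognize(start, cont, alphabet, dictionary, n_best=1):
    """Score every dictionary word and return the n_best lowest-cost ones."""
    scored = sorted((word_cost(start, cont, alphabet, w), w) for w in dictionary)
    return [w for c, w in scored[:n_best] if c < INF]


if __name__ == "__main__":
    alphabet = ["a", "l"]
    start = [[0.9, 0.05], [0.05, 0.9], [0.05, 0.8]]
    cont = [[0.05, 0.05], [0.1, 0.05], [0.05, 0.1]]
    print(recognize(start, cont, alphabet, ["a", "al", "all", "la"]))  # ['all']
```

Because a character can only be entered through its start score, "all" and "al" are told apart in the demo: "all" explains the third segment as the start of a second `l`, while "al" must explain it as a continuation of the first `l`, which the demo scores make expensive.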
24. A method for training a neural network to convert user input to strings of characters, comprising:
setting the neural network with initially random weights;
receiving ink for at least one sample word, each sample word having a word label associated therewith;
featurizing the ink into a number of feature values;
inputting the feature values into the neural network;
computing a dynamic time warp matrix for each word based on the characters in its word label;
finding an optimal path back through the dynamic time warp matrix; and
setting the neural network output to a first value for each output that corresponds to the path and to a second value for each output that does not correspond to the path, the neural network output including character start and character continuation values.
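Claim 24 defines the training targets only as "a first value" on the optimal DTW path and "a second value" elsewhere, and this excerpt does not give the DTW cost function. The sketch below assumes those values are 1.0 and 0.0, assumes the word label is aligned against the network's current start and continuation outputs, and assumes the same alignment rule the abstract's recognition DTW suggests: each segment either continues the current character or starts the next one. A backpointer records whether each character was entered through its start output, so walking the path back yields one target matrix for start outputs and one for continuation outputs. `training_targets` and its parameters are hypothetical.

```python
import math


def neg_log(p):
    """Cost of an output: negative log probability, clipped to avoid log(0)."""
    return -math.log(max(p, 1e-12))


def training_targets(start, cont, alphabet, label, on=1.0, off=0.0):
    """Hypothetical word-level target builder via DTW alignment.

    Aligns `label` against the network's current start/continuation outputs
    and returns (start_targets, cont_targets): `on` along the optimal path and
    `off` everywhere else. `on`/`off` stand in for the claim's first and
    second values.
    """
    col = {ch: k for k, ch in enumerate(alphabet)}
    n_seg, n_chr, n_out = len(start), len(label), len(alphabet)
    if n_chr == 0 or n_seg < n_chr:
        raise ValueError("need at least one segment per label character")
    ks = [col[ch] for ch in label]
    inf = float("inf")

    # DTW matrix plus a backpointer: was character i started at segment t?
    cost = [[inf] * n_chr for _ in range(n_seg)]
    started = [[False] * n_chr for _ in range(n_seg)]
    cost[0][0], started[0][0] = neg_log(start[0][ks[0]]), True
    for t in range(1, n_seg):
        for i in range(n_chr):
            stay = cost[t - 1][i] + neg_log(cont[t][ks[i]])
            enter = cost[t - 1][i - 1] + neg_log(start[t][ks[i]]) if i > 0 else inf
            if enter < stay:
                cost[t][i], started[t][i] = enter, True
            else:
                cost[t][i] = stay

    # Walk the optimal path back from (last segment, last character).
    start_tgt = [[off] * n_out for _ in range(n_seg)]
    cont_tgt = [[off] * n_out for _ in range(n_seg)]
    i = n_chr - 1
    for t in range(n_seg - 1, -1, -1):
        if started[t][i]:
            start_tgt[t][ks[i]] = on
            i -= 1  # the previous segment belongs to the previous character
        else:
            cont_tgt[t][ks[i]] = on
    return start_tgt, cont_tgt


if __name__ == "__main__":
    alphabet = ["a", "l"]
    start = [[0.9, 0.05], [0.05, 0.9], [0.05, 0.8]]
    cont = [[0.05, 0.05], [0.1, 0.05], [0.05, 0.1]]
    print(training_targets(start, cont, alphabet, "al"))
```

In the demo, the label "al" is stretched over three segments, so the optimal path marks the first two segments as starts of `a` and `l` and the third as a continuation of `l`. These targets would then drive an ordinary gradient update of the network, which is not shown.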
26. A system for recognizing user input data, comprising:
a neural network configured to receive information corresponding to segments of user input data and to output scores for a plurality of possible characters for each segment, including for each possible character a first score corresponding to a probability that the information of that segment represents the start of the character and a second score corresponding to a probability that the information of that segment represents the continuation of the character from a previous segment;
an output matrix of the scores for the segments; and
a mechanism that uses the output matrix to determine paths therethrough corresponding to words and scores for the words, and returns at least one word based on the determined score thereof.
Dependent claims: 27 to 36.
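Claim 26 requires a network that, for each segment, emits a start score and a continuation score per possible character, plus an output matrix that collects those scores. The toy sketch below shows only that interface. It assumes a single sigmoid layer applied to a fixed time-delay window of `delay` segments on each side (zero-padded at the ends), with 2K output units split into K start scores and K continuation scores. `ToyTDNN` and its parameters are hypothetical; this excerpt does not describe the patent's actual time-delayed architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyTDNN:
    """Hypothetical one-layer time-delay scorer, not the patented network."""

    def __init__(self, n_features, n_chars, delay=1, seed=0):
        rng = random.Random(seed)  # random initial weights, fixed for repeatability
        self.n_features, self.n_chars, self.delay = n_features, n_chars, delay
        width = n_features * (2 * delay + 1)
        # 2 * n_chars output units: [start_0 .. start_K-1, cont_0 .. cont_K-1]
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                        for _ in range(2 * n_chars)]
        self.bias = [0.0] * (2 * n_chars)

    def window(self, feats, t):
        """Concatenate the features of segments t-delay .. t+delay."""
        x = []
        for s in range(t - self.delay, t + self.delay + 1):
            x.extend(feats[s] if 0 <= s < len(feats) else [0.0] * self.n_features)
        return x

    def output_matrix(self, feats):
        """Return (start, cont): one row of per-character scores per segment."""
        start, cont = [], []
        for t in range(len(feats)):
            x = self.window(feats, t)
            y = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                 for row, b in zip(self.weights, self.bias)]
            start.append(y[:self.n_chars])
            cont.append(y[self.n_chars:])
        return start, cont
```

The same weights are applied at every segment position and each row sees only its own window, so changing one segment's features changes at most 2 * delay + 1 rows of the output matrix.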
Specification