Recurrent conditional random fields

US 9,239,828 B2
Filed: 03/07/2014
Issued: 01/19/2016
Est. Priority Date: 12/05/2013
Status: Active Grant


3 Assignments

0 Petitions

Abstract

Recurrent conditional random field (R-CRF) embodiments are described. In one embodiment, the R-CRF receives feature values corresponding to a sequence of words. Semantic labels for words in the sequence of words are then generated and each label is assigned to the appropriate one of the words in the sequence of words. The R-CRF used to accomplish these tasks includes a recurrent neural network (RNN) portion and a conditional random field (CRF) portion. The RNN portion receives feature values associated with a word in the sequence of words and outputs RNN activation layer activations data that is indicative of a semantic label. The CRF portion inputs the RNN activation layer activations data output from the RNN for one or more words in the sequence of words and outputs label data that is indicative of a separate semantic label that is to be assigned to each of the words.
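The division of labor described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the patented implementation: it assumes the RNN portion has already produced one row of unnormalized activations per word, and it uses Viterbi decoding over a transition-score table as one common way for a CRF layer to choose a label for each word. The label set, the words, and every numeric score are hypothetical.

```python
from typing import List, Sequence


def viterbi(scores: Sequence[Sequence[float]],
            trans: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    scores[t][j]: activation for label j at word t (assumed RNN output).
    trans[i][j]:  transition score from label i to label j (assumed CRF parameter).
    """
    n_labels = len(trans)
    best = list(scores[0])   # best path score ending in each label so far
    back = []                # one list of backpointers per step after the first
    for t in range(1, len(scores)):
        prev_best, best, ptr = best, [], []
        for j in range(n_labels):
            # Best previous label when the current label is j.
            i_star = max(range(n_labels),
                         key=lambda i: prev_best[i] + trans[i][j])
            best.append(prev_best[i_star] + trans[i_star][j] + scores[t][j])
            ptr.append(i_star)
        back.append(ptr)
    # Follow backpointers from the best final label.
    path = [max(range(n_labels), key=lambda j: best[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


LABELS = ["O", "B-city", "I-city"]          # hypothetical semantic labels
words = ["fly", "to", "new", "york"]        # hypothetical input sequence
scores = [                                  # hypothetical activations per word
    [2.0, 0.1, 0.0],
    [2.0, 0.2, 0.0],
    [0.5, 1.0, 0.9],
    [0.6, 0.9, 0.8],
]
trans = [                                   # hypothetical transition scores
    [0.0, 0.0, -5.0],    # O -> I-city is discouraged
    [0.0, -1.0, 1.5],    # B-city -> I-city is encouraged
    [0.0, -1.0, 0.5],
]

# Assign each decoded label to its word.
tagged = list(zip(words, (LABELS[i] for i in viterbi(scores, trans))))
```

In this toy input the per-word maximum for "york" would be B-city (0.9 versus 0.8), but the transition scores make I-city the better choice after B-city, so the decoded sequence is O, O, B-city, I-city.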

20 Claims

1. A language understanding (LU) system, comprising:
- a computing device; and
- a computer program having program modules executable by the computing device, the computing device being directed by the program modules of the computer program to,
  - receive feature values corresponding a sequence of words,
  - generate semantic labels for words in the sequence of words, said semantic label generation comprising using a recurrent conditional random field (R-CRF) comprising,
    - a recurrent neural network (RNN) portion which generates RNN activation layer activations data that is indicative of a semantic label for a word, the RNN receiving feature values associated with a word in the sequence of words and outputting RNN activation layer activations data that is indicative of a semantic label, and
    - a conditional random field (CRF) portion which takes as input the RNN activation layer activations data output from the RNN for one or more words in the sequence of words and outputs label data that is indicative of a separate semantic label that is to be assigned to each of the one or more words in the sequence of words associated with the RNN activation layer activations data, and
  - assign each semantic label corresponding to the data output by the CRF portion of the R-CRF to the appropriate one said one or more words in the sequence of words.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The system of claim 1, wherein the RNN activation layer activations data comprises data output by the activation layer of the RNN prior to any softmax normalization.
  - 3. The system of claim 1, wherein the RNN and CRF portions of the R-CRF are jointly trained using a set of training data pair sequences and a CRF sequence-level objective function, each of said training data pair sequences comprising a sequence of pairs of feature values corresponding to a word and label data that is indicative of a correct semantic label for that word.
  - 4. The system of claim 1, wherein the RNN portion of the R-CRF comprises:
    - an input layer of nodes wherein each feature value of the feature values associated with a word are input into a different one of the input layer nodes;
    - a hidden layer comprising nodes that are connected to outputs of the input layer, each connection between the input layer and hidden layer being adjustably weighted; and
    - an activation layer comprising nodes that are connected to outputs of the hidden layer, each connection between the hidden layer and activation layer being adjustably weighted, and wherein outputs of the activation layer are connected to inputs of the CRF portion of the R-CRF.
  - 5. The system of claim 4, wherein the feature values associated with a word form a multi-dimensional input vector having a number of elements equal to or larger than a size of a vocabulary of words, and wherein the input layer of nodes comprises a different node for each element of the input vector.
  - 6. The system of claim 4, wherein the label data output from the CRF portion of the R-CRF forms a multi-dimensional output vector having a number of elements equal to a number of possible semantic labels, and wherein the CRF portion of the R-CRF comprises output nodes equaling the number of output vector elements and a different output node of which is dedicated to each different element of the output vector.
  - 7. The system of claim 4, wherein the RNN activation layer activations data output from the RNN portion of the R-CRF in response to the input of feature values associated with a word in the sequence of words is input into the nodes of the hidden layer along with the data output from the input layer upon input of feature values associated with a next word in the sequence of words input into the input layer.
  - 8. The system of claim 7, wherein RNN activation layer activations data input into the nodes of the hidden layer is adjustably weighted prior to input.
  - 9. The system of claim 4 wherein the hidden layer is fully-connected to the input layer and activation layer such that each node of the hidden layer is connected to each node of the input layer and each node of the activation layer.
  - 10. The system of claim 4, wherein the RNN portion of the R-CRF further comprises a feature layer which is used to input ancillary information into the RNN portion, said feature layer being comprised of nodes which input ancillary information values and output representative ancillary data, wherein an output of each of said feature layer nodes is connected to an input of each hidden layer node via a weighted hidden layer connection and to an input of each activation layer node via a weighted activation layer connection.
  - 11. The system of claim 4, wherein the RNN portion of the R-CRF further comprises one or more additional hidden layers, each additional hidden layer being fully connected to the layer preceding the additional hidden layer and the layer subsequent to the additional hidden layer such that each node of the additional hidden layer is connected to each node of the preceding layer and each node of the subsequent layer.
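The layer structure recited in claims 2, 4, 5, 7 and 8 can be sketched as a small forward pass. This is an illustrative reading under stated assumptions, not the patent's implementation: the class name `RecurrentTagger`, the sigmoid hidden nonlinearity, the random weight initialization, and all sizes are hypothetical. The input is a one-hot vector with one element per vocabulary word, the activation layer output is returned without softmax normalization, and the previous word's activation layer output is fed back into the hidden layer through its own weight matrix.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(index: int, size: int) -> Vector:
    """One input node per vocabulary word; only the current word's node is on."""
    return [1.0 if i == index else 0.0 for i in range(size)]


class RecurrentTagger:
    """Hypothetical RNN portion: input -> hidden -> activation layer."""

    def __init__(self, vocab_size: int, hidden_size: int, n_labels: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.vocab_size = vocab_size
        self.n_labels = n_labels
        self.w_in = init(hidden_size, vocab_size)    # input -> hidden weights
        self.w_rec = init(hidden_size, n_labels)     # previous activations -> hidden
        self.w_out = init(n_labels, hidden_size)     # hidden -> activation layer

    def forward(self, word_ids: List[int]) -> List[Vector]:
        prev_act = [0.0] * self.n_labels             # no history before the first word
        outputs = []
        for wid in word_ids:
            x = one_hot(wid, self.vocab_size)
            pre = [a + b for a, b in zip(matvec(self.w_in, x),
                                         matvec(self.w_rec, prev_act))]
            hidden = [1.0 / (1.0 + math.exp(-p)) for p in pre]  # assumed sigmoid
            prev_act = matvec(self.w_out, hidden)    # raw scores, no softmax applied
            outputs.append(prev_act)
        return outputs


rnn = RecurrentTagger(vocab_size=5, hidden_size=4, n_labels=3)
acts = rnn.forward([3, 1, 1])   # word 1 appears twice in different contexts
```

Because of the feedback connection, the two occurrences of word 1 receive different activations. Each row of `acts` is what a CRF portion would take as its per-word input.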

12. A recurrent conditional random field (R-CRF), comprising:
- a recurrent neural network (RNN) portion which generates RNN activation layer activations data that is indicative of a label for a word, the RNN receiving feature values associated with a word in the sequence of words and outputting RNN activation layer activations data that is indicative of a label, said RNN portion comprising,
  - an input layer of nodes wherein each feature value of the feature values associated with a word are input into a different one of the input layer nodes,
  - a hidden layer comprising nodes that receive outputs from the input layer, said outputs from the input layer being adjustably weighted, and
  - an activation layer comprising nodes that receive outputs from the hidden layer, said outputs from the hidden layer being adjustably weighted; and
- a conditional random field (CRF) portion which takes as input the RNN activation layer activations data output from the activation layer of the RNN portion for words in the sequence of words and which outputs label data that is indicative of a separate label that is to be assigned to each of the words in the sequence of words associated with the RNN activation layer activations data.
- Dependent claims (13, 14, 15, 16, 17, 18):
  - 13. The R-CRF of claim 12, wherein the RNN activation layer activations data comprises data output by the activation layer of the RNN prior to any softmax normalization.
  - 14. The R-CRF of claim 12, wherein the RNN and CRF portions of the R-CRF are jointly trained using a set of training data pair sequences and a CRF sequence-level objective function, each of said training data pair sequences comprising a sequence of pairs of feature values corresponding to a word and label data that is indicative of a correct label for that word.
  - 15. The R-CRF of claim 12, wherein the RNN activation layer activations data output from the RNN portion of the R-CRF in response to the input of feature values associated with a word in the sequence of words is input into the nodes of the hidden layer along with the data output from the input layer upon input of feature values associated with a next word in the sequence of words input into the input layer.
  - 16. The R-CRF of claim 15, wherein RNN activation layer activations data input into the nodes of the hidden layer is adjustably weighted prior to input.
  - 17. The R-CRF of claim 12, wherein the RNN portion of the R-CRF further comprises a feature layer which is used to input ancillary information into the RNN portion, said feature layer being comprised of nodes which input ancillary information values and output representative ancillary data, wherein an output of each of said feature layer nodes is input into each hidden layer node via a weighted hidden layer connection and input into each activation layer node via a weighted activation layer connection.
  - 18. The R-CRF of claim 12 wherein the hidden layer is fully-connected to the input layer and activation layer such that each node of the hidden layer is connected to each node of the input layer and each node of the activation layer.
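The feature layer of claim 17 (and claim 10) adds a second input path: ancillary values reach both the hidden layer and the activation layer through their own weighted connections. A minimal sketch of one time step follows, assuming a sigmoid hidden nonlinearity and hypothetical names (`step_with_features`, `w_feat_h`, `w_feat_a`); the tiny weight matrices are chosen only so the arithmetic is easy to follow.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def step_with_features(x: Vector, f: Vector, prev_act: Vector,
                       w_in: Matrix, w_rec: Matrix, w_feat_h: Matrix,
                       w_out: Matrix, w_feat_a: Matrix) -> Vector:
    """One time step with an ancillary feature vector f.

    f contributes to the hidden layer via w_feat_h and to the
    activation layer via w_feat_a.
    """
    pre = [a + b + c for a, b, c in zip(matvec(w_in, x),
                                        matvec(w_rec, prev_act),
                                        matvec(w_feat_h, f))]
    hidden = [1.0 / (1.0 + math.exp(-p)) for p in pre]   # assumed sigmoid
    return [a + b for a, b in zip(matvec(w_out, hidden),
                                  matvec(w_feat_a, f))]


# Hypothetical sizes: 2 input nodes, 1 hidden node, 2 labels, 1 feature node.
w_in = [[0.0, 0.0]]
w_rec = [[0.0, 0.0]]
w_feat_h = [[0.0]]
w_out = [[2.0], [4.0]]
w_feat_a = [[1.0], [-1.0]]

x = [1.0, 0.0]
without_feature = step_with_features(x, [0.0], [0.0, 0.0],
                                     w_in, w_rec, w_feat_h, w_out, w_feat_a)
with_feature = step_with_features(x, [3.0], [0.0, 0.0],
                                  w_in, w_rec, w_feat_h, w_out, w_feat_a)
```

With these weights the hidden node outputs 0.5, so the activations are [1.0, 2.0] without the feature and [4.0, -1.0] with it: the ancillary value shifts the label scores directly through the activation layer connection.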

19. A computer-implemented process for training a recurrent conditional random field (R-CRF) to output semantic label designations for words in a sequence of words, said R-CRF comprising a recurrent neural network (RNN) portion which outputs RNN activation layer activations data that is indicative of a semantic label for a word in response to feature values associated with that word in the sequence of words being input and which comprises a series of interconnected multi-node layers having weighted connections between layers, and a conditional random field (CRF) portion which takes as input the RNN activation layer activations data output from the RNN portion for one or more words in the sequence of words and then outputs label data that is indicative of a separate semantic label that is to be assigned to each of the one or more words in the sequence of words, said training process comprising:
- using a computing device to perform the following process actions;
  - accessing a set of training data pair sequences, each of said training data pair sequences comprising a sequence of pairs of feature values corresponding to a word and label data that is indicative of a correct semantic label for that word;
  - inputting each training data pair sequence of said set one by one into the R-CRF; and
  - for each training data pair sequence input into the R-CRF,
    - employing a CRF sequence-level objective function and a backpropagation procedure to compute adjusted weights for the connections between layers of the RNN portion of the R-CRF, and
    - changing the weight associated with the connections between layers of the RNN portion of the R-CRF based on the computed adjusted weights.
- Dependent claims (20):
  - 20. The process of claim 19, wherein the CRF sequence-level objective function takes the form of
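The formula recited in claim 20 is not reproduced in this text, and the sketch below does not attempt to restore it. As a general illustration of what a sequence-level objective means in claim 19, the following computes the log-likelihood of a whole label sequence under a generic linear-chain CRF: the score of the correct path minus the log of the summed exponentiated scores of all paths. The function names, the score tables, and the absence of any scaling factor on the transition scores are assumptions made for this example.

```python
import math
from itertools import product
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(scores, trans, labels: Sequence[int]) -> float:
    """Sum of per-word scores and label-to-label transition scores."""
    total = scores[0][labels[0]]
    for t in range(1, len(labels)):
        total += trans[labels[t - 1]][labels[t]] + scores[t][labels[t]]
    return total


def log_partition(scores, trans) -> float:
    """Forward algorithm: log of the sum over all label paths."""
    n = len(trans)
    alpha = list(scores[0])
    for t in range(1, len(scores)):
        alpha = [logsumexp([alpha[i] + trans[i][j] for i in range(n)])
                 + scores[t][j] for j in range(n)]
    return logsumexp(alpha)


def sequence_log_likelihood(scores, trans, gold: Sequence[int]) -> float:
    """Log-probability of the gold label sequence as a whole."""
    return path_score(scores, trans, gold) - log_partition(scores, trans)


# Hypothetical activations for a 3-word sequence with 2 labels.
scores = [[1.0, 0.0], [0.5, 1.5], [0.0, 0.2]]
trans = [[0.3, -0.2], [-1.0, 0.8]]
gold = [0, 1, 1]

ll = sequence_log_likelihood(scores, trans, gold)

# Sanity check by brute force: path probabilities sum to one.
total_prob = sum(math.exp(sequence_log_likelihood(scores, trans, list(p)))
                 for p in product(range(2), repeat=3))
```

Training would adjust the weights to raise `ll` for each training sequence. For this kind of objective, the gradient with respect to a per-word score is the indicator of the correct label minus the model's marginal probability of that label, which is the error signal a backpropagation procedure would pass into the RNN layers.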

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Yao, Kaisheng; Zweig, Geoffrey Gerson; Yu, Dong
Primary Examiner(s): SINGH, SATWANT K
Application Number: US14/201,670
Publication Number: US 20150161101A1
Time in Patent Office: 683 Days
US Class Current: 1/1
CPC Class Codes:
G06F 40/30   Semantic analysis
G06N 3/02   Neural networks
G06N 3/047   Probabilistic or stochastic...
