System and method for natural language processing

US 10,719,507 B2
Filed: 09/21/2017
Issued: 07/21/2020
Est. Priority Date: 09/21/2017
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Systems and methods are provided for natural language processing. An exemplary method implementable by a server may comprise: obtaining, from a computing device, an audio input and a current interface, wherein the current interface is associated with a context; and determining a query associated with the audio input based at least on the audio input and the context of the current interface.

22 Citations

20 Claims

1. A method for natural language processing, implementable by a server, the method comprising:
- obtaining, from a computing device, an audio input and a current interface, wherein the current interface is associated with a context comprising a first context and a second context; and

  determining a query associated with the audio input based on the audio input and the context of the current interface by:

  feeding the audio input to a voice recognition engine to determine raw texts corresponding to the audio input;

  adjusting a weight in one or more first machine learning models based on the first context associated with the current interface;

  applying the one or more first machine learning models to the first context and to the raw texts, pre-processed texts, tokenized texts, or vectorized texts, to obtain an intent classification of the audio input according to the weight, wherein the pre-processed texts, tokenized texts, and vectorized texts are associated with the raw texts;

  applying one or more second machine learning models to the second context and to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to obtain one or more sub-classification prediction distributions of the audio input;

  comparing the one or more sub-classification prediction distributions with a preset threshold and against an intent database to obtain an intent sub-classification of the audio input, wherein the intent sub-classification corresponds to a sub-classification prediction distribution exceeding the preset threshold and matches an intent in the intent database; and

  determining the query based on the intent classification or the intent sub-classification of the audio input.
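Claim 1 combines two scoring passes: a context-weighted intent classification, and a sub-classification that is kept only if its predicted probability exceeds a preset threshold and it appears in an intent database. The claim does not specify the models or how the weight is adjusted, so the following Python is only a minimal sketch under stated assumptions: the first models are assumed to have already produced per-intent scores, the weight adjustment is modelled as a hypothetical multiplicative boost keyed on the current interface (`CONTEXT_BOOST`), and the intent database is a plain set of labels. All names are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical weight adjustment: the claim only says a weight is adjusted
# "based on the first context"; a multiplicative boost is one simple choice.
CONTEXT_BOOST: Dict[str, Dict[str, float]] = {
    "navigation": {"navigation": 1.5},
    "media": {"media": 1.5},
    "messaging": {"messaging": 1.5},
}


def classify_intent(scores: Dict[str, float], first_context: str) -> str:
    """Return the top intent after re-weighting scores by the current interface."""
    boost = CONTEXT_BOOST.get(first_context, {})
    weighted = {label: s * boost.get(label, 1.0) for label, s in scores.items()}
    return max(weighted, key=weighted.get)


def sub_classify(distribution: Dict[str, float], threshold: float,
                 intent_db: Set[str]) -> List[str]:
    """Keep sub-intents that exceed the threshold AND exist in the intent database."""
    ranked = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, p in ranked if p > threshold and label in intent_db]


def determine_query(scores: Dict[str, float], first_context: str,
                    distribution: Dict[str, float], threshold: float,
                    intent_db: Set[str]) -> str:
    """Prefer the more specific sub-classification when one survives (an assumption)."""
    intent = classify_intent(scores, first_context)
    subs = sub_classify(distribution, threshold, intent_db)
    return subs[0] if subs else intent
```

With `scores = {"navigation": 0.40, "media": 0.45, "messaging": 0.15}`, the navigation interface lifts navigation to 0.60 so it wins, while the same scores under the messaging interface fall back to `media`. Preferring the sub-classification over the broader intent in `determine_query` is a design choice of this sketch; the claim allows either.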
2. The method of claim 1, wherein:
- the computing device is configured to provide a plurality of inter-switchable interfaces;

  the plurality of interfaces comprise at least one of: an interface associated with navigation, an interface associated with media, or an interface associated with messaging;

  the first context comprises at least one of: the current interface as navigation, the current interface as media, or the current interface as messaging; and

  the second context comprises at least one of: an active route, a location, an active media session, or an active message.
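Claim 2 splits the context into a first context (which interface is showing) and a second context (that interface's live state). One way to hold that state, sketched here with hypothetical class and field names, is a small dataclass:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Interface(Enum):
    """The three inter-switchable interfaces named in claim 2."""
    NAVIGATION = "navigation"
    MEDIA = "media"
    MESSAGING = "messaging"


@dataclass
class InterfaceContext:
    # First context: which interface is currently in the foreground.
    current: Interface
    # Second context: optional live state of that interface (hypothetical field names).
    active_route: Optional[str] = None
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    active_media_session: Optional[str] = None
    active_message: Optional[str] = None
```

For example, `InterfaceContext(Interface.NAVIGATION, active_route="Home")` would describe a navigation screen with a route in progress.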
3. The method of claim 1, wherein before applying the one or more first machine learning models to the raw texts, pre-processed texts, tokenized texts, or vectorized texts, to obtain an intent classification of the audio input according to the weight, determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- pre-processing the raw texts based on at least one of: lemmatizing, spell-checking, singularizing, or sentiment analysis to obtain the pre-processed texts;

  matching the pre-processed texts against preset patterns;

  in response to not detecting any preset pattern matching the pre-processed texts, tokenizing the pre-processed texts to obtain the tokenized texts; and

  vectorizing the tokenized texts to obtain the vectorized texts.
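The pre-processing chain of claim 3 (normalize, check for preset patterns, and only then tokenize and vectorize) can be sketched with the standard library alone. This is an illustration under assumptions, not the patented implementation: the spelling table, preset patterns, and crude suffix-stripping singularizer are hypothetical stand-ins, lemmatization and sentiment analysis are omitted, and vectorization is a simple bag-of-words count.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical resources standing in for a real spell-checker and pattern set.
SPELLING = {"naviagte": "navigate", "musc": "music"}
PRESET_PATTERNS = {
    "confirm": re.compile(r"^(yes|yeah|ok|okay)$"),
    "cancel": re.compile(r"^(no|cancel|stop)$"),
}


def preprocess(raw: str) -> str:
    """Lower-case, spell-check and naively singularize the raw transcript."""
    words = []
    for w in re.findall(r"[a-z']+", raw.lower()):
        w = SPELLING.get(w, w)
        if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]  # crude singularization: "songs" -> "song"
        words.append(w)
    return " ".join(words)


def match_preset(text: str) -> Optional[str]:
    """Return the name of the first preset pattern that matches, if any."""
    for name, pattern in PRESET_PATTERNS.items():
        if pattern.match(text):
            return name
    return None


def vectorize(tokens: List[str], vocab: List[str]) -> List[int]:
    """Bag-of-words counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def run_pipeline(raw: str, vocab: List[str]) -> Tuple[Optional[str], List[str], List[int]]:
    text = preprocess(raw)
    preset = match_preset(text)
    if preset is not None:
        return preset, [], []  # pattern matched: tokenizing/vectorizing is skipped
    tokens = text.split()      # whitespace tokenization
    return None, tokens, vectorize(tokens, vocab)
```

`run_pipeline("Yes", vocab)` short-circuits on the `confirm` pattern, whereas `run_pipeline("Play some musc songs", ["play", "music", "song"])` yields the tokens `["play", "some", "music", "song"]` and the vector `[1, 1, 1]`.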
4. The method of claim 1, wherein adjusting a weight in one or more first machine learning models based on the first context associated with the current interface and applying the one or more first machine learning models to the first context and to the raw texts, pre-processed texts, tokenized texts, or vectorized texts, to obtain an intent classification of the audio input according to the weight comprises:
- dynamically updating one or more weights in the one or more first machine learning models based on the first context.
5. The method of claim 1, wherein applying the one or more first machine learning models to the first context and to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts, to obtain an intent classification of the audio input comprises:
- applying a decision-tree-based model and a feedforward neural network model each to the first context and to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to obtain corresponding output classifications;

  in response to determining that an output classification from the decision-tree-based model is the same as an output classification from the feedforward neural network model, using either output classification as the intent classification of the audio input; and

  in response to determining that the output classification from the decision-tree-based model is different from the output classification from the feedforward neural network model, applying a directed acyclic graph-support vector machine (DAGSVM) model to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to obtain the intent classification of the audio input.
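Claim 5 describes a two-model vote with a directed acyclic graph SVM (DAGSVM) as tie-breaker. A DAGSVM evaluates one-vs-one binary classifiers along a decision DAG: at each node it compares the first and last remaining candidate classes and discards the loser, until one class remains. In the sketch below, the trained pairwise SVMs are replaced by hypothetical pairwise decision functions, and the decision-tree and feedforward models are passed in as plain callables.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], str]
# Pairwise decision function for (first, second): a value > 0 favours `first`.
Pairwise = Dict[Tuple[str, str], Callable[[Vector], float]]


def dag_decide(x: Vector, classes: List[str], pairwise: Pairwise) -> str:
    """DAGSVM-style evaluation: compare first and last candidates, drop the loser."""
    candidates = list(classes)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        if (a, b) in pairwise:
            a_wins = pairwise[(a, b)](x) > 0
        else:
            a_wins = pairwise[(b, a)](x) <= 0
        if a_wins:
            candidates.pop()   # eliminate b
        else:
            candidates.pop(0)  # eliminate a
    return candidates[0]


def ensemble_intent(x: Vector, tree: Classifier, ffnn: Classifier,
                    classes: List[str], pairwise: Pairwise) -> str:
    """Claim 5: accept agreement, otherwise fall back to the DAG decision."""
    tree_label, nn_label = tree(x), ffnn(x)
    if tree_label == nn_label:
        return tree_label
    return dag_decide(x, classes, pairwise)
```

Because every node removes exactly one candidate, a k-class decision needs only k - 1 pairwise evaluations at prediction time.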
6. The method of claim 1, wherein the one or more second machine learning models comprise:
- a naive Bayes model, a term frequency-inverse document frequency model, an N-gram model, a recurrent neural network model, or a feedforward neural network model.
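Claim 6 lists naive Bayes among the candidate second models. As an illustration of how such a model could yield the sub-classification prediction distribution that claim 1 compares with a threshold, here is a minimal multinomial naive Bayes with add-one smoothing; the training samples and sub-intent labels are hypothetical, and tokens outside the training vocabulary are simply ignored.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Model = Tuple[Counter, DefaultDict[str, Counter], Set[str]]


def train_nb(samples: List[Tuple[List[str], str]]) -> Model:
    """Count class frequencies and per-class token frequencies."""
    class_counts: Counter = Counter(label for _, label in samples)
    word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_distribution(model: Model, tokens: List[str]) -> Dict[str, float]:
    """Multinomial naive Bayes with add-one smoothing; returns P(label | tokens)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        log_scores[label] = math.log(n / total) + sum(
            math.log((word_counts[label][t] + 1) / denom) for t in tokens if t in vocab)
    peak = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - peak) for k, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {k: v / z for k, v in exp_scores.items()}
```

For instance, after training on `(["play", "jazz"], "media.play_music")`, `(["play", "music"], "media.play_music")`, `(["pause", "music"], "media.pause")` and `(["stop", "song"], "media.pause")`, the call `predict_distribution(model, ["play", "some", "jazz"])` gives about 0.857 for `media.play_music` (0.03 versus 0.005 before normalization).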
7. The method of claim 1, wherein determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- in response to multiple prediction distributions exceeding the preset threshold, determining that the audio input corresponds to multiple intents and applying a neural network model to divide the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts correspondingly according to the multiple intents; and

  for each of the divided texts, applying the N-gram model to obtain the corresponding intent sub-classification.
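The multi-intent path of claim 7 can be sketched as: detect that more than one sub-classification clears the threshold, split the utterance into one segment per intent, and classify each segment with an N-gram model. Two substitutions are made and should be read as assumptions: the claimed neural segmenter is replaced by splitting on simple conjunctions, and the N-gram model is reduced to bigram overlap with hypothetical example phrases.

```python
import re
from typing import Dict, List, Set, Tuple


def is_multi_intent(distribution: Dict[str, float], threshold: float) -> bool:
    """Claim 7 trigger: more than one distribution entry exceeds the threshold."""
    return sum(1 for p in distribution.values() if p > threshold) > 1


def split_segments(text: str) -> List[str]:
    """Stand-in for the claimed neural segmenter: split on simple conjunctions."""
    parts = re.split(r"\b(?:and then|and|then)\b", text)
    return [p.strip() for p in parts if p.strip()]


def ngrams(tokens: List[str], n: int = 2) -> Set[Tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_subclassify(segment: str, examples: Dict[str, List[str]], n: int = 2) -> str:
    """Pick the label whose example phrases share the most n-grams with the segment."""
    seg = ngrams(segment.lower().split(), n)

    def overlap(label: str) -> int:
        return max(len(seg & ngrams(e.lower().split(), n)) for e in examples[label])

    return max(examples, key=overlap)
```

For the utterance "navigate to the airport and play some jazz music", the segments are "navigate to the airport" and "play some jazz music", which a small hypothetical example table can map to `navigation.start_route` and `media.play_music` respectively.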
8. The method of claim 1, wherein determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- in response to determining that the intent classification and the intent sub-classification are consistent, extracting one or more entities from the tokenized texts; and

  in response to determining that the intent classification and the intent sub-classification are inconsistent, re-applying the one or more first machine learning models without the context of the current interface to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to update the intent classification of the audio input.
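The consistency check in claim 8 needs some rule for when an intent and a sub-intent agree. The sketch below assumes, purely for illustration, that sub-intent labels are namespaced by their parent intent (for example `media.play_music` under `media`); the re-classification and entity-extraction steps are passed in as callables because the claim does not fix them.

```python
from typing import Callable, List, Tuple


def is_consistent(intent: str, sub_intent: str) -> bool:
    """Assumes sub-intent labels are namespaced by their parent intent, e.g. 'media.play_music'."""
    return sub_intent.split(".", 1)[0] == intent


def reconcile(intent: str, sub_intent: str, tokens: List[str],
              rerun_without_context: Callable[[], str],
              extract_entities: Callable[[List[str]], List[str]]) -> Tuple[str, List[str]]:
    """Claim 8: extract entities on agreement, otherwise re-classify without context."""
    if is_consistent(intent, sub_intent):
        return intent, extract_entities(tokens)
    # Inconsistent: re-apply the first models without the interface context
    # to update the intent; no entities are extracted on this path.
    return rerun_without_context(), []
```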
9. The method of claim 1, wherein determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- identifying one or more entities from the tokenized texts based on the intent classification, the intent sub-classification, or the second context;

  determining contents associated with the one or more entities based on public data or personal data; and

  determining the query as an intent corresponding to the intent classification or the intent sub-classification, in association with the determined one or more entities and the determined contents.
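Claim 9 assembles the final query from the intent, the entities found in the tokens, and content looked up in public or personal data, optionally using the second context. A minimal sketch, with hypothetical in-memory dictionaries standing in for those data sources and a hypothetical intent-to-source mapping:

```python
from typing import Dict, List, Optional

# Hypothetical data sources for the "public data" and "personal data" of claim 9.
PUBLIC_PLACES = {"airport": "Example International Airport"}
PERSONAL_PLACES = {"home": "12 Example Street"}
PERSONAL_CONTACTS = {"mom": "+1-555-0100"}

# Which sources to search for each intent class (an assumption).
SOURCES: Dict[str, List[Dict[str, str]]] = {
    "navigation": [PERSONAL_PLACES, PUBLIC_PLACES],  # personal data checked first
    "messaging": [PERSONAL_CONTACTS],
}


def build_query(intent: str, sub_intent: Optional[str], tokens: List[str],
                second_context: Dict[str, str]) -> Dict[str, object]:
    """Identify entities, resolve their contents, and package them with the intent."""
    entities: List[str] = []
    contents: Dict[str, str] = {}
    for tok in tokens:
        for source in SOURCES.get(intent, []):
            if tok in source:
                entities.append(tok)
                contents[tok] = source[tok]
                break
    # The second context can supply an entity the utterance only refers to.
    if "there" in tokens and "active_route" in second_context:
        entities.append("there")
        contents["there"] = second_context["active_route"]
    return {"intent": sub_intent or intent, "entities": entities, "contents": contents}
```

`build_query("navigation", "navigation.start_route", ["take", "me", "home"], {})` resolves `home` from the personal data, while a token such as `there` is filled in from an `active_route` in the second context.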

10. A system for natural language processing, implementable on a server, comprising a processor and a non-transitory computer-readable storage medium storing instructions that, when executed by the processor, cause the system to perform a method, the method comprising:
- obtaining, from a computing device, an audio input and a current interface, wherein the current interface is associated with a context comprising a first context and a second context; and

  determining a query associated with the audio input based on the audio input and the context of the current interface by:

  feeding the audio input to a voice recognition engine to determine raw texts corresponding to the audio input;

  adjusting a weight in one or more first machine learning models based on the first context associated with the current interface;

  applying the one or more first machine learning models to the first context and to the raw texts, pre-processed texts, tokenized texts, or vectorized texts, to obtain an intent classification of the audio input, wherein the pre-processed texts, tokenized texts, and vectorized texts are associated with the raw texts;

  applying one or more second machine learning models to the second context and to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to obtain one or more sub-classification prediction distributions of the audio input;

  comparing the one or more sub-classification prediction distributions with a preset threshold and against an intent database to obtain an intent sub-classification of the audio input, wherein the intent sub-classification corresponds to a sub-classification prediction distribution exceeding the preset threshold and matches an intent in the intent database; and

  determining the query based on the intent classification or the intent sub-classification of the audio input.
11. The system of claim 10, wherein:
- the computing device is configured to provide a plurality of inter-switchable interfaces;

  the plurality of interfaces comprise at least one of: an interface associated with navigation, an interface associated with media, or an interface associated with messaging;

  the first context comprises at least one of: the current interface as navigation, the current interface as media, or the current interface as messaging; and

  the second context comprises at least one of: an active route, a location, an active media session, or an active message.
12. The system of claim 10, wherein before applying the one or more first machine learning models to the raw texts, pre-processed texts, tokenized texts, or vectorized texts, to obtain an intent classification of the audio input according to the weight, determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- pre-processing the raw texts based on at least one of: lemmatizing, spell-checking, singularizing, or sentiment analysis to obtain the pre-processed texts;

  matching the pre-processed texts against preset patterns;

  in response to not detecting any preset pattern matching the pre-processed texts, tokenizing the pre-processed texts to obtain the tokenized texts; and

  vectorizing the tokenized texts to obtain the vectorized texts.
13. The system of claim 10, wherein adjusting a weight in one or more first machine learning models based on the first context associated with the current interface and applying the one or more first machine learning models to the first context and to the raw texts, pre-processed texts, tokenized texts, or vectorized texts, to obtain an intent classification of the audio input according to the weight comprises:
- dynamically updating one or more weights in the one or more first machine learning models based on the first context.
14. The system of claim 10, wherein applying the one or more first machine learning models to the first context and to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts, to obtain an intent classification of the audio input comprises:
- applying a decision-tree-based model and a feedforward neural network model each to the first context and to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to obtain corresponding output classifications;

  in response to determining that an output classification from the decision-tree-based model is the same as an output classification from the feedforward neural network model, using either output classification as the intent classification of the audio input; and

  in response to determining that the output classification from the decision-tree-based model is different from the output classification from the feedforward neural network model, applying a directed acyclic graph-support vector machine (DAGSVM) model to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to obtain the intent classification of the audio input.
15. The system of claim 10, wherein the one or more second machine learning models comprise:
- a naive Bayes model, a term frequency-inverse document frequency model, an N-gram model, a recurrent neural network model, or a feedforward neural network model.
16. The system of claim 10, wherein determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- in response to multiple prediction distributions exceeding the preset threshold, determining that the audio input corresponds to multiple intents and applying a neural network model to divide the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts correspondingly according to the multiple intents; and

  for each of the divided texts, applying the N-gram model to obtain the corresponding intent sub-classification.
17. The system of claim 10, wherein determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- in response to determining that the intent classification and the intent sub-classification are consistent, extracting one or more entities from the tokenized texts; and

  in response to determining that the intent classification and the intent sub-classification are inconsistent, re-applying the one or more first machine learning models without the context of the current interface to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to update the intent classification of the audio input.
18. The system of claim 10, wherein determining a query associated with the audio input based on the audio input and the context of the current interface further comprises:
- identifying one or more entities from the tokenized text based on the intent classification, the intent sub-classification, or the second context;

  determining contents associated with the one or more entities based on public data or personal data; and

  determining the query as an intent corresponding to the intent classification or the intent sub-classification, in association with the determined one or more entities and the determined contents.

19. A method for natural language processing, comprising:
- obtaining an audio input from a computing device, wherein the audio is inputted to the computing device when a first interface of the computing device is active;

  determining a context of the first interface, the first interface comprising an interface associated with media, an interface associated with navigation, or an interface associated with messaging, the context comprising a first context and a second context;

  feeding the audio input and the context of the first interface to one or more algorithms to determine an audio instruction associated with the audio input; and

  transmitting a computing device instruction to the computing device based on the determined audio instruction, causing the computing device to execute the computing device instruction, wherein feeding the audio input and the context of the first interface to one or more algorithms to determine an audio instruction associated with the audio input comprises:

  feeding the audio input to a voice recognition engine to determine raw texts corresponding to the audio input;

  adjusting a weight in one or more first machine learning models based on the first context associated with the current interface; and

  applying the one or more first machine learning models to the first context and to the raw texts, pre-processed texts, tokenized texts, or vectorized texts, to obtain an intent classification of the audio input, wherein the pre-processed texts, tokenized texts, and vectorized texts are associated with the raw texts;

  applying one or more second machine learning models to the second context and to the raw texts, the pre-processed texts, the tokenized texts, or the vectorized texts to obtain one or more sub-classification prediction distributions of the audio input;

  comparing the one or more sub-classification prediction distributions with a preset threshold and against an intent database to obtain an intent sub-classification of the audio input, wherein the intent sub-classification corresponds to a sub-classification prediction distribution exceeding the preset threshold and matches an intent in the intent database.
20. The method of claim 19, wherein transmitting the computing device instruction to the computing device based on the determined audio instruction, causing the computing device to execute the computing device instruction comprises:
- in response to determining that the audio instruction is empty, generating a first dialog based on the context of the first interface, causing the computing device to play the first dialog;

  in response to determining that the audio instruction comprises an entity, extracting the entity, and generating a second dialog based on the extracted entity, causing the computing device to play the second dialog;

  in response to determining that the audio instruction comprises a response, matching the response with a response database, and in response to detecting a matched response in the response database, causing the computing device to execute the matched response; and

  in response to determining that the audio instruction comprises a query, matching the query with a query database, and in response to detecting no matched query in the query database, feeding the audio input and the context of the first interface to the one or more algorithms to determine an audio instruction associated with the query.
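Claim 20 branches on what the audio instruction contains. A sketch of that dispatch follows; the instruction format (a dictionary with optional `entity`, `response` and `query` keys), the action strings, the order in which the branches are tested, and the single-retry limit on re-processing are all assumptions rather than details from the patent.

```python
from typing import Callable, Dict


def handle_instruction(instruction: Dict[str, str], context: str,
                       response_db: Dict[str, str], query_db: Dict[str, str],
                       reprocess: Callable[[str], Dict[str, str]],
                       retries: int = 1) -> str:
    """Map an audio instruction to a device action string (format is hypothetical)."""
    if not instruction:  # empty: prompt based on the active interface
        return f"play_dialog:What would you like to do in {context}?"
    if "entity" in instruction:  # entity: confirm it with a second dialog
        return f"play_dialog:Did you mean {instruction['entity']}?"
    if "response" in instruction:  # response: look it up in the response database
        matched = response_db.get(instruction["response"])
        if matched is not None:
            return f"execute:{matched}"
    if "query" in instruction:
        matched = query_db.get(instruction["query"])
        if matched is not None:
            return f"execute:{matched}"  # matched-query handling is an assumption
        if retries > 0:
            # No match: feed the query back through the NLP algorithms.
            return handle_instruction(reprocess(instruction["query"]), context,
                                      response_db, query_db, reprocess, retries - 1)
    return "play_dialog:Sorry, I did not catch that."
```

The `retries` guard is there so that an unmatched query cannot loop through the pipeline indefinitely.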


Current Assignee: SayMosaic, Inc.
Original Assignee: SayMosaic, Inc.
Inventors: He, Cheng; Jin, Jian
Primary Examiner(s): Spooner, Lamont M
Application Number: US15/711,098
Publication Number: US 20190087455A1
Time in Patent Office: 1,034 Days
Field of Search: 704 1, 704 9, 704 10
CPC Class Codes:

G06F 16/243   Natural language query form...

G06F 16/24522   Translation of natural lang...

G06F 16/24575   using context

G06F 16/3329   Natural language query form...

G06F 40/253   Grammatical analysis; Style...

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/295   Named entity recognition

G06F 40/30   Semantic analysis

G06F 40/35   Discourse or dialogue repre...

G10L 15/16   using artificial neural net...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
