REAL-TIME VOICE RECOGNITION APPARATUS EQUIPPED WITH ASIC CHIP AND SMARTPHONE

US 20200135178A1
Filed: 04/26/2018
Published: 04/30/2020
Est. Priority Date: 06/28/2017
Status: Active Grant

First Claim

Patent Images

1. A real-time voice recognition apparatus comprising an ASIC chip and a smartphone, the apparatus comprising:

an ASIC chip configured to receive a first digital audio speech as an input, generate a feature vector, and output a phoneme score from the feature vector; and

a smartphone connected to the ASIC chip through a data port and configured to receive the phoneme score from the ASIC chip as an input, output word text, and process and output the word text according to user needs,wherein the ASIC chip comprises;

an input signal selector configured to select one of a second digital audio speech and a third digital audio speech in response to an input selection control signal (SEL) and output the selected audio speech as the first digital audio speech (A_IN);

a feature extractor configured to receive the first digital audio speech and one or more first control signals (C1) as an input and output the feature vector;

an acoustic model processor configured to receive the feature vector and one or more second control signals (C2) as an input and output the phoneme score;

a connection device configured to receive the phoneme score from the acoustic model processor as an input, output the received phoneme score to the smartphone, receive data and a first DC voltage (DC1) as an input from the smartphone, and output the second digital audio speech, the input selection control signal (SEL), the first control signal (C1), the second control signal (C2) and the second DC voltage (DC2); and

a DC-DC converter configured to receive the second DC voltage (DC2) as an input and output a third DC voltage (DC3).

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

The present invention relates to a real-time voice recognition apparatus equipped with an application-specific integrated circuits (ASIC) chip and a smartphone, capable, by using one smartphone and one ASIC chip and without using a cloud computer, of assuring personal privacy, and, due to a short delay time, enabling real-time conversion of voice input signals into text for output. When one DRAM chip is optionally added to the real-time voice recognition apparatus, the number of neural network layers is increased thereby significantly improving accuracy of conversion of voice input signals into text.

5 Citations

29 Claims

1. A real-time voice recognition apparatus comprising an ASIC chip and a smartphone, the apparatus comprising:
- an ASIC chip configured to receive a first digital audio speech as an input, generate a feature vector, and output a phoneme score from the feature vector; and
  
  a smartphone connected to the ASIC chip through a data port and configured to receive the phoneme score from the ASIC chip as an input, output word text, and process and output the word text according to user needs,wherein the ASIC chip comprises;
  
  an input signal selector configured to select one of a second digital audio speech and a third digital audio speech in response to an input selection control signal (SEL) and output the selected audio speech as the first digital audio speech (A_IN);
  
  a feature extractor configured to receive the first digital audio speech and one or more first control signals (C1) as an input and output the feature vector;
  
  an acoustic model processor configured to receive the feature vector and one or more second control signals (C2) as an input and output the phoneme score;
  
  a connection device configured to receive the phoneme score from the acoustic model processor as an input, output the received phoneme score to the smartphone, receive data and a first DC voltage (DC1) as an input from the smartphone, and output the second digital audio speech, the input selection control signal (SEL), the first control signal (C1), the second control signal (C2) and the second DC voltage (DC2); and
  
  a DC-DC converter configured to receive the second DC voltage (DC2) as an input and output a third DC voltage (DC3).
- View Dependent Claims (3, 4, 5, 7, 9, 10, 12, 13)
- - 3. The real-time voice recognition apparatus of claim 1, wherein the connection device is connected to the smartphone through the data port of the smartphone, outputs the second digital audio speech (A2) and the input selection control signal (SEL) to the input signal selector, outputs the one or more first control signals to the feature extractor, outputs the one or more second control signals to the acoustic model processor, and outputs the second DC voltage (DC2) to the DC-DC converter.
  - 4. The real-time voice recognition apparatus of claim 3, wherein the connection device comprises:
    - a connection device circuit unit included in the ASIC chip; and
      
      an electrical connection device connected to the data port of the smartphone through a connector or a cable.
  - 5. The real-time voice recognition apparatus of claim 4, wherein the connection device is supplied with power from the smartphone through the electrical connection device.
  - 7. The real-time voice recognition apparatus of claim 1, wherein the DC-DC converter outputs the third DC voltage (DC3) to the connection device, the input signal selector, the feature extractor and the acoustic model processor.
  - 9. The real-time voice recognition apparatus of claim 1, wherein the ASIC chip further comprises the third digital audio speech generator configured to generate a third digital audio speech and provide the third digital audio speech for the input signal selector.
  - 10. The real-time voice recognition apparatus of claim 9, wherein the third digital audio speech generator comprises:
    - at least one analog amplifier configured to receive at least one analog audio signal, output by an external microphone, as an input, amplify the received analog audio signal, and output at least one analog output signal;
      
      at least one analog-digital converter configured to receive the at least one analog output signal as an input and output at least one digital audio speech; and
      
      a digital signal processor configured to receive the at least one digital audio speech as an input and output one third digital audio speech,wherein the digital signal processor comprises a beamforming function.
  - 12. The real-time voice recognition apparatus of claim 1, wherein the smartphone comprises:
    - a lexical/language model processor configured to receive the phoneme score as an input from the ASIC chip and perform a lexical/language model processing function by outputting word text; and
      
      an output processor configured to receive the word text as an input and perform a function for outputting and processing the word text according to user needs.
  - 13. The real-time voice recognition apparatus of claim 12, wherein the output processor displays the word text on a display embedded in the smartphone, stores the word text in a memory unit of the smartphone, and transmits the word text to an external device using a wireless communication function of the smartphone, or the smartphone synthesizes the word text in a voice form and outputs the word text to a voice output device embedded in the smartphone or an external voice output device connected to the smartphone in a wired or wireless manner.

2. (canceled)

6. (canceled)

8. The real-time voice recognition apparatus of claim wherein the acoustic model processor uses a recurrent neural network (RNN) algorithm of a long short term memory (LSTM) or gated recurrent unit (GRU) method.

11. (canceled)

14. (canceled)

15. A real-time voice recognition apparatus comprising an ASIC chip and a smartphone, the apparatus comprising:
- a DRAM chip configured to output first parameter data (Dp1) and receive second parameter data (Dp2) as an input;
  
  an ASIC chip configured to receive a first digital audio speech and the first parameter data (Dp1) as an input and output a phoneme score (PS) and the second parameter data (Dp2); and
  
  a smartphone connected to the ASIC chip through a data port and configured to receive the phoneme score as an input from the ASIC chip, output word text, and process and output the word text according to user needs,wherein the ASIC chip comprises;
  
  an input signal selector configured to select one of a second digital audio speech and a third digital audio speech in response to an input selection control signal (SEL) and output the selected audio speech as the first digital audio speech (A_IN);
  
  a feature extractor configured to receive the first digital audio speech and one or more first control signals (C1) as an input and output the feature vector;
  
  an acoustic model processor configured to receive the feature vector, one or more second control signals (C2) and a third parameter data (Dp3) as an input and output the phoneme score;
  
  a DRAM controller configured to receive the first parameter data (Dp1), a fourth parameter data (Dp4), and one or more sixth control signals (C6) as an input and output the second parameter data and the third parameter data;
  
  a connection device configured to receive the phoneme score as an input from the acoustic model processor, output the received phoneme score to the smartphone, receive data and a first DC voltage (DC1) as an input from the smartphone, and output the second digital audio speech (A2), the fourth parameter data (Dp4), the input selection control signal (SEL), the first control signal (C1), the second control signal (C2), the sixth control signal (C6) and the second DC voltage (DC2); and
  
  a DC-DC converter configured to receive the second DC voltage (DC2) as an input and output a third DC voltage (DC3).
- View Dependent Claims (17, 18, 20, 21, 23, 24, 25, 27, 28)
- - 17. The real-time voice recognition apparatus of claim 15, wherein the connection device is connected to the smartphone through the data port of the smartphone, outputs the second digital audio speech (A2) and the input selection control signal (SEL) to the input signal selector, outputs the one or more first control signals to the feature extractor, outputs the one or more second control signals to the acoustic model processor, outputs the one or more sixth control signals and the fourth parameter data to the DRAM controller, and outputs the second DC voltage (DC2) to the DC-DC converter.
  - 18. The real-time voice recognition apparatus of claim 15, wherein the DC-DC converter outputs the third DC voltage (DC3) to each of the input signal selector, the feature extractor, the acoustic model processor, the connection device, the DRAM controller and the DRAM chip,wherein the DRAM chip is supplied with all DC supply voltages necessary for an operation of the DRAM chip from the ASIC chip.
  - 20. The real-time voice recognition apparatus of claim 15, wherein each of the first parameter data (Dp1) and the second parameter data (Dp2) is a binary digital signal having the number of bits of 8 bits or more.
  - 21. The real-time voice recognition apparatus of claim 17, wherein the connection device is supplied with power from the smartphone through an electrical connection device connected to the data port of the smartphone through a connector or a cable.
  - 23. The real-time voice recognition apparatus of claim 15, wherein the acoustic model processor uses a recurrent neural network (RNN) algorithm of a long short term memory (LSTM) or gated recurrent unit (GRU) method.
  - 24. The real-time voice recognition apparatus of claim 15, wherein the ASIC chip further comprises a third digital audio speech generator configured to generate the third digital audio speech and provide the third digital audio speech for the input signal selector.
  - 25. The real-time voice recognition apparatus of claim 24, wherein the third digital audio speech generator comprises:
    - at least one analog amplifier configured to receive at least one analog audio signal, output by an external microphone, as an input, amplify the received analog audio signal, and output at least one analog output signal;
      
      at least one analog-digital converter configured to receive the at least one analog output signal as an input and output at least one digital audio speech; and
      
      a digital signal processor configured to receive the at least one digital audio speech as an input and output one third digital audio speech,wherein the digital signal processor comprises a beamforming function.
  - 27. The real-time voice recognition apparatus of claim 15, wherein the smartphone comprises:
    - a lexical/language model processor configured to receive the phoneme score as an input from the ASIC chip and perform a lexical/language model processing function by outputting word text; and
      
      an output processor configured to receive the word text as an input and perform a function for outputting and processing the word text according to user needs.
  - 28. The real-time voice recognition apparatus of claim 27, wherein the output processor displays the word text on a display embedded in the smartphone, stores the word text in a memory unit of the smartphone, and transmits the word text to an external device using a wireless communication function of the smartphone, or the smartphone synthesizes the word text in a voice form and outputs the word text to a voice output device embedded in the smartphone or an external voice output device connected to the smartphone in a wired or wireless manner, andwherein the lexical/language model processor simultaneously uses a central processing unit (CPU) and a graphics processing unit (GPU) embedded in the smartphone in order to perform the lexical/language model processing function.

16. (canceled)

19. (canceled)

22. (canceled)

26. (canceled)

29. (canceled)

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Postech Academy-Industry Foundation
Original Assignee
Postech Academy-Industry Foundation
Inventors
PARK, Hong June, NOH, Hyeon Kyu, LEE, Won Cheol, JEONG, Kyeong Won

Granted Patent

US 11,183,177 B2
Time in Patent Office

Days
Field of Search
US Class Current
CPC Class Codes

G06F 21/32   using biometric data, e.g. ...

G06F 21/76   in application-specific int...

G06F 3/167   Audio in a user interface, ...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/048   Activation functions

G06N 3/08   Learning methods

G10L 13/00   Speech synthesis; Text to s...

G10L 15/02   Feature extraction for spee...

G10L 15/16   using artificial neural net...

G10L 15/183   using context dependencies,...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/22   Procedures used during a sp...

G10L 15/28   Constructional details of s...

G10L 15/30   Distributed recognition, e....

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/223   Execution procedure of a sp...

REAL-TIME VOICE RECOGNITION APPARATUS EQUIPPED WITH ASIC CHIP AND SMARTPHONE

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

5 Citations

29 Claims

Specification

Use Cases

Quick Links

Others

REAL-TIME VOICE RECOGNITION APPARATUS EQUIPPED WITH ASIC CHIP AND SMARTPHONE

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

5 Citations

29 Claims

Specification

Subscription Required

Use Cases

Quick Links

Others