Method for optimizing loads of speech/user recognition system

US 20060136218A1
Filed: 12/14/2005
Published: 06/22/2006
Est. Priority Date: 12/16/2004
Status: Abandoned Application

Abstract

A method for optimizing a load of a speech/user recognition system is provided. The speech/user recognition system comprises a server end, a client end and a network, and the method is achieved by performing N stages of computations for speech features of a speech, where N is a positive integer and i is selected from 1 to N to represent the i^th stage speech features. The method comprises the steps of: (a) providing a real time factor Ta(i) for computing a respective stage i of the speech features at the client end, where Ta(i) is the average computation time of the i^th stage speech features at the client end per second of input speech; (b) providing a real time factor Tb(i) for computing a respective stage i of the speech features at the server end, where Tb(i) is the average computation time of the i^th stage speech features at the server end per second of input speech; (c) providing a load c of the server end and a load d of the network; (d) deciding an n in the range from 1 to N that minimizes a recognition time T_output of the speech; (e) inputting the speech to be recognized, with input time T_input; (f) performing a computation from the first stage speech features to the n^th stage speech features of the speech at the client end, while performing a computation from the (n+1)^th stage speech features to the N^th stage speech features of the speech at the server end; and (g) repeating steps (e) to (f).

53 Claims

1. A method for optimizing a load of a speech/user recognition system, wherein said speech/user recognition system comprises a server end, a client end and a network, and the method is achieved by performing N stages of computations for a speech feature of a speech, where N is a positive integer, and an i is selected from 1 to N for representing the i^th stage speech feature, comprising steps of:
- (a) providing a computation time for computing a respective stage i of the speech feature at the client end, wherein a factor Ta(i) is for a computation time of computing the i^th stage speech feature at the client end with respect to the input time;
  
  (b) providing a computation time for computing a respective stage i of the speech feature at the server end, wherein a factor Tb(i) is for a computation time of computing the i^th stage speech feature at the server end with respect to the input time;
  
  (c) providing a load c of the server end and a load d of the network;
  
  (d) deciding an n in the range from 1 to N for minimizing a recognition time T_output of the speech;
  
  (e) inputting the speech to be recognized, with an input time T_input;
  
  (f) performing a computation from the first stage speech feature to the n^th stage speech feature of the speech at the client end, while performing a computation from the (n+1)^th stage speech feature to the N^th stage speech feature of the speech at the server end; and
  
  (g) repeating steps (e) to (f).
- - 2. The method according to claim 1, wherein the step (c) further comprises steps of:
    - (c1) inputting a first speech for being recognized within a first input time T_input1, wherein an accomplishment of the first speech recognition takes a first output time T_output1; and
      
      (c2) inputting a second speech for being recognized within a second input time T_input2, wherein an accomplishment of the second speech recognition takes a second output time T_output2.
  - 3. The method according to claim 2, wherein the first speech includes a data size Dn(T_input1).
  - 4. The method according to claim 3, wherein a time for the first speech features of stage n being transferred via the network is Dn(T_input1)/d.
  - 5. The method according to claim 4, wherein the data size of second speech features of stage n is Dn(T_input2).
  - 6. The method according to claim 5, wherein a time for the second speech features of stage n being transferred via the network is Dn(T_input2)/d.
  - 7. The method according to claim 6, wherein the data size of speech features of stage n includes a data size Dn(T_input).
  - 8. The method according to claim 7, wherein a time for the speech features of stage n being transferred via the network is Dn(T_input)/d.
  - 9. The method according to claim 8, wherein a transmitting time for a recognition result via the network is K/d.
  - 10. The method according to claim 9, wherein the step (c1) further comprises steps of:
    - (c11) providing an n₁ in the range from 1 to N; and
      
      (c12) performing a computation from the first stage speech feature to the n₁^th stage speech feature of the first speech at the client end, while performing a computation from the (n₁+1)^th stage speech feature to the N^th stage speech feature of the first speech at the server end.
  - 11. The method according to claim 10, wherein a computation time for the computation from the first stage speech feature to the n₁^th stage speech feature of the first speech at the client end is $T_{input1} \times \sum_{i=1}^{n_1} Ta(i)$.
  - 12. The method according to claim 11, wherein a computation time for a computation from the (n₁+1)^th stage speech feature to the N^th stage speech feature of the first speech at the server end is $T_{input1} \times \frac{1}{c} \sum_{i=n_1+1}^{N} Tb(i)$.
  - 13. The method according to claim 12, wherein a computation time for computing total N stages of the speech feature of the first speech is $T_{input1} \times (\sum_{i=1}^{n_1} Ta(i) + \frac{1}{c} \sum_{i=n_1+1}^{N} Tb(i))$.
  - 14. The method according to claim 13, wherein the first output time is a summation of the computation time for computing total N stages of the speech feature of the first speech, the time for transferring the first speech feature via the network, and the time for returning a recognition result via the network, and is given by $T_{output1} = T_{input1} \times (\sum_{i=1}^{n_1} Ta(i) + \frac{1}{c} \sum_{i=n_1+1}^{N} Tb(i)) + \frac{1}{d} Dn(T_{input1}) + \frac{1}{d} K$.
  - 15. The method according to claim 9, wherein the step (c2) further comprises steps of:
    - (c21) providing an n₂ in the range from 1 to N; and
      
      (c22) performing a computation from the first stage speech feature to the n₂^th stage speech feature of the second speech at the client end, while performing a computation from the (n₂+1)^th stage speech feature to the N^th stage speech feature of the second speech at the server end.
  - 16. The method according to claim 15, wherein a computation time for the computation from the first stage speech feature to the n₂^th stage speech feature of the second speech at the client end is $T_{input2} \times \sum_{i=1}^{n_2} Ta(i)$.
  - 17. The method according to claim 16, wherein a computation time for a computation from the (n₂+1)^th stage speech feature to the N^th stage speech feature of the second speech at the server end is $T_{input2} \times \frac{1}{c} \sum_{i=n_2+1}^{N} Tb(i)$.
  - 18. The method according to claim 17, wherein a computation time for computing total N stages of the speech feature of the second speech is $T_{input2} \times (\sum_{i=1}^{n_2} Ta(i) + \frac{1}{c} \sum_{i=n_2+1}^{N} Tb(i))$.
  - 19. The method according to claim 18, wherein the second output time is a summation of the computation time for computing total N stages of the speech feature of the second speech, the time for transferring the second speech feature of stage n via the network, and the time for returning a recognition result via the network, and is given by $T_{output2} = T_{input2} \times (\sum_{i=1}^{n_2} Ta(i) + \frac{1}{c} \sum_{i=n_2+1}^{N} Tb(i)) + \frac{1}{d} Dn(T_{input2}) + \frac{1}{d} K$.
  - 20. The method according to claim 1, wherein the time for recognizing the speech is a summation of the computation time for computing total N stages of the speech feature of the speech, the time for transferring the speech feature of stage n via the network, and the time for returning a recognition result via the network, and is given by $T_{output} = T_{input} \times (\sum_{i=1}^{n} Ta(i) + \frac{1}{c} \sum_{i=n+1}^{N} Tb(i)) + \frac{1}{d} Dn(T_{input}) + \frac{1}{d} K$.
  - 35. The method according to claim 33, wherein c and d are obtained according to the method as recited in claim 1.
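
Step (d) of claim 1, read together with the output-time expression of claim 20, amounts to a search over the N possible split points. The following is a minimal Python sketch of that search, not the applicant's implementation. The function names, the exhaustive `min` search, the callable standing in for Dn(T_input), and all numeric values are assumptions made for illustration.

```python
from typing import Callable, Sequence


def output_time(n: int, t_input: float, ta: Sequence[float], tb: Sequence[float],
                c: float, d: float, feature_size: Callable[[int, float], float],
                k: float) -> float:
    """T_output for a split after stage n (stages 1..n run on the client)."""
    client = sum(ta[:n])                      # sum of Ta(i) for i = 1..n
    server = sum(tb[n:]) / c                  # (1/c) * sum of Tb(i) for i = n+1..N
    transfer = feature_size(n, t_input) / d   # Dn(T_input) / d
    return t_input * (client + server) + transfer + k / d


def best_split(t_input: float, ta: Sequence[float], tb: Sequence[float],
               c: float, d: float, feature_size: Callable[[int, float], float],
               k: float) -> int:
    """Return the n in 1..N with the smallest modelled T_output."""
    return min(range(1, len(ta) + 1),
               key=lambda n: output_time(n, t_input, ta, tb, c, d, feature_size, k))


# Hypothetical real-time factors and feature sizes (not taken from the patent).
TA = [0.1, 0.2, 1.0]                       # client seconds per second of speech
TB = [0.02, 0.04, 0.1]                     # server seconds per second of speech
BYTES_PER_SECOND = [16000.0, 4000.0, 100.0]


def example_feature_size(n: int, t_input: float) -> float:
    """Assumed Dn(T_input): stage-n output size grows linearly with input time."""
    return BYTES_PER_SECOND[n - 1] * t_input


if __name__ == "__main__":
    n = best_split(2.0, TA, TB, c=0.5, d=8000.0,
                   feature_size=example_feature_size, k=100.0)
    print("chosen split n =", n)
```

With these made-up numbers the search selects n = 2: sending stage-1 features costs too much transfer time, while running stage 3 on the client costs too much compute time.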

21. A method for optimizing a recording frame-synchronized speech feature computation comprising a server end, a client end and a network, and the method is achieved by performing N stages of computations for a speech feature of a speech having N′ frames, where N and N′ are positive integers, where an i is selected from the range from 1 to N for representing the i^th stage speech feature, and an n′ is selected from the range from 1 to N′ for representing the n′^th frame, comprising steps of:
- (a) providing a specific n in the range from 1 to N;
  
  (b) inputting said speech for an input time (T_input), wherein a computation from the first stage speech feature to the n^th stage speech feature of each frame of the speech is performed at the client end, and a computation from the (n+1)^th stage speech feature to the N^th stage speech feature of each frame of the speech is performed at the server end;
  
  (c) after the step (b) is carried out, a computation of the n′ frames is achieved, and a speech feature computation of the n₁^th stage of the (n′+1)^th frame is achieved, modifying the n by a specific manner according to the n₁ to minimize a computation time for recognizing the speech; and
  
  (d) performing a computation from the first stage speech feature to the n^th stage speech feature of the respective remaining frames at the client end according to the modified n in step (c), while performing a computation from the (n+1)^th stage speech feature to the N^th stage speech feature of the respective remaining frames at the server end.
- - 22. The method according to claim 21, wherein the method is used in a recording frame-synchronized speech feature computation system.
  - 23. The method according to claim 21, wherein in the step (b) the recording frame-synchronized speech feature computation system synchronously performs the speech feature computations.
  - 24. The method according to claim 21, wherein in the step (c) a computation of the n′ frames is achieved by the recording frame-synchronized speech feature computation system.
  - 25. The method according to claim 21, wherein the n in the step (a) is obtained by optimizing a load of a speech/user recognition system, wherein said speech/user recognition system comprises a server end, a client end and a network, and the method is achieved by performing N stages of computations for a speech feature of a speech, where N is a positive integer, and an i is selected from 1 to N for representing the i^th stage speech feature, comprising steps of:
    - (i) providing a computation time for computing a respective stage i of the speech feature at the client end, wherein a factor Ta(i) is for a computation time of computing the i^th stage speech feature at the client end with respect to the input time;
      
      (ii) providing a computation time for computing a respective stage i of the speech feature at the server end, wherein a factor Tb(i) is for a computation time of computing the i^th stage speech feature at the server end with respect to the input time;
      
      (iii) providing a load c of the server end and a load d of the network;
      
      (iv) deciding an n in the range from 1 to N for minimizing a recognition time T_output of the speech;
      
      (v) inputting the speech to be recognized, with an input time T_input;
      
      (vi) performing a computation from the first stage speech feature to the n^th stage speech feature of the speech at the client end, while performing a computation from the (n+1)^th stage speech feature to the N^th stage speech feature of the speech at the server end; and
      
      (vii) repeating steps (v) to (vi).
  - 26. The method according to claim 21, wherein a factor Ta(i) is for a computation time of computing the i^th stage speech feature at the client end with respect to the input speech.
  - 27. The method according to claim 26, wherein a factor Tb(i) is for a computation time of computing the i^th stage speech feature at the server end with respect to the input speech.
  - 28. The method according to claim 27, wherein a computation time for a computation from the first stage speech feature to the n^th stage speech feature of said speech at the client end is $T_{input} \times \sum_{i=1}^{n} Ta(i)$.
  - 29. The method according to claim 28, wherein a computation time for a computation from the (n+1)^th stage speech feature to the N^th stage speech feature of said speech at the server end is $T_{input} \times \frac{1}{c} \sum_{i=n+1}^{N} Tb(i)$.
  - 30. The method according to claim 29, wherein a computation time for computing total N stages of the speech feature of the speech is $T_{input} \times (\sum_{i=1}^{n} Ta(i) + \frac{1}{c} \sum_{i=n+1}^{N} Tb(i))$.
  - 31. The method according to claim 30, wherein the data size of speech feature of stage n is Dn(T_input).
  - 32. The method according to claim 31, wherein a time for the speech feature of stage n being transferred via the network is Dn(T_input)/d.
  - 33. The method according to claim 32, wherein a transmitting time for a recognition result being returned by the network is K/d.
  - 34. The method according to claim 33, wherein the specific manner in the step (c) uses:
    - (c1) if n₁ is smaller than n, an equation $n = \underset{n}{Arg} (Min (T_{input} \times [(\sum_{i = 1}^{n} Ta (i) + \frac{1}{c} \sum_{i = n + 1}^{N} Tb (i)) + \sum_{i = n_{1}}^{N} Ta (i) + \frac{1}{c} \sum_{i = n + 1}^{N} Tb (i)] + \frac{1}{d} Dn (T_{input}) + \frac{1}{d} K))$ is used for obtaining the modified n; and
      
      (c2) if n₁ is greater than n, an equation $n = \underset{n}{Arg} (Min (T_{input} \times [(\sum_{i = 1}^{n} Ta (i) + \frac{1}{c} \sum_{i = n + 1}^{N} Tb (i)) + \frac{1}{c} \sum_{i = n_{1} + 1}^{N} Tb (i)] + \frac{1}{d} Dn (T_{input}) + \frac{1}{d} K))$ is used for obtaining the modified n, wherein c is a load of the server end and d is a load of the network.

36. A method for optimizing a load of a speech/user recognition system comprising a server end, a client end and a network, wherein a recognition is achieved by performing plural stages of computations to speech features of a speech having an inputting time, comprising steps of:
- (a) providing a real time factor Ta(i) for computing a respective stage i speech feature at the client end;
  
  (b) providing a real time factor Tb(i) for computing a respective stage i speech feature at the server end;
  
  (c) providing a load of the server end and a load of the network;
  
  (d) obtaining a specific amount according to the load of the server end and the load of the network to minimize a computation time for recognizing said speech; and
  
  (e) determining the computations at the client end and the server end according to the specific amount and then performing the plural stages of computations for the speech features of the speech.
- - 37. The method according to claim 36, wherein the step (c) further comprises steps of:
    - (c1) inputting a first speech to be recognized during a first input time, where an accomplishment of a recognition of the first speech takes a first output time;
      
      (c2) inputting a second speech to be recognized during a second input time, where an accomplishment of a recognition of the second speech takes a second output time; and
      
      (c3) estimating the load of the server end and the load of the network according to the first and second output times of (c1) and (c2).
  - 38. The method according to claim 36, wherein the computation time for computing all stages of the speech feature at the client end is directly proportional to the inputting time.
  - 39. The method according to claim 36, wherein the computation time for computing all stages of the speech feature at the server end is directly proportional to the inputting time.
  - 40. The method according to claim 36, wherein the speech includes a data size.
  - 41. The method according to claim 36, wherein a time for transferring the speech feature via network is a ratio of the data size to the load of the network.
  - 42. The method according to claim 36, wherein a time for computing the stages of the speech feature is a summation of the respective times for computing the speech feature at the client end and at the server end.
  - 43. The method according to claim 36, wherein an output time of the speech is a summation of the computation time for computing said all stages of said speech feature, the time for transmitting the speech feature via the network, and the time for transmitting a recognition result via the network.
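
Step (c3) of claim 37 estimates the two loads from two timed recognitions. Under the output-time model of claims 41 to 43, each measurement is linear in 1/c and 1/d, so two measurements give a 2×2 linear system. The sketch below solves that system with Cramer's rule. The choice of solver, the `Calibration` record, and the function name are assumptions, since the claims do not say how the estimate is computed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Calibration:
    """One timed recognition run (hypothetical record, not from the patent)."""
    t_input: float        # input time of the utterance, seconds
    t_output: float       # measured output time, seconds
    client_factor: float  # sum of Ta(i) over the stages run on the client
    server_factor: float  # sum of Tb(i) over the stages run on the server
    feature_size: float   # size of the features sent over the network


def estimate_loads(m1: Calibration, m2: Calibration, k: float) -> Tuple[float, float]:
    """Solve p*(1/c) + q*(1/d) = r for both measurements and return (c, d).

    k is the size of the returned recognition result.
    """
    # Coefficients of 1/c, coefficients of 1/d, and the residual after
    # subtracting the client-side compute time, which needs neither c nor d.
    p1, q1 = m1.t_input * m1.server_factor, m1.feature_size + k
    p2, q2 = m2.t_input * m2.server_factor, m2.feature_size + k
    r1 = m1.t_output - m1.t_input * m1.client_factor
    r2 = m2.t_output - m2.t_input * m2.client_factor

    det = p1 * q2 - p2 * q1
    if abs(det) < 1e-12:
        raise ValueError("measurements are not independent; vary the input time or split")

    inv_c = (r1 * q2 - r2 * q1) / det
    inv_d = (p1 * r2 - p2 * r1) / det
    if inv_c <= 0 or inv_d <= 0:
        raise ValueError("measurements are inconsistent with the timing model")
    return 1.0 / inv_c, 1.0 / inv_d
```

The two calibration runs must differ enough for the determinant to be nonzero. Using a different input time or a different split point for the second utterance is one way to arrange that.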

44. A method for optimizing a recording frame-synchronized speech feature computation comprising a server end, a client end and a network, wherein a recognition of a speech is achieved by performing plural stages of computations for speech features of the speech having plural frames, comprising steps of:
- (a) providing a specific amount;
  
  (b) inputting the speech for an input time;
  
  (c) after the step (b) is carried out, when a part of the plural frames has not been computed and only a part of the computations of the plural stages has been performed for the speech feature of a first frame of the frames not yet computed, modifying the specific amount by a specific manner to minimize a computation time for recognizing the speech; and
  
  (d) distributing the respective loads of the server end and the client end according to the modified specific amount in the step (c) and then performing computations for the frames having not been computed to achieve the recognition.
- - 45. The method according to claim 44, wherein the method is used in a recording frame-synchronized speech feature computation system.
  - 46. The method according to claim 44, wherein the recording frame-synchronized speech feature computation system synchronously performs the speech feature computations, wherein the system distributes the respective computation at the client end and the server end according to the specific amount.
  - 47. The method according to claim 44, wherein the specific amount in the step (a) is obtained by optimizing a load of a speech/user recognition system, wherein said speech/user recognition system comprises a server end, a client end and a network, and the method is achieved by performing N stages of computations for a speech feature of a speech, where N is a positive integer, and an i is selected from 1 to N for representing the i^th stage speech feature, comprising steps of:
    - (i) providing a computation time for computing a respective stage i of the speech feature at the client end, wherein a factor Ta(i) is for a computation time of computing the i^th stage speech feature at the client end with respect to the input time;
      
      (ii) providing a computation time for computing a respective stage i of the speech feature at the server end, wherein a factor Tb(i) is for a computation time of computing the i^th stage speech feature at the server end with respect to the input time;
      
      (iii) providing a load c of the server end and a load d of the network;
      
      (iv) deciding an n in the range from 1 to N for minimizing a recognition time T_output of the speech;
      
      (v) inputting the speech to be recognized, with an input time T_input;
      
      (vi) performing a computation from the first stage speech feature to the n^th stage speech feature of the speech at the client end, while performing a computation from the (n+1)^th stage speech feature to the N^th stage speech feature of the speech at the server end; and
      
      (vii) repeating steps (v) to (vi).
  - 48. The method according to claim 44, wherein a computation time for computing one of the plural stages of computations at the client end is directly proportional to the input time.
  - 49. The method according to claim 44, wherein a computation time for computing one of the plural stages of computations at the server end is directly proportional to the input time.
  - 50. The method according to claim 44, wherein the speech includes a data size.
  - 51. The method according to claim 44, wherein a time for transmitting the speech feature via the network is the ratio of the data size to the load of the network.
  - 52. The method according to claim 44, wherein a time for all plural stages of computations is the summation of a time for computing the speech feature at the client end and a time for computing the speech feature at the server end.
  - 53. The method according to claim 44, wherein an output time of the speech is a summation of a time for computing the speech feature, a time for transmitting the speech feature via the network, and a time for transmitting a recognition result via the network.
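
Steps (c) and (d) of claim 44 describe changing the split partway through an utterance, when one frame has been only partly processed. The Python sketch below shows one possible reading and is not the rule recited in claim 34. It assumes a per-frame version of the timing model of claims 51 to 53, omits the constant result-return time, and assumes that client stages already finished for the in-progress frame are not redone on the server. All names and numbers are hypothetical.

```python
from typing import Sequence, Tuple


def frame_cost(n: int, frame_seconds: float, ta: Sequence[float], tb: Sequence[float],
               c: float, d: float, bytes_per_frame: Sequence[float]) -> float:
    """Modelled time to process one frame when stages 1..n run on the client."""
    compute = frame_seconds * (sum(ta[:n]) + sum(tb[n:]) / c)
    return compute + bytes_per_frame[n - 1] / d


def replan(stages_done: int, frame_seconds: float, ta: Sequence[float],
           tb: Sequence[float], c: float, d: float,
           bytes_per_frame: Sequence[float]) -> Tuple[int, int]:
    """Return (split for the in-progress frame, split for the remaining frames).

    stages_done is the number of client stages already finished for the
    in-progress frame (n1 in the claims' notation).
    """
    n_rest = min(range(1, len(ta) + 1),
                 key=lambda n: frame_cost(n, frame_seconds, ta, tb, c, d, bytes_per_frame))
    # Assumption: work already done on the client is kept, so the in-progress
    # frame is handed over no earlier than the stage it has reached.
    n_current = max(n_rest, stages_done)
    return n_current, n_rest


if __name__ == "__main__":
    ta = [0.1, 0.2, 1.0]
    tb = [0.02, 0.04, 0.1]
    sizes = [160.0, 40.0, 1.0]
    # Network load estimate drops from 8000 to 800 partway through the utterance.
    print(replan(1, 0.01, ta, tb, c=0.5, d=8000.0, bytes_per_frame=sizes))
    print(replan(1, 0.01, ta, tb, c=0.5, d=800.0, bytes_per_frame=sizes))
```

Recomputing the split from fresh estimates of c and d keeps the rule simple. A real system would also need to account for frames already queued on either side.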

Current Assignee: Delta Electronics Incorporated
Original Assignee: Delta Electronics Incorporated
Inventors: Lee, Yun-wen
Application Number: US11/300,048
Publication Number: US 20060136218A1
US Class Current: 704/270.100
CPC Class Codes: G10L 15/285 Memory allocation or algori...
