Method and device for searching according to speech based on artificial intelligence

US 10,157,619 B2
Filed: 11/28/2017
Issued: 12/18/2018
Est. Priority Date: 11/29/2016
Status: Active Grant


Abstract

A method and a device for searching according to a speech based on artificial intelligence are provided. The method includes: identifying an input speech of a user to determine whether the input speech is a child speech; filtrating a searched result obtained according to the input speech to obtain a filtrated searched result, if the input speech is the child speech; and feeding the filtrated searched result back to the user.
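
The flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `search_by_speech` and the three callables passed to it (`is_child_speech`, `search`, `is_child_safe`) are hypothetical placeholders for the speech classifier, the search backend, and the content filter.

```python
from typing import Callable, List


def search_by_speech(
    input_speech: bytes,
    is_child_speech: Callable[[bytes], bool],  # hypothetical classifier wrapper
    search: Callable[[bytes], List[str]],      # hypothetical search backend
    is_child_safe: Callable[[str], bool],      # hypothetical content filter
) -> List[str]:
    """Search according to the input speech; filter results for child speakers."""
    results = search(input_speech)
    # Filter the searched result only when the speaker is identified as a child.
    if is_child_speech(input_speech):
        results = [r for r in results if is_child_safe(r)]
    # The (possibly filtered) result is what gets fed back to the user.
    return results
```

Passing the classifier, search, and filter in as callables keeps the sketch independent of any particular model or search engine.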

15 Claims

1. A method for searching according to a speech based on artificial intelligence, comprising:
- acquiring, by at least one computing device, sample speeches for training a preset classifier;
  
  removing, by the at least one computing device, a silent speech from the sample speeches by performing a speech activity detection on the sample speeches, to obtain training speeches;
  
  extracting, by the at least one computing device, acoustic features of each training speech; and
  
  training, by the at least one computing device, the preset classifier by inputting the acoustic features of the each training speech into the preset classifier, to obtain a target classifier;
  
  identifying, by at least one computing device, an input speech of a user to determine whether the input speech is a child speech;
  
  filtrating, by the at least one computing device, a searched result obtained according to the input speech to obtain a filtrated searched result, if the input speech is the child speech; and
  
  feeding, by the at least one computing device, the filtrated searched result to the user,
  
  wherein removing, by the at least one computing device, the silent speech from the sample speeches by performing a speech activity detection on the sample speeches, to obtain training speeches comprises:
  
  dividing, by the at least one computing device, each sample speech into frames by a preset first step size, and removing, by the at least one computing device, the silent speech from each frame of the each sample speech by performing the speech activity detection on the each frame of the each sample speech, to obtain the each training speech;
  
  wherein extracting, by the at least one computing device, the acoustic features of each training speech comprises:
  
  dividing, by the at least one computing device, the each training speech by a preset second step size; and
  
  extracting, by the at least one computing device, by a preset third step size, the acoustic features of the each training speech after dividing by the preset second step size.
- - 2. The method according to claim 1, wherein filtrating, by the at least one computing device, a searched result obtained according to the input speech comprises:
    - converting, by the at least one computing device, the input speech into a text content;
      
      obtaining, by the at least one computing device, the searched result by searching according to the text content; and
      
      filtrating, by the at least one computing device, the searched result to remove a sensitive content unsuitable for a child.
  - 3. The method according to claim 2, wherein obtaining, by the at least one computing device, the searched result by searching according to the text content comprises:
    - searching, by the at least one computing device, according to the text content in a first database pre-established for children; and
      
      searching, by the at least one computing device, according to the text content in a second database to obtain the searched result, if no content related to the input speech is searched in the first database.
  - 4. The method according to claim 1, wherein identifying, by the at least one computing device, an input speech of a user to determine whether the input speech is a child speech comprises:
    - removing, by the at least one computing device, a silent speech from the input speech by performing the speech activity detection on the input speech, to obtain a tested speech;
      
      extracting, by the at least one computing device, acoustic features of the tested speech; and
      
      identifying, by the at least one computing device, the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier, to determine whether the input speech is the child speech.
  - 5. The method according to claim 4, wherein removing, by the at least one computing device, the silent speech from the input speech by performing the speech activity detection on the input speech, to obtain a tested speech comprises:
    - dividing, by the at least one computing device, the input speech into frames by a preset first step size, and removing, by the at least one computing device, the silent speech from each frame of the input speech by performing the speech activity detection on the each frame of the input speech, to obtain the tested speech;
      
      extracting, by the at least one computing device, acoustic features of the tested speech comprises:
      
      dividing, by the at least one computing device, the tested speech by a preset second step size; and
      
      extracting, by the at least one computing device, by a preset third step size, the acoustic features of the tested speech after dividing by the preset second step size;
      
      and, identifying, by the at least one computing device, the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier, to determine whether the input speech is the child speech comprises:
      
      grading, by the at least one computing device, the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier;
      
      acquiring, by the at least one computing device, an average value of the tested speech; and
      
      determining, by the at least one computing device, that the input speech is the child speech if the average value is greater than a preset threshold.
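
The identification steps of claims 4 and 5 (frame the speech, drop silent frames, grade the remainder, compare the average grade with a threshold) can be sketched as below. This is an illustration under stated assumptions, not the patent's method: the claims do not say how speech activity detection works, so a simple mean-energy test stands in for it; `score_segment` is a hypothetical wrapper around the target classifier; and the "average value" is assumed to be the mean of the per-segment grades. Feature extraction is folded into `score_segment` here for brevity.

```python
from typing import Callable, List, Sequence


def split(samples: Sequence[float], step: int) -> List[Sequence[float]]:
    """Divide samples into consecutive chunks of `step` samples."""
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def remove_silence(samples: Sequence[float], first_step: int,
                   energy_threshold: float) -> List[float]:
    """Keep only frames whose mean energy exceeds the threshold.

    Assumption: an energy test stands in for speech activity detection.
    """
    voiced: List[float] = []
    for frame in split(samples, first_step):
        energy = sum(x * x for x in frame) / len(frame)
        if energy > energy_threshold:
            voiced.extend(frame)
    return voiced


def is_child_speech(samples: Sequence[float],
                    score_segment: Callable[[Sequence[float]], float],
                    first_step: int, second_step: int,
                    energy_threshold: float,
                    decision_threshold: float) -> bool:
    """Grade each segment of the tested speech and threshold the average."""
    tested = remove_silence(samples, first_step, energy_threshold)
    segments = split(tested, second_step)
    if not segments:
        return False  # nothing but silence; assumed to be "not child speech"
    scores = [score_segment(seg) for seg in segments]
    return sum(scores) / len(scores) > decision_threshold
```

Averaging over segments means a single misgraded segment does not decide the outcome by itself.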

6. A device for searching according to a speech based on artificial intelligence, comprising:
- a processor; and
  
  a memory, configured to store instructions executable by the processor, wherein the processor is configured to:
  
  acquire sample speeches for training a preset classifier;
  
  remove a silent speech from the sample speeches by performing a speech activity detection on the sample speeches, to obtain training speeches;
  
  extract acoustic features of each training speech; and
  
  train the preset classifier by inputting the acoustic features of the each training speech into the preset classifier, to obtain a target classifier;
  
  identify an input speech of a user to determine whether the input speech is a child speech;
  
  filtrate a searched result obtained according to the input speech to obtain a filtrated searched result, if the input speech is the child speech; and
  
  feed the filtrated searched result to the user,
  
  wherein the processor is configured to remove a silent speech from the sample speeches by performing a speech activity detection on the sample speeches, to obtain training speeches by acts of:
  
  dividing each sample speech into frames by a preset first step size, and removing the silent speech from each frame of the each sample speech by performing the speech activity detection on the each frame of the each sample speech, to obtain the each training speech;
  
  and the processor is configured to extract the acoustic features of each training speech by acts of:
  
  dividing the each training speech by a preset second step size; and
  
  extracting by a preset third step size, the acoustic features of the each training speech after dividing by the preset second step size.
- - 7. The device according to claim 6, wherein the processor is configured to filtrate the searched result obtained according to the input speech by acts of:
    - converting the input speech into a text content;
      
      obtaining the searched result by searching according to the text content; and
      
      filtrating the searched result to remove a sensitive content unsuitable for a child.
  - 8. The device according to claim 7, wherein the processor is configured to obtain the searched result by searching according to the text content by acts of:
    - searching according to the text content in a first database pre-established for children; and
      
      searching according to the text content in a second database to obtain the searched result, if no content related to the input speech is searched in the first database.
  - 9. The device according to claim 6, wherein the processor is configured to identify an input speech of a user to determine whether the input speech is a child speech by acts of:
    - removing a silent speech from the input speech by performing the speech activity detection on the input speech, to obtain a tested speech;
      
      extracting acoustic features of the tested speech; and
      
      identifying the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier, to determine whether the input speech is the child speech.
  - 10. The device according to claim 9, wherein the processor is configured to remove the silent speech from the input speech by performing the speech activity detection on the input speech, to obtain a tested speech by acts of:
    - dividing the input speech into frames by a preset first step size, and removing the silent speech from each frame of the input speech by performing the speech activity detection on the each frame of the input speech, to obtain the tested speech;
      
      the processor is configured to extract acoustic features of the tested speech by acts of:
      
      dividing the tested speech by a preset second step size; and
      
      extracting by a preset third step size, the acoustic features of the tested speech after dividing by the preset second step size,
      
      and the processor is configured to identify the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier, to determine whether the input speech is the child speech by acts of:
      
      grading the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier;
      
      acquiring an average value of the tested speech; and
      
      determining that the input speech is the child speech if the average value is greater than a preset threshold.
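
The two-level feature extraction recited in claims 6 and 10 (divide the speech by a second step size, then extract features by a third step size) might look like the sketch below. The interpretation is an assumption: the speech is first cut into segments of `second_step` samples, and features are then computed on windows of `third_step` samples inside each segment. The claims do not name the acoustic features, so mean energy and zero-crossing rate are used as simple stand-ins; `chunk`, `frame_features`, and `extract_features` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def chunk(samples: Sequence[float], size: int) -> List[Sequence[float]]:
    """Divide samples into consecutive chunks of `size` samples."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def frame_features(frame: Sequence[float]) -> Tuple[float, float]:
    """Return (mean energy, zero-crossing rate) for one window.

    Assumption: stand-in features; a real system would likely use
    spectral features instead.
    """
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(frame) - 1, 1)
    return energy, zcr


def extract_features(speech: Sequence[float], second_step: int,
                     third_step: int) -> List[List[Tuple[float, float]]]:
    """Segment by `second_step`, then extract features every `third_step`."""
    return [
        [frame_features(window) for window in chunk(segment, third_step)]
        for segment in chunk(speech, second_step)
    ]
```

The result is one list of per-window feature tuples for each segment, which is the shape a per-segment classifier would consume.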

11. A non-transitory computer readable storage medium comprising instructions, wherein the instructions are executed by a processor of a device to perform:
- acquiring sample speeches for training a preset classifier;
  
  removing a silent speech from the sample speeches by performing a speech activity detection on the sample speeches, to obtain training speeches;
  
  extracting acoustic features of each training speech; and
  
  training the preset classifier by inputting the acoustic features of the each training speech into the preset classifier, to obtain a target classifier;
  
  identifying an input speech of a user to determine whether the input speech is a child speech;
  
  filtrating a searched result obtained according to the input speech to obtain a filtrated searched result, if the input speech is the child speech; and
  
  feeding the filtrated searched result to the user,
  
  wherein removing a silent speech from the sample speeches by performing a speech activity detection on the sample speeches, to obtain training speeches comprises:
  
  dividing each sample speech into frames by a preset first step size, and removing the silent speech from each frame of the each sample speech by performing the speech activity detection on the each frame of the each sample speech, to obtain the each training speech;
  
  wherein extracting the acoustic features of each training speech comprises:
  
  dividing the each training speech by a preset second step size; and
  
  extracting by a preset third step size, the acoustic features of the each training speech after dividing by the preset second step size.
- - 12. The non-transitory computer readable storage medium according to claim 11, wherein filtrating a searched result obtained according to the input speech comprises:
    - converting the input speech into a text content;
      
      obtaining the searched result by searching according to the text content; and
      
      filtrating the searched result to remove a sensitive content unsuitable for a child.
  - 13. The non-transitory computer readable storage medium according to claim 12, wherein obtaining the searched result by searching according to the text content comprises:
    - searching according to the text content in a first database pre-established for children; and
      
      searching according to the text content in a second database to obtain the searched result, if no content related to the input speech is searched in the first database.
  - 14. The non-transitory computer readable storage medium according to claim 11, wherein identifying an input speech of a user to determine whether the input speech is a child speech comprises:
    - removing a silent speech from the input speech by performing the speech activity detection on the input speech, to obtain a tested speech;
      
      extracting acoustic features of the tested speech; and
      
      identifying the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier, to determine whether the input speech is the child speech.
  - 15. The non-transitory computer readable storage medium according to claim 14, wherein removing the silent speech from the input speech by performing the speech activity detection on the input speech, to obtain a tested speech comprises:
    - dividing the input speech into frames by a preset first step size, and removing the silent speech from each frame of the input speech by performing the speech activity detection on the each frame of the input speech, to obtain the tested speech;
      
      extracting acoustic features of the tested speech comprises:
      
      dividing, by the at least one computing device, the tested speech by a preset second step size; and
      
      extracting by a preset third step size, the acoustic features of the tested speech after dividing by the preset second step size;
      
      and wherein, identifying the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier, to determine whether the input speech is the child speech comprises:
      
      grading the acoustic features of the tested speech by inputting the acoustic features of the tested speech into the target classifier;
      
      acquiring an average value of the tested speech; and
      
      determining that the input speech is the child speech if the average value is greater than a preset threshold.
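
The search behaviour of claims 12 and 13 (search a database pre-established for children first, fall back to a second database only when nothing is found, then remove sensitive content) can be sketched as follows. The databases are modelled as plain dictionaries keyed by the recognised text, and `search_for_child` and `is_sensitive` are hypothetical names; none of this is taken from the patent's specification.

```python
from typing import Callable, Dict, List


def search_for_child(
    text: str,
    child_db: Dict[str, List[str]],       # first database, built for children
    general_db: Dict[str, List[str]],     # second, general-purpose database
    is_sensitive: Callable[[str], bool],  # hypothetical sensitivity check
) -> List[str]:
    """Search the children's database first, then fall back and filter."""
    results = child_db.get(text, [])
    if not results:
        # No related content in the first database: use the second one.
        results = general_db.get(text, [])
    # Remove content unsuitable for a child from whatever was found.
    return [r for r in results if not is_sensitive(r)]
```

Filtering is applied to both branches in this sketch, which is the conservative reading: even results from the children's database pass through the sensitivity check.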

Current Assignee: Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Original Assignee: Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Inventors: Li, Chao; Li, Xiangang; Sun, Jue
Primary Examiner(s): YEN, ERIC L

Application Number: US15/823,663
Publication Number: US 20180151183A1
Time in Patent Office: 385 Days
CPC Class Codes

G06F 16/433   using audio data

G06F 16/436   using biological or physiol...

G10L 15/02   Feature extraction for spee...

G10L 15/26   Speech to text systems G10L...

G10L 17/26   Recognition of special voic...

G10L 2015/223   Execution procedure of a sp...

G10L 2025/786   Adaptive threshold

G10L 25/27   characterised by the analys...

G10L 25/30   using neural networks

G10L 25/93   Discriminating between voic...
