Electronic musical instrument, electronic musical instrument control method, and storage medium

US 10,629,179 B2
Filed: 06/20/2019
Issued: 04/21/2020
Est. Priority Date: 06/21/2018
Status: Active Grant



Abstract

An electronic musical instrument includes a memory that stores a machine-learning trained acoustic model mimicking the voice of a singer, and at least one processor. When a vocoder mode is on, prescribed lyric data and pitch data corresponding to a user operation of an operation element of the musical instrument are inputted to the trained acoustic model, and inferred singing voice data that infers a singing voice of the singer is synthesized on the basis of acoustic feature data output by the trained acoustic model and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element. When the vocoder mode is off, the inferred singing voice data is synthesized based on the acoustic feature data without using the instrument sound waveform data.
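The two modes described in the abstract can be illustrated with a minimal source-filter sketch. It is not the patented implementation: `toy_acoustic_model`, `instrument_waveform`, `apply_filter`, and `synthesize` are hypothetical names, the "spectral data" is reduced to a three-tap FIR filter, and the "sound source data" to a pulse train. The only point shown is which excitation signal is filtered in each mode.

```python
import math
from typing import Dict, List

SAMPLE_RATE = 8000  # Hz; arbitrary value chosen for this sketch


def midi_to_hz(midi_pitch: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_pitch - 69) / 12)


def toy_acoustic_model(lyric: str, midi_pitch: int, n: int) -> Dict[str, List[float]]:
    """Hypothetical stand-in for the trained acoustic model.

    Returns 'spectral' data (a short FIR filter standing in for the vocal
    tract) and 'source' data (a pulse train standing in for the vocal cords).
    """
    period = max(1, round(SAMPLE_RATE / midi_to_hz(midi_pitch)))
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    # Assumption: taps vary with the lyric only so different lyrics differ.
    spectral = [1.0, 0.5 + 0.01 * (len(lyric) % 10), 0.25]
    return {"spectral": spectral, "source": source}


def instrument_waveform(midi_pitch: int, n: int) -> List[float]:
    """Toy instrument sound: a sawtooth at the played pitch."""
    f0 = midi_to_hz(midi_pitch)
    return [2.0 * ((i * f0 / SAMPLE_RATE) % 1.0) - 1.0 for i in range(n)]


def apply_filter(excitation: List[float], taps: List[float]) -> List[float]:
    """Direct-form FIR filtering of the excitation signal."""
    out = []
    for i in range(len(excitation)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if i - k >= 0:
                acc += tap * excitation[i - k]
        out.append(acc)
    return out


def synthesize(lyric: str, midi_pitch: int, vocoder_on: bool, n: int = 64) -> List[float]:
    """Pick the excitation by mode, then shape it with the spectral data."""
    features = toy_acoustic_model(lyric, midi_pitch, n)
    if vocoder_on:
        # Vocoder mode on: the instrument sound replaces the model's source data.
        excitation = instrument_waveform(midi_pitch, n)
    else:
        # Vocoder mode off: only the model's own acoustic feature data is used.
        excitation = features["source"]
    return apply_filter(excitation, features["spectral"])
```

Calling `synthesize("la", 60, vocoder_on=True)` and `synthesize("la", 60, vocoder_on=False)` yields two different sample lists for the same lyric and pitch, since only the excitation changes between the calls.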


18 Claims

1. An electronic musical instrument comprising:
- a plurality of operation elements respectively corresponding to mutually different pitch data;
  
  a memory that stores a trained acoustic model obtained by performing machine learning on training musical score data including training lyric data and training pitch data, and on training singing voice data of a singer corresponding to the training musical score data, the trained acoustic model being configured to receive lyric data and prescribed pitch data and output acoustic feature data of a singing voice of the singer in response to the received lyric data and pitch data; and
  
  at least one processor in which a first mode and a second mode are interchangeably selectable,
  
  wherein in the first mode, the at least one processor:
  
  in accordance with a user operation on an operation element in the plurality of operation elements, inputs prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
  
  digitally synthesizes and outputs inferred singing voice data that infers a singing voice of the singer on the basis of at least a portion of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element, and
  
  wherein in the second mode, the at least one processor:
  
  in accordance with a user operation on an operation element in the plurality of operation elements, inputs prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
  
  digitally synthesizes and outputs inferred singing voice data that infers a singing voice of the singer on the basis of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, without using instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element.
  - 2. The electronic musical instrument according to claim 1, wherein the at least one processor switches between the first mode and the second mode based on a user operation of a mode selection operation element provided in the electronic musical instrument.
  - 3. The electronic musical instrument according to claim 1, wherein the memory contains melody pitch data indicating operation elements that a user is to operate, singing voice output timing data indicating output timings at which respective singing voices for pitches indicated by the melody pitch data are to be output, and lyric data respectively corresponding to the melody pitch data, and wherein in the first mode, the at least one processor:
    - when a user operation for producing a singing voice is performed at an output timing indicated by the singing voice output timing data, inputs pitch data corresponding to the user-operated operation element and lyric data corresponding to said output timing to the trained acoustic model, and outputs, at said output timing, inferred singing voice data that infers the singing voice of the singer on the basis of the at least a portion of the acoustic feature data output by the trained acoustic model in response to the input, and
    - when a user operation for producing a singing voice is not performed at the output timing indicated by the singing voice output timing data, inputs melody pitch data corresponding to said output timing and lyric data corresponding to said output timing to the trained acoustic model, and outputs, at said output timing, inferred singing voice data that infers the singing voice of the singer on the basis of the at least a portion of the acoustic feature data output by the trained acoustic model in response to the input.
  - 4. The electronic musical instrument according to claim 1, wherein the acoustic feature data of the singing voice of the singer includes spectral data that models a vocal tract of the singer and sound source data that models vocal cords of the singer, and wherein in the second mode, the at least one processor synthesizes the inferred singing voice data that infers the singing voice of the singer on the basis of the spectral data and the sound source data.
  - 5. The electronic musical instrument according to claim 1, further comprising a selection operation element that, from a plurality of instrument sounds including at least one of a brass sound, a string sound, an organ sound, or an animal cry, specifies one of the instrument sounds in response to a user operation, and wherein in the first mode, the instrument sound waveform data corresponds to the instrument sound specified by the selection operation element.
  - 6. The electronic musical instrument according to claim 1, wherein the acoustic feature data of the singing voice of the singer includes spectral data that models a vocal tract of the singer and sound source data that models vocal cords of the singer, and wherein in the first mode, the at least one processor synthesizes the inferred singing voice data that infers the singing voice of the singer on the basis of the sound source data by applying an acoustic feature of the spectral data to the instrument sound waveform data without using the sound source data of the acoustic feature data.
  - 7. The electronic musical instrument according to claim 1, wherein the trained acoustic model has been trained via machine learning using at least one of a deep neural network or a hidden Markov model.
  - 8. The electronic musical instrument according to claim 1, wherein the plurality of operation elements include a first operation element as the operation element that was operated by the user and a second operation element that meets a prescribed condition with respect to the first operation element, and wherein in both of the first and second modes, the at least one processor applies an acoustic effect to the inferred singing voice data when the second operation element is operated while the first operation element is being operated.
  - 9. The electronic musical instrument according to claim 8, wherein the at least one processor changes a depth of the acoustic effect in accordance with a difference in pitch between a pitch corresponding to the first operation element and a pitch corresponding to the second operation element.
  - 10. The electronic musical instrument according to claim 8, wherein the second operation element is a black key.
  - 11. The electronic musical instrument according to claim 8, wherein the acoustic effect includes at least one of a vibrato effect, a tremolo effect, or a wah-wah effect.
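
Claims 8 to 11 leave open how the "prescribed condition" is tested and how the effect depth follows from the pitch difference. The sketch below is one possible reading, not the patented method: it assumes MIDI note numbers for pitches, takes the black-key condition of claim 10 as the prescribed condition, and maps the semitone difference linearly to a depth clamped at one octave. The names `is_black_key`, `effect_depth`, and `max_interval` are hypothetical.

```python
from typing import Optional

# Pitch classes of the black keys, with C = 0.
BLACK_KEY_CLASSES = {1, 3, 6, 8, 10}


def is_black_key(midi_pitch: int) -> bool:
    """True if the MIDI note falls on a black key of a piano keyboard."""
    return midi_pitch % 12 in BLACK_KEY_CLASSES


def effect_depth(first_pitch: int,
                 second_pitch: Optional[int],
                 max_interval: int = 12) -> float:
    """Depth in [0.0, 1.0] for a vibrato, tremolo, or wah-wah style effect.

    first_pitch  -- key currently held to produce the singing voice
    second_pitch -- key pressed while the first is held, or None
    Assumption: linear mapping of the semitone difference, clamped at
    max_interval semitones; the claims do not prescribe a particular curve.
    """
    if second_pitch is None or not is_black_key(second_pitch):
        return 0.0  # prescribed condition not met: no effect applied
    interval = abs(second_pitch - first_pitch)
    return min(interval, max_interval) / max_interval
```

With these assumptions, holding middle C (60) and adding C sharp (61) gives a shallow effect, while adding a black key more than an octave away gives full depth; adding a white key leaves the voice unchanged.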

12. A method performed by at least one processor in an electronic musical instrument that includes, in addition to the at least one processor:
- a plurality of operation elements respectively corresponding to mutually different pitch data; and
  
  a memory that stores a trained acoustic model obtained by performing machine learning on training musical score data including training lyric data and training pitch data, and on training singing voice data of a singer corresponding to the training musical score data, the trained acoustic model being configured to receive lyric data and prescribed pitch data and output acoustic feature data of a singing voice of the singer in response to the received lyric data and pitch data,
  
  a first mode and a second mode being interchangeably selectable in the at least one processor, the method comprising, via the at least one processor:
  
  in the first mode:
  
  in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
  
  digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of at least a portion of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element, and
  
  in the second mode:
  
  in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
  
  digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, without using instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element.
  - 13. The method according to claim 12, wherein the method includes, via the at least one processor, switching between the first mode and the second mode based on a user operation of a mode selection operation element provided in the electronic musical instrument.
  - 14. The method according to claim 12, wherein the memory contains melody pitch data indicating operation elements that a user is to operate, singing voice output timing data indicating output timings at which respective singing voices for pitches indicated by the melody pitch data are to be output, and lyric data respectively corresponding to the melody pitch data, and wherein in the first mode, the method includes, via the at least one processor:
    - when a user operation for producing a singing voice is performed at an output timing indicated by the singing voice output timing data, inputting pitch data corresponding to the user-operated operation element and lyric data corresponding to said output timing to the trained acoustic model, and outputting, at said output timing, inferred singing voice data that infers the singing voice of the singer on the basis of the at least a portion of the acoustic feature data output by the trained acoustic model in response to the input, and
    - when a user operation for producing a singing voice is not performed at the output timing indicated by the singing voice output timing data, inputting melody pitch data corresponding to said output timing and lyric data corresponding to said output timing to the trained acoustic model, and outputting, at said output timing, inferred singing voice data that infers the singing voice of the singer on the basis of the at least a portion of the acoustic feature data output by the trained acoustic model in response to the input.
  - 15. The method according to claim 12, wherein the acoustic feature data of the singing voice of the singer includes spectral data that models a vocal tract of the singer and sound source data that models vocal cords of the singer, and wherein the method includes, in the second mode, causing the at least one processor to synthesize the inferred singing voice data that infers the singing voice of the singer on the basis of the spectral data and the sound source data.
  - 16. The method according to claim 12, wherein the electronic musical instrument further includes a selection operation element that, from a plurality of instrument sounds including at least one of a brass sound, a string sound, an organ sound, or an animal cry, specifies one of the instrument sounds in response to a user operation, and wherein in the first mode, the instrument sound waveform data corresponds to the instrument sound specified by the selection operation element.
  - 17. The method according to claim 12, wherein the acoustic feature data of the singing voice of the singer includes spectral data that models a vocal tract of the singer and sound source data that models vocal cords of the singer, and wherein in the first mode, the inferred singing voice data that infers the singing voice of the singer is synthesized on the basis of the sound source data by applying an acoustic feature of the spectral data to the instrument sound waveform data without using the sound source data of the acoustic feature data.
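
The branch recited in claim 14 (and in claim 3 for the apparatus) amounts to choosing which pitch accompanies the lyric at each output timing. A minimal sketch follows, assuming MIDI note numbers for pitch data and a `None` value when no key is pressed at the timing; `ScoreEvent` and `model_input_for_timing` are hypothetical names, not terms from the patent.

```python
from typing import NamedTuple, Optional, Tuple


class ScoreEvent(NamedTuple):
    """One stored output timing: the expected melody pitch and its lyric."""
    melody_pitch: int
    lyric: str


def model_input_for_timing(event: ScoreEvent,
                           played_pitch: Optional[int]) -> Tuple[str, int]:
    """Return the (lyric, pitch) pair to feed to the acoustic model.

    played_pitch is the pitch of the key the user pressed at this output
    timing, or None if no key was pressed (an assumption of this sketch).
    """
    if played_pitch is not None:
        # User played at the timing: the user's pitch is used with the stored lyric.
        return event.lyric, played_pitch
    # No key press at the timing: fall back to the stored melody pitch.
    return event.lyric, event.melody_pitch
```

In both branches the lyric comes from the stored data for that timing; only the pitch source differs, so the song's words advance even when the player misses a note.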

18. A non-transitory computer-readable storage medium having stored thereon a program executable by at least one processor in an electronic musical instrument that includes, in addition to the at least one processor:
- a plurality of operation elements respectively corresponding to mutually different pitch data; and
  
  a memory that stores a trained acoustic model obtained by performing machine learning on training musical score data including training lyric data and training pitch data, and on training singing voice data of a singer corresponding to the training musical score data, the trained acoustic model being configured to receive lyric data and prescribed pitch data and output acoustic feature data of a singing voice of the singer in response to the received lyric data and pitch data,
  
  a first mode and a second mode being interchangeably selectable in the at least one processor, the program causing the at least one processor to perform the following:
  
  in the first mode:
  
  in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
  
  digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of at least a portion of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element, and
  
  in the second mode:
  
  in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
  
  digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, without using instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element.


Current Assignee: Casio Computer Company Limited
Original Assignee: Casio Computer Company Limited
Inventors: Danjyo, Makoto; Ota, Fumiaki; Setoguchi, Masaru; Nakamura, Atsushi
Primary Examiner(s): Donels, Jeffrey

Application Number: US16/447,630
Publication Number: US 20190392807A1
Time in Patent Office: 306 Days
Field of Search: 84602, 84610

CPC Class Codes

G10H 1/0008   Associated control or indic...

G10H 1/125   using a digital filter

G10H 1/36   Accompaniment arrangements

G10H 1/366   with means for modifying or...

G10H 2210/005   Musical accompaniment, i.e....

G10H 2210/091   for performance evaluation,...

G10H 2210/191   Tremolo, tremulando, trill ...

G10H 2210/201   Vibrato, i.e. rapid, repeti...

G10H 2210/231   Wah-wah spectral modulation...

G10H 2220/011   Lyrics displays, e.g. for k...

G10H 2250/015   Markov chains, e.g. hidden ...

G10H 2250/311   Neural networks for electro...

G10H 2250/455   Gensound singing voices, i....

G10H 2250/625   Interwave interpolation, i....

G10H 7/008   Means for controlling the t...

G10L 13/033   Voice editing, e.g. manipul...
