Electronic musical instrument, electronic musical instrument control method, and storage medium
First Claim
1. An electronic musical instrument comprising:
a plurality of operation elements respectively corresponding to mutually different pitch data;
a memory that stores a trained acoustic model obtained by performing machine learning on training musical score data including training lyric data and training pitch data, and on training singing voice data of a singer corresponding to the training musical score data, the trained acoustic model being configured to receive lyric data and prescribed pitch data and output acoustic feature data of a singing voice of the singer in response to the received lyric data and pitch data; and
at least one processor in which a first mode and a second mode are interchangeably selectable,
wherein in the first mode, the at least one processor:
in accordance with a user operation on an operation element in the plurality of operation elements, inputs prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizes and outputs inferred singing voice data that infers a singing voice of the singer on the basis of at least a portion of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element, and
wherein in the second mode, the at least one processor:
in accordance with a user operation on an operation element in the plurality of operation elements, inputs prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizes and outputs inferred singing voice data that infers a singing voice of the singer on the basis of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, without using instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element.
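The control flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patentee's implementation. All names here (`Mode`, `AcousticFeatures`, `handle_key_press`, and the `model`, `instrument`, and `synth` callables) are hypothetical. The split of the acoustic feature data into a `spectral` part and a `source` part is an assumption, made only to show how the first mode could use "at least a portion" of the features together with an instrument waveform, while the second mode uses the features alone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping, Sequence


class Mode(Enum):
    FIRST = 1   # synthesis also uses instrument sound waveform data
    SECOND = 2  # synthesis uses only the acoustic feature data


@dataclass
class AcousticFeatures:
    # Assumed decomposition of the model output (not stated in the claim).
    spectral: Sequence[float]  # hypothetical vocal-tract part
    source: Sequence[float]    # hypothetical sound-source part


def handle_key_press(
    key_index: int,
    lyric: str,
    mode: Mode,
    key_to_pitch: Mapping[int, int],
    model: Callable[[str, int], AcousticFeatures],
    instrument: Callable[[int], Sequence[float]],
    synth: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
) -> Sequence[float]:
    """Return inferred singing voice samples for one key press."""
    # Each operation element (key) corresponds to a different pitch.
    pitch = key_to_pitch[key_index]
    # Both modes feed the lyric and pitch to the trained acoustic model.
    features = model(lyric, pitch)
    if mode is Mode.FIRST:
        # First mode: a portion of the features plus an instrument
        # waveform synthesized at the pitch of the pressed key.
        excitation = instrument(pitch)
    else:
        # Second mode: no instrument waveform is generated or used.
        excitation = features.source
    return synth(features.spectral, excitation)
```

In this sketch the instrument callable is never invoked in the second mode, which mirrors the "without using instrument sound waveform data" limitation.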
Abstract
An electronic musical instrument includes a memory that stores a machine-learning trained acoustic model mimicking the voice of a singer, and at least one processor. When a vocoder mode is on, prescribed lyric data and pitch data corresponding to a user operation of an operation element of the musical instrument are inputted to the trained acoustic model, and inferred singing voice data that infers a singing voice of the singer is synthesized on the basis of acoustic feature data output by the trained acoustic model and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element. When the vocoder mode is off, the inferred singing voice data is synthesized on the basis of the acoustic feature data without using the instrument sound waveform data.
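At the signal level, the vocoder on/off distinction can be pictured as a choice of excitation fed into a filter that stands in for the singer's vocal tract. The following pure-Python sketch is an illustration under stated assumptions, not the method disclosed in the specification: the FIR coefficients `coeffs` are a hypothetical stand-in for the model's acoustic feature data, a sawtooth wave stands in for the instrument sound waveform, a pulse train stands in for a model-derived sound source, and `SAMPLE_RATE` and every function name are invented for the example.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 16_000  # assumed sample rate in Hz


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def sawtooth(freq_hz: float, n: int) -> List[float]:
    """Stand-in for instrument sound waveform data at the key's pitch."""
    return [2.0 * ((i * freq_hz / SAMPLE_RATE) % 1.0) - 1.0 for i in range(n)]


def pulse_train(freq_hz: float, n: int) -> List[float]:
    """Stand-in for a sound source derived from the acoustic features."""
    period = max(1, round(SAMPLE_RATE / freq_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def fir_filter(excitation: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: out[i] = sum_k coeffs[k] * excitation[i - k]."""
    out = []
    for i in range(len(excitation)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if i - k >= 0:
                acc += c * excitation[i - k]
        out.append(acc)
    return out


def synthesize(note: int, coeffs: Sequence[float], vocoder_on: bool, n: int = 160) -> List[float]:
    """Filter an excitation chosen by the vocoder mode flag."""
    freq = midi_to_hz(note)
    # Vocoder on: excite the filter with the instrument waveform.
    # Vocoder off: excite it with the model-side source instead.
    excitation = sawtooth(freq, n) if vocoder_on else pulse_train(freq, n)
    return fir_filter(excitation, coeffs)
```

For example, `synthesize(69, [1.0, 0.5], vocoder_on=True)` shapes a 440 Hz sawtooth with the filter, while passing `vocoder_on=False` shapes a pulse train at the same pitch, so both outputs share a pitch but differ in timbre.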
18 Claims
1. An electronic musical instrument comprising:
a plurality of operation elements respectively corresponding to mutually different pitch data;
a memory that stores a trained acoustic model obtained by performing machine learning on training musical score data including training lyric data and training pitch data, and on training singing voice data of a singer corresponding to the training musical score data, the trained acoustic model being configured to receive lyric data and prescribed pitch data and output acoustic feature data of a singing voice of the singer in response to the received lyric data and pitch data; and
at least one processor in which a first mode and a second mode are interchangeably selectable,
wherein in the first mode, the at least one processor:
in accordance with a user operation on an operation element in the plurality of operation elements, inputs prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizes and outputs inferred singing voice data that infers a singing voice of the singer on the basis of at least a portion of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element, and
wherein in the second mode, the at least one processor:
in accordance with a user operation on an operation element in the plurality of operation elements, inputs prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizes and outputs inferred singing voice data that infers a singing voice of the singer on the basis of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, without using instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method performed by at least one processor in an electronic musical instrument that includes, in addition to the at least one processor:
a plurality of operation elements respectively corresponding to mutually different pitch data; and
a memory that stores a trained acoustic model obtained by performing machine learning on training musical score data including training lyric data and training pitch data, and on training singing voice data of a singer corresponding to the training musical score data, the trained acoustic model being configured to receive lyric data and prescribed pitch data and output acoustic feature data of a singing voice of the singer in response to the received lyric data and pitch data,
a first mode and a second mode being interchangeably selectable in the at least one processor, the method comprising, via the at least one processor:
in the first mode:
in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of at least a portion of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element, and
in the second mode:
in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, without using instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element.
Dependent claims: 13, 14, 15, 16, 17
18. A non-transitory computer-readable storage medium having stored thereon a program executable by at least one processor in an electronic musical instrument that includes, in addition to the at least one processor:
a plurality of operation elements respectively corresponding to mutually different pitch data; and
a memory that stores a trained acoustic model obtained by performing machine learning on training musical score data including training lyric data and training pitch data, and on training singing voice data of a singer corresponding to the training musical score data, the trained acoustic model being configured to receive lyric data and prescribed pitch data and output acoustic feature data of a singing voice of the singer in response to the received lyric data and pitch data,
a first mode and a second mode being interchangeably selectable in the at least one processor, the program causing the at least one processor to perform the following:
in the first mode:
in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of at least a portion of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, and on the basis of instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element, and
in the second mode:
in accordance with a user operation on an operation element in the plurality of operation elements, inputting prescribed lyric data and pitch data corresponding to the user operation of the operation element to the trained acoustic model so as to cause the trained acoustic model to output the acoustic feature data in response to the inputted prescribed lyric data and the inputted pitch data, and
digitally synthesizing and outputting inferred singing voice data that infers a singing voice of the singer on the basis of the acoustic feature data output by the trained acoustic model in response to the inputted prescribed lyric data and the inputted pitch data, without using instrument sound waveform data that are synthesized in accordance with the pitch data corresponding to the user operation of the operation element.
Specification