Authors
Riki Takizawa¹, Shigeyuki Hirai²
1: Department of Frontier Informatics, Graduate School of Kyoto Sangyo University, Japan.
2: Faculty of Information Science and Engineering, Kyoto Sangyo University, Japan.
We propose a method for synthesizing sound effects whose nuances are controlled by onomatopoeic utterances that do not depend on linguistic pronunciation. Figure 1 shows a schematic of the proposed method, Voice-to-SE. The method uses a Transformer to convert utterances into sound effects.
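As a rough illustration of the kind of operation a Transformer relies on for this conversion, the sketch below implements scaled dot-product cross-attention in plain Python. It is not the authors' implementation: the source does not describe the model's layers, sizes, or features, so the function names (`softmax`, `cross_attention`), the toy frames, and the choice of four bins per frame are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query frame attends to all key frames."""
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query frame and every key frame, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value frames, one output bin at a time.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy data (hypothetical): 3 utterance frames and 2 decoder-side frames, 4 bins each.
voice_frames = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
decoder_frames = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]

context, attn = cross_attention(decoder_frames, voice_frames, voice_frames)
```

Here `attn[i][j]` is the weight that decoder-side frame `i` places on utterance frame `j`, and `context[i]` is the resulting mixture of utterance frames. A full Transformer would add learned projections, multiple heads, and feed-forward layers on top of this step.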
[Sample tables: ten tables, each with a Mel-spec (mel-spectrogram) row; cell contents not recovered.]