The newly announced 'VoiceCraft' is a neural codec language model, inspired by multimodal models of text and images, that enables zero-shot text-to-speech synthesis and speech editing.