USCF Demo

Universal Speech Content Factorization

Human Language Technology Center of Excellence
and
Center for Language and Speech Processing
Johns Hopkins University
Baltimore, Maryland
xli257@jhu.edu

Abstract

We propose Universal Speech Content Factorization (USCF), a simple and invertible linear method for extracting a low-rank speech representation in which speaker timbre is suppressed while phonetic content is preserved. USCF extends Speech Content Factorization (SCF), a closed-set voice conversion method, to an open-set setting by learning a universal speech-to-content mapping via least-squares optimization and deriving speaker-specific transformations from only a few seconds of target speech. We show through embedding analysis that USCF effectively removes speaker-dependent variation. As a zero-shot voice conversion system, USCF achieves competitive intelligibility, naturalness, and speaker similarity compared to methods that require substantially more target-speaker data or additional neural training. Finally, we demonstrate that USCF features can serve as an alternative acoustic representation for text-to-speech, offering a linear, training-efficient substitute for timbre-prompted SSL-based systems.
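
The two least-squares steps mentioned in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names (`fit_content_map`, `fit_speaker_map`, `convert`, `toy_example`), the synthetic data, and the assumption that each speaker's features equal the content times a speaker-specific basis are all hypothetical choices made for illustration. In the toy, both speakers appear in the data used to fit the content map, so it only shows the mechanics of the linear fits and does not demonstrate open-set behavior.

```python
import numpy as np


def fit_content_map(features, content):
    """Return W minimizing ||features @ W - content||_F (ordinary least squares)."""
    W, *_ = np.linalg.lstsq(features, content, rcond=None)
    return W


def fit_speaker_map(target_features, W):
    """Return S minimizing ||(target_features @ W) @ S - target_features||_F."""
    target_content = target_features @ W
    S, *_ = np.linalg.lstsq(target_content, target_features, rcond=None)
    return S


def convert(source_features, W, S_target):
    """Project source features to content, then map content into the target speaker's space."""
    return source_features @ W @ S_target


def toy_example(seed=0, rank=4, dim=8):
    """Synthetic check under the ASSUMED model: features = content @ speaker_basis."""
    rng = np.random.default_rng(seed)
    basis_a = rng.standard_normal((rank, dim))  # hypothetical speaker A
    basis_b = rng.standard_normal((rank, dim))  # hypothetical speaker B

    # Training pool: different content spoken by each speaker.
    content_a = rng.standard_normal((200, rank))
    content_b = rng.standard_normal((200, rank))
    train_features = np.vstack([content_a @ basis_a, content_b @ basis_b])
    train_content = np.vstack([content_a, content_b])
    W = fit_content_map(train_features, train_content)

    # A short reference clip (20 frames) from speaker B gives its speaker map.
    reference = rng.standard_normal((20, rank)) @ basis_b
    S_b = fit_speaker_map(reference, W)

    # Convert a new utterance from speaker A into speaker B's feature space.
    new_content = rng.standard_normal((50, rank))
    converted = convert(new_content @ basis_a, W, S_b)
    expected = new_content @ basis_b
    return converted, expected


converted, expected = toy_example()
print("max abs error:", np.abs(converted - expected).max())
```

Both maps are plain matrices, so the conversion is a pair of matrix products; in this sketch the only per-speaker quantity is `S_target`, which is fitted from the short reference clip alone.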

Voice Conversion

Input Sentence

Target Speaker Sample

USCF

Baselines

TTS

ALSO A POPULAR CONTRIVANCE WHEREBY LOVE MAKING MAY BE SUSPENDED BUT NOT STOPPED DURING THE PICNIC SEASON

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

O LIFE OF THIS OUR SPRING

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

HE COULD WAIT NO LONGER

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

AT THAT EPOCH OF PRISTINE SIMPLICITY HOWEVER MATTERS OF EVEN SLIGHTER PUBLIC INTEREST AND OF FAR LESS INTRINSIC WEIGHT THAN THE WELFARE OF HESTER AND HER CHILD WERE STRANGELY MIXED UP WITH THE DELIBERATIONS OF LEGISLATORS AND ACTS OF STATE

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

IN HIS RETURN TO THE CAMP HIS ACUTE AND PRACTISED INTELLECTS WERE INTENTLY ENGAGED IN DEVISING MEANS TO COUNTERACT A WATCHFULNESS AND SUSPICION ON THE PART OF HIS ENEMIES THAT HE KNEW WERE IN NO DEGREE INFERIOR TO HIS OWN

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

I HAD ALWAYS KNOWN HIM TO BE RESTLESS IN HIS MANNER BUT ON THIS PARTICULAR OCCASION HE WAS IN SUCH A STATE OF UNCONTROLLABLE AGITATION THAT IT WAS CLEAR SOMETHING VERY UNUSUAL HAD OCCURRED

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

IT IS A VERY FINE OLD PLACE OF RED BRICK SOFTENED BY A PALE POWDERY LICHEN WHICH HAS DISPERSED ITSELF WITH HAPPY IRREGULARITY SO AS TO BRING THE RED BRICK INTO TERMS OF FRIENDLY COMPANIONSHIP WITH THE LIMESTONE ORNAMENTS SURROUNDING THE THREE GABLES THE WINDOWS AND THE DOOR PLACE

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

THE PARIS PLANT LIKE THAT AT THE CRYSTAL PALACE WAS A TEMPORARY EXHIBIT

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

IN EVERY WAY THEY SOUGHT TO UNDERMINE THE AUTHORITY OF SAINT PAUL

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC

IN THE SUPPOSED DEPTHS OF THIS DIALOGUE THE NEO PLATONISTS FOUND HIDDEN MEANINGS AND CONNECTIONS WITH THE JEWISH AND CHRISTIAN SCRIPTURES AND OUT OF THEM THEY ELICITED DOCTRINES QUITE AT VARIANCE WITH THE SPIRIT OF PLATO

SCF (closed set)
USCF - 10s target speech
USCF - 100s target speech

kNN-VC
LinearVC
SeedVC