Diffusers
AudioLDM2Pipeline

TypeError: __call__() got an unexpected keyword argument 'transcription'

#1
by cvipym - opened

Description:
I encountered a 'TypeError' while running the code snippet below. It seems like the 'transcription' argument is not recognized, even though it's listed as a valid argument in the documentation. Can someone please help me understand why this error occurs and how to resolve it?

import scipy
import torch
from diffusers import AudioLDM2Pipeline

repo_id = "anhnct/audioldm2_gigaspeech"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# define the prompts
prompt = "An female actor say with angry voice"
transcript = "hi, i am yeong min. nice to meet you"
negative_prompt = "low quality"

# set the seed for generator
generator = torch.Generator("cuda").manual_seed(1)

# run the generation
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    transcription=transcript,
    num_inference_steps=200,
    audio_length_in_s=8.0,
    num_waveforms_per_prompt=1,
    generator=generator,
    max_new_tokens=512
).audios

# save the best audio sample (index 0) as a .wav file
scipy.io.wavfile.write("introduce.wav", rate=16000, data=audio[0])

Error Message:

TypeError: __call__() got an unexpected keyword argument 'transcription'

Environment:

  • Python version: 3.9
  • Operating system: Linux
  • Hardware acceleration (if relevant): CUDA version 12.2

Hi, which version of diffusers are you using?

I am using version 0.27.2:

>>> import diffusers
>>> diffusers.__version__
'0.27.2'
>>> 

Despite downgrading to version '0.21.0' of diffusers, I'm still encountering the same TypeError as before.
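As a quick sanity check, you can confirm whether the installed diffusers build exposes the transcription argument at all by inspecting the pipeline's call signature (a minimal sketch, using only the standard library's inspect module):

import inspect
from diffusers import AudioLDM2Pipeline

# True if this diffusers build accepts `transcription` in __call__,
# False if the argument has not shipped in this version yet.
print("transcription" in inspect.signature(AudioLDM2Pipeline.__call__).parameters)

If this prints False, the installed release simply predates the feature, which matches the error above.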

I think you need to install Diffusers from source, as this feature hasn't made it into a pip release yet. If you want to install with pip, please wait for the next Diffusers release.

Sorry, you told me to install Diffusers from source, but I didn't understand what that meant.
How can I install it from source? Is there a yaml file or a requirements.txt file?

[Screenshot: instructions for installing Diffusers from source]
Follow this.
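In case the screenshot does not load: installing Diffusers from source generally means installing it straight from the GitHub repository with pip (the screenshot presumably shows an equivalent command), for example:

pip install git+https://github.com/huggingface/diffusers

After installing, restart the Python session so the new version is picked up.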

I was able to solve the problem by following your advice. I appreciate it.

cvipym changed discussion status to closed

Hey, may I know which diffusers and transformers versions you are using? I'm encountering a 'Segmentation fault (core dumped)' error. Thank you!

You should use the latest version.

Problem: Segmentation Fault

Description:

My diffusers and transformers packages are updated to the latest versions, but I still encounter the same problem: a segmentation fault.

Environment:

conda env name -> speech
torch        == 2.5.1 + cu124
diffusers    == 0.32.1
transformers == 4.47.1
accelerate   == 1.2.1
phonemizer   == 3.3.0

Error Log:

The console log of the error is as follows:


Fatal Python error: Segmentation fault

Thread 0x00007f7c06ffd700 (most recent call first):
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 324 in wait
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 600 in wait
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 1009 in _bootstrap_inner
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 966 in _bootstrap

Thread 0x00007f7c15b85700 (most recent call first):
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 324 in wait
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 600 in wait
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 1009 in _bootstrap_inner
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 966 in _bootstrap

Current thread 0x00007f7db6d80740 (most recent call first):
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/espeak/api.py", line 229 in text_to_phonemes
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/espeak/wrapper.py", line 314 in text_to_phonemes
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/espeak/espeak.py", line 91 in _phonemize_aux
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/base.py", line 191 in phonemize
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/phonemize.py", line 310 in _phonemize
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/phonemize.py", line 227 in phonemize
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/models/vits/tokenization_vits.py", line 192 in prepare_for_tokenization
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 640 in tokenize
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 768 in get_input_ids
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 801 in _encode_plus
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3046 in encode_plus
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2970 in _call_one
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2860 in __call__
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/diffusers/pipelines/audioldm2/pipeline_audioldm2.py", line 426 in encode_prompt
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/diffusers/pipelines/audioldm2/pipeline_audioldm2.py", line 968 in __call__
  File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116 in decorate_context

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, scipy._lib._ccallback_c, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, PIL._imaging, markupsafe._speedups, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, regex._regex, psutil._psutil_linux, psutil._psutil_posix, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label (total: 114)
/var/tmp/sclTA2r2F: line 8: 24193 Segmentation fault 

I guess the problem lies in the versions of the transformers and phonemizer packages.
Could you share a reference for the library versions in your conda env, such as a requirements.txt?
Thank you!
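
One way to narrow this down: the traceback above ends inside phonemizer's espeak backend (text_to_phonemes), which calls into the native espeak-ng library, so the crash may come from the system espeak-ng installation rather than from diffusers or transformers. A minimal sketch to test that path in isolation (assuming espeak is the backend in use; adjust the text and language as needed):

from phonemizer import phonemize

# If this also segfaults, the problem is in espeak-ng/phonemizer rather than
# in diffusers or transformers; reinstalling the system espeak-ng library
# (e.g. apt-get install espeak-ng) would be a reasonable next step.
print(phonemize("hi, i am yeong min. nice to meet you",
                language="en-us", backend="espeak"))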
