TypeError: __call__() got an unexpected keyword argument 'transcription'
Description:
I encountered a 'TypeError' while running the code snippet below. It seems the 'transcription' argument is not recognized, even though it is listed as a valid argument in the documentation. Can someone please help me understand why this error occurs and how to resolve it?
import scipy
import torch
from diffusers import AudioLDM2Pipeline
repo_id = "anhnct/audioldm2_gigaspeech"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
# define the prompts
prompt = "An female actor say with angry voice"
transcript= "hi, i am yeong min. nice to meet you"
negative_prompt = "low quality"
# set the seed for generator
generator = torch.Generator("cuda").manual_seed(1)
# run the generation
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    transcription=transcript,
    num_inference_steps=200,
    audio_length_in_s=8.0,
    num_waveforms_per_prompt=1,
    generator=generator,
    max_new_tokens=512,
).audios
# save the best audio sample (index 0) as a .wav file
scipy.io.wavfile.write("introduce.wav", rate=16000, data=audio[0])
Error Message:
TypeError: __call__() got an unexpected keyword argument 'transcription'
Environment:
- Python version: 3.9
- Operating system: Linux
- Hardware acceleration (if relevant): CUDA version 12.2
Hi, which version of diffusers are you using?
I am using version 0.27.2:
>>> import diffusers
>>> diffusers.__version__
'0.27.2'
>>>
Despite downgrading to version '0.21.0' of diffusers, I'm still encountering the same TypeError as before.
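One way to check whether the installed release actually exposes this argument is to inspect the pipeline's call signature. A minimal diagnostic sketch, assuming AudioLDM2Pipeline imports cleanly in your environment:

import inspect
from diffusers import AudioLDM2Pipeline

# List the keyword arguments accepted by the pipeline's __call__ method
params = inspect.signature(AudioLDM2Pipeline.__call__).parameters
print("transcription" in params)  # False means the installed version does not expose it yet

If this prints False, the TypeError is expected for that version regardless of how the argument is spelled.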
I think you need to install Diffusers from source, as this feature has not been released yet. If you want to install with pip, please wait for the next Diffusers release.
Sorry, you told me to install Diffusers from source, but I didn't understand what that means.
How can I install it from source? Is there a yaml file or a requirements.txt file?
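For reference, installing from source usually just means installing the library straight from its GitHub repository instead of from PyPI; no yaml or requirements.txt is needed. One common way to do it (a sketch, not the only option):

pip install git+https://github.com/huggingface/diffusers.git

or, equivalently, clone the repository and install it in editable mode:

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .

After that, restarting the Python session and re-checking diffusers.__version__ should show a development version (e.g. one ending in .dev0).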
I was able to solve the problem following your advice. I appreciate it.
Hey, may I know which diffusers and transformers versions you use? I am running into a Segmentation fault (core dumped). Thank you!
You should use the latest version.
Problem: Segmentation Fault
Description:
My diffusers and transformers packages are updated to the latest version, but I still encounter the same problem, a segmentation fault.
Environment:
conda env name -> speech
torch == 2.5.1+cu124
diffusers == 0.32.1
transformers == 4.47.1
accelerate == 1.2.1
phonemizer == 3.3.0
Error Log:
The console log of the error is as follows:
Fatal Python error: Segmentation fault
Thread 0x00007f7c06ffd700 (most recent call first):
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 324 in wait
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 600 in wait
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 1009 in _bootstrap_inner
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 966 in _bootstrap
Thread 0x00007f7c15b85700 (most recent call first):
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 324 in wait
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 600 in wait
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 1009 in _bootstrap_inner
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/threading.py", line 966 in _bootstrap
Current thread 0x00007f7db6d80740 (most recent call first):
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/espeak/api.py", line 229 in text_to_phonemes
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/espeak/wrapper.py", line 314 in text_to_phonemes
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/espeak/espeak.py", line 91 in _phonemize_aux
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/backend/base.py", line 191 in phonemize
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/phonemize.py", line 310 in _phonemize
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/phonemizer/phonemize.py", line 227 in phonemize
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/models/vits/tokenization_vits.py", line 192 in prepare_for_tokenization
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 640 in tokenize
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 768 in get_input_ids
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 801 in _encode_plus
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3046 in encode_plus
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2970 in _call_one
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2860 in __call__
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/diffusers/pipelines/audioldm2/pipeline_audioldm2.py", line 426 in encode_prompt
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/diffusers/pipelines/audioldm2/pipeline_audioldm2.py", line 968 in __call__
File "/home/masters/xxx/anaconda3/envs/speech/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116 in decorate_context
Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, scipy._lib._ccallback_c, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, PIL._imaging, markupsafe._speedups, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, regex._regex, psutil._psutil_linux, psutil._psutil_posix, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label (total: 114)
/var/tmp/sclTA2r2F: line 8: 24193 Segmentation fault
I guess the problem lies in the versions of the transformers and phonemizer packages.
Could you give me a reference for the library versions in your conda env, such as a requirements.txt?
Thank you!
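Judging from the traceback, the crash happens inside phonemizer's espeak backend (text_to_phonemes) rather than in diffusers itself, so it may be worth testing phonemizer and the system espeak-ng library in isolation. A minimal sketch, assuming phonemizer and espeak-ng are installed:

from phonemizer import phonemize

# If this also segfaults, the problem is the phonemizer / espeak-ng installation,
# not the diffusers or transformers version.
print(phonemize("hi, i am yeong min. nice to meet you", language="en-us", backend="espeak"))

If the isolated call crashes too, reinstalling or updating the system espeak-ng package (and then phonemizer) is a common first step before changing diffusers or transformers versions.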