### Contact Information
WeChat: SunLionY, email: yinzidong2003@163.com
### MaxKB Version
v1.10.9-lts
### Problem Description
AI model: Xinference - CosyVoice2-0.5B
Model launch parameters: use_flow_cache=True is configured (see the launch sketch below)
Summary: the goal is to test streaming speech-synthesis output from CosyVoice2-0.5B.
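For reference, launching the model with this option looks roughly like the following. This is only a minimal sketch using the xinference Python client; the host/port and the exact way the model was launched in this setup are assumptions, not taken from the report.

```python
# Minimal sketch, not the exact launch command used. Assumes Xinference is
# reachable at http://localhost:9997 and that extra kwargs such as
# use_flow_cache are forwarded to the CosyVoice2 model at launch time.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="CosyVoice2-0.5B",
    model_type="audio",
    use_flow_cache=True,  # needed for streaming inference, per the assertion in the log below
)
print(model_uid)
```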
According to the Xinference API definition:

at least the following parameters are supported (see the sketch after this list):
- voice: the voice/timbre to use
- speed: speech rate
- stream: streaming flag
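Calling the endpoint directly with these parameters looks roughly like this. This is only a sketch; the host/port, the OpenAI-compatible endpoint path, and the output handling are assumptions, not part of the report.

```python
# Sketch of a direct streaming TTS request to Xinference's OpenAI-compatible
# speech endpoint; host/port and endpoint path are assumptions.
import requests

resp = requests.post(
    "http://localhost:9997/v1/audio/speech",
    json={
        "model": "CosyVoice2-0.5B",
        "input": "你好!",
        "voice": "中文女",  # voice / timbre
        "speed": 1.0,       # speech rate
        "stream": True,     # streaming flag; requires use_flow_cache=True on the model side
    },
    stream=True,
)
resp.raise_for_status()
with open("out.mp3", "wb") as f:
    for chunk in resp.iter_content(chunk_size=4096):
        if chunk:
            f.write(chunk)
```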
However, when configuring the model parameters in MaxKB, the call fails with an error as soon as stream is set:
The Xinference backend log shows that at this point MaxKB has not sent any API request at all.
Looking at part of the MaxKB source code, I changed stream to streaming, which produces a new error:
At this point, the Xinference log is:
```
2025-08-03 00:36:53,679 xinference.core.supervisor 39 DEBUG [request e5b4db50-6fbe-11f0-8376-0690527bddda] Enter get_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7fac8d3c8720>,CosyVoice2-0.5B, kwargs:
2025-08-03 00:36:53,679 xinference.core.worker 39 DEBUG Enter get_model, args: <xinference.core.worker.WorkerActor object at 0x7fac8d3cac50>, kwargs: model_uid=CosyVoice2-0.5B-0
2025-08-03 00:36:53,679 xinference.core.worker 39 DEBUG Leave get_model, elapsed time: 0 s
2025-08-03 00:36:53,679 xinference.core.supervisor 39 DEBUG [request e5b4db50-6fbe-11f0-8376-0690527bddda] Leave get_model, elapsed time: 0 s
2025-08-03 00:36:53,680 xinference.core.model 527 DEBUG Request speech, current serve request count: 0, request limit: inf for the model CosyVoice2-0.5B-0
2025-08-03 00:36:53,681 xinference.core.model 527 DEBUG [request e5b52f74-6fbe-11f0-97ed-0690527bddda] Enter speech, args: ModelActor(CosyVoice2-0.5B-0), kwargs: input=你好!,voice=中文女,response_format=mp3,speed=1.0,stream=False
2025-08-03 00:36:53,682 xinference.model.audio.cosyvoice 527 INFO CosyVoice inference_sft
0%| | 0/1 [00:00<?, ?it/s]
2025-08-03 00:36:53,755 xinference.core.model 527 ERROR [request e5b52f74-6fbe-11f0-97ed-0690527bddda] Leave speech, error: set use_flow_cache=False for nonstream inference, elapsed time: 0 s
Traceback (most recent call last):
File "/opt/inference/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 1028, in speech
return await self._call_wrapper_binary(
File "/opt/inference/xinference/core/model.py", line 670, in _call_wrapper_binary
return await self._call_wrapper("binary", fn, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 146, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 692, in _call_wrapper
ret = await asyncio.to_thread(fn, *args, **kwargs)
File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/inference/xinference/model/audio/cosyvoice.py", line 202, in speech
return self._speech_handle(
File "/opt/inference/xinference/model/audio/cosyvoice.py", line 151, in _speech_handle
tensor=torch.cat([o["tts_speech"] for o in output], dim=1),
File "/opt/inference/xinference/model/audio/cosyvoice.py", line 151, in <listcomp>
tensor=torch.cat([o["tts_speech"] for o in output], dim=1),
File "/opt/inference/xinference/model/audio/../../thirdparty/cosyvoice/cli/cosyvoice.py", line 85, in inference_sft
for model_output in self.model.tts(**model_input, stream=stream, speed=speed):
File "/opt/inference/xinference/model/audio/../../thirdparty/cosyvoice/cli/model.py", line 392, in tts
assert self.use_flow_cache is False, "set use_flow_cache=False for nonstream inference"
AssertionError: set use_flow_cache=False for nonstream inference
```
You can see that the request content is:
```
[request e5b52f74-6fbe-11f0-97ed-0690527bddda] Enter speech, args: ModelActor(CosyVoice2-0.5B-0), kwargs: input=你好!,voice=中文女,response_format=mp3,speed=1.0,stream=False
```
The stream value actually sent does not match what was configured at all. If the stream or streaming parameter is removed, the same error as above occurs, so it looks as if the stream parameter is never sent in the request.
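To isolate whether the problem is on the MaxKB side, the same model can be called directly with stream=True through the xinference Python client. This is only a sketch: the host/port and the exact client usage are assumptions based on the kwargs visible in the server log above.

```python
# Sketch: call the model directly, bypassing MaxKB, to check that the server
# accepts stream=True. Host/port and the return type handling are assumptions.
from xinference.client import Client

client = Client("http://localhost:9997")
model = client.get_model("CosyVoice2-0.5B")
audio = model.speech(
    "你好!",
    voice="中文女",
    response_format="mp3",
    speed=1.0,
    stream=True,  # if this works here, the missing parameter points to MaxKB's request building
)
with open("out_stream.mp3", "wb") as f:
    if isinstance(audio, bytes):
        f.write(audio)
    else:
        for chunk in audio:  # streaming responses are assumed to come back as an iterator of chunks
            f.write(chunk)
```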
### Steps to Reproduce
See the problem description above.
### The expected correct result
Investigate and fix the Xinference API call so the configured parameters are sent correctly.
### Related log output
```shell
```
### Additional Information
_No response_