[Bug] Xinference TTS API call fails #3796

@ZidongY

Contact Information

WeChat ID: SunLionY, email: yinzidong2003@163.com

MaxKB Version

v1.10.9-lts

Problem Description

AI model: Xinference - CosyVoice2-0.5B
Model launch parameters: use_flow_cache=True is configured

Summary: the goal is to test streaming speech synthesis output with CosyVoice2-0.5B.

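According to the Xinference API definition:

Image

at least the following parameters are supported:

  • voice: the voice (timbre) to use
  • speed: the speaking rate
  • stream: the streaming flag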
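For reference, a request that exercises these three parameters directly against Xinference (bypassing MaxKB) could look like the sketch below. It is only a minimal illustration: the field names are taken from the `kwargs` shown in the Xinference log in this issue, while the `/v1/audio/speech` path, the base URL `http://localhost:9997`, and the helper names `build_speech_payload` and `stream_speech` are assumptions that may need adjusting for a given deployment.

```python
import json
import urllib.request


def build_speech_payload(text, voice="中文女", speed=1.0, stream=True,
                         model="CosyVoice2-0.5B", response_format="mp3"):
    """Build the JSON body for a TTS request (field names follow the logged kwargs)."""
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream": stream,
    }


def stream_speech(base_url, payload, chunk_size=4096):
    """POST the payload and yield audio bytes as they arrive.

    The endpoint path below is an assumption (OpenAI-style speech route).
    """
    request = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A call such as `for chunk in stream_speech("http://localhost:9997", build_speech_payload("你好!")): ...` would then show whether Xinference itself streams correctly when `stream` is really `True` in the request.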

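However, when configuring the model parameters in MaxKB, as soon as `stream` is set, the interface reports an error and every call fails:
Image
The Xinference backend log shows that, in this case, MaxKB never sends an API request at all.

After looking at part of the MaxKB source code, I renamed `stream` to `streaming`, which produces a new error:
Image

At this point, the Xinference log shows: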

```
2025-08-03 00:36:53,679 xinference.core.supervisor 39 DEBUG    [request e5b4db50-6fbe-11f0-8376-0690527bddda] Enter get_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7fac8d3c8720>,CosyVoice2-0.5B, kwargs: 
2025-08-03 00:36:53,679 xinference.core.worker 39 DEBUG    Enter get_model, args: <xinference.core.worker.WorkerActor object at 0x7fac8d3cac50>, kwargs: model_uid=CosyVoice2-0.5B-0
2025-08-03 00:36:53,679 xinference.core.worker 39 DEBUG    Leave get_model, elapsed time: 0 s
2025-08-03 00:36:53,679 xinference.core.supervisor 39 DEBUG    [request e5b4db50-6fbe-11f0-8376-0690527bddda] Leave get_model, elapsed time: 0 s
2025-08-03 00:36:53,680 xinference.core.model 527 DEBUG    Request speech, current serve request count: 0, request limit: inf for the model CosyVoice2-0.5B-0
2025-08-03 00:36:53,681 xinference.core.model 527 DEBUG    [request e5b52f74-6fbe-11f0-97ed-0690527bddda] Enter speech, args: ModelActor(CosyVoice2-0.5B-0), kwargs: input=你好!,voice=中文女,response_format=mp3,speed=1.0,stream=False
2025-08-03 00:36:53,682 xinference.model.audio.cosyvoice 527 INFO     CosyVoice inference_sft
  0%|          | 0/1 [00:00<?, ?it/s]
2025-08-03 00:36:53,755 xinference.core.model 527 ERROR    [request e5b52f74-6fbe-11f0-97ed-0690527bddda] Leave speech, error: set use_flow_cache=False for nonstream inference, elapsed time: 0 s
Traceback (most recent call last):
  File "/opt/inference/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/inference/xinference/core/model.py", line 1028, in speech
    return await self._call_wrapper_binary(
  File "/opt/inference/xinference/core/model.py", line 670, in _call_wrapper_binary
    return await self._call_wrapper("binary", fn, *args, **kwargs)
  File "/opt/inference/xinference/core/model.py", line 146, in _async_wrapper
    return await fn(self, *args, **kwargs)
  File "/opt/inference/xinference/core/model.py", line 692, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/inference/xinference/model/audio/cosyvoice.py", line 202, in speech
    return self._speech_handle(
  File "/opt/inference/xinference/model/audio/cosyvoice.py", line 151, in _speech_handle
    tensor=torch.cat([o["tts_speech"] for o in output], dim=1),
  File "/opt/inference/xinference/model/audio/cosyvoice.py", line 151, in <listcomp>
    tensor=torch.cat([o["tts_speech"] for o in output], dim=1),
  File "/opt/inference/xinference/model/audio/../../thirdparty/cosyvoice/cli/cosyvoice.py", line 85, in inference_sft
    for model_output in self.model.tts(**model_input, stream=stream, speed=speed):
  File "/opt/inference/xinference/model/audio/../../thirdparty/cosyvoice/cli/model.py", line 392, in tts
    assert self.use_flow_cache is False, "set use_flow_cache=False for nonstream inference"
AssertionError: set use_flow_cache=False for nonstream inference
```
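The traceback ends in a guard that rejects non-streaming inference when the model was launched with `use_flow_cache=True`. The sketch below is a stand-alone mock of that interaction, not the actual CosyVoice implementation: the class name `MockFlowCacheModel` and the fake audio chunks are hypothetical, and only the assertion line mirrors the one shown in the traceback. The assumption that the guard sits on the non-streaming path is inferred from its error message.

```python
class MockFlowCacheModel:
    """Hypothetical stand-in that mimics only the use_flow_cache guard."""

    def __init__(self, use_flow_cache):
        self.use_flow_cache = use_flow_cache

    def tts(self, text, stream=False):
        # Generator: nothing runs until the caller starts iterating,
        # which matches the error surfacing inside a list comprehension.
        if stream:
            for char in text:
                yield f"<audio chunk for {char!r}>"  # fake streamed chunk
        else:
            # Same assertion text as in the traceback above.
            assert self.use_flow_cache is False, \
                "set use_flow_cache=False for nonstream inference"
            yield f"<full audio for {text!r}>"  # fake single result


model = MockFlowCacheModel(use_flow_cache=True)

# Streaming works with the flow cache enabled.
chunks = list(model.tts("你好", stream=True))
print(len(chunks), "streamed chunks")

# A non-streaming call trips the guard, as in the issue.
try:
    list(model.tts("你好", stream=False))
except AssertionError as exc:
    print("non-stream call failed:", exc)
```

Under this reading, the failure follows from `stream=False` reaching a model configured for streaming, so the open question is why the request carries `stream=False` in the first place.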

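As the log shows, the request content is:

```
[request e5b52f74-6fbe-11f0-97ed-0690527bddda] Enter speech, args: ModelActor(CosyVoice2-0.5B-0), kwargs: input=你好!,voice=中文女,response_format=mp3,speed=1.0,stream=False
```
The `stream` value in the request does not match the configured value at all. If the `stream` or `streaming` parameter is removed, the same error as above is reported, which suggests that the configured `stream` parameter is never actually passed in the request.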
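To make this kind of mismatch easier to spot, the logged `kwargs` can be compared against the values configured on the MaxKB side. The following is a rough diagnostic sketch, not part of either project: the helper names `parse_logged_kwargs` and `find_mismatches` are hypothetical, the configured values are assumed for illustration, and the naive comma split would break if the input text itself contained commas.

```python
def parse_logged_kwargs(log_line):
    """Extract key=value pairs that follow 'kwargs: ' in a Xinference log line."""
    _, _, raw = log_line.partition("kwargs: ")
    parsed = {}
    for item in raw.strip().split(","):
        key, sep, value = item.partition("=")
        if sep:
            parsed[key.strip()] = value.strip()
    return parsed


def find_mismatches(configured, logged):
    """Return {name: (configured, logged)} for every value that differs."""
    return {
        name: (str(value), logged.get(name))
        for name, value in configured.items()
        if logged.get(name) != str(value)
    }


log_line = (
    "[request e5b52f74-6fbe-11f0-97ed-0690527bddda] Enter speech, "
    "args: ModelActor(CosyVoice2-0.5B-0), kwargs: input=你好!,voice=中文女,"
    "response_format=mp3,speed=1.0,stream=False"
)

# Assumed configuration: streaming requested on the MaxKB side.
configured = {"voice": "中文女", "speed": 1.0, "stream": True}

mismatches = find_mismatches(configured, parse_logged_kwargs(log_line))
print(mismatches)  # {'stream': ('True', 'False')}
```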


### Steps to Reproduce

See the problem description.

### The expected correct result

Investigate and fix the Xinference API call.

### Related log output

```shell

```

### Additional Information

_No response_
