### Contact Information
WeChat: SunLionY, email: yinzidong2003@163.com
### MaxKB Version
v1.10.9-lts
### Problem Description
AI model: Xinference - CosyVoice2-0.5B
Model launch parameters: use_flow_cache=True is configured (see the launch sketch below)
Summary: the goal is to test streaming speech-synthesis output from CosyVoice2-0.5B.
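For reference, launching the model with this option looks roughly like the following. This is only a minimal sketch using the xinference Python client; the host/port and the exact way the model was launched in this setup are assumptions, not taken from the report.

```python
# Minimal sketch, not the exact launch command used. Assumes Xinference is
# reachable at http://localhost:9997 and that extra kwargs such as
# use_flow_cache are forwarded to the CosyVoice2 model at launch time.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="CosyVoice2-0.5B",
    model_type="audio",
    use_flow_cache=True,  # needed for streaming inference, per the assertion in the log below
)
print(model_uid)
```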
According to the Xinference API definition:

at least the following parameters are supported (see the sketch after this list):
- voice: the voice/timbre to use
- speed: speech rate
- stream: streaming flag
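Calling the endpoint directly with these parameters looks roughly like this. This is only a sketch; the host/port, the OpenAI-compatible endpoint path, and the output handling are assumptions, not part of the report.

```python
# Sketch of a direct streaming TTS request to Xinference's OpenAI-compatible
# speech endpoint; host/port and endpoint path are assumptions.
import requests

resp = requests.post(
    "http://localhost:9997/v1/audio/speech",
    json={
        "model": "CosyVoice2-0.5B",
        "input": "你好!",
        "voice": "中文女",  # voice / timbre
        "speed": 1.0,       # speech rate
        "stream": True,     # streaming flag; requires use_flow_cache=True on the model side
    },
    stream=True,
)
resp.raise_for_status()
with open("out.mp3", "wb") as f:
    for chunk in resp.iter_content(chunk_size=4096):
        if chunk:
            f.write(chunk)
```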
However, when configuring the model parameters in MaxKB, the call fails with an error as soon as stream is set:
The Xinference backend log shows that at this point MaxKB has not sent any API request at all.
Looking at part of the MaxKB source code, I changed stream to streaming, which produces a new error:
At this point, the Xinference log is:
```
2025-08-03 00:36:53,679 xinference.core.supervisor 39 DEBUG [request e5b4db50-6fbe-11f0-8376-0690527bddda] Enter get_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7fac8d3c8720>,CosyVoice2-0.5B, kwargs:
2025-08-03 00:36:53,679 xinference.core.worker 39 DEBUG Enter get_model, args: <xinference.core.worker.WorkerActor object at 0x7fac8d3cac50>, kwargs: model_uid=CosyVoice2-0.5B-0
2025-08-03 00:36:53,679 xinference.core.worker 39 DEBUG Leave get_model, elapsed time: 0 s
2025-08-03 00:36:53,679 xinference.core.supervisor 39 DEBUG [request e5b4db50-6fbe-11f0-8376-0690527bddda] Leave get_model, elapsed time: 0 s
2025-08-03 00:36:53,680 xinference.core.model 527 DEBUG Request speech, current serve request count: 0, request limit: inf for the model CosyVoice2-0.5B-0
2025-08-03 00:36:53,681 xinference.core.model 527 DEBUG [request e5b52f74-6fbe-11f0-97ed-0690527bddda] Enter speech, args: ModelActor(CosyVoice2-0.5B-0), kwargs: input=你好!,voice=中文女,response_format=mp3,speed=1.0,stream=False
2025-08-03 00:36:53,682 xinference.model.audio.cosyvoice 527 INFO CosyVoice inference_sft
0%| | 0/1 [00:00<?, ?it/s]
2025-08-03 00:36:53,755 xinference.core.model 527 ERROR [request e5b52f74-6fbe-11f0-97ed-0690527bddda] Leave speech, error: set use_flow_cache=False for nonstream inference, elapsed time: 0 s
Traceback (most recent call last):
File "/opt/inference/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 1028, in speech
return await self._call_wrapper_binary(
File "/opt/inference/xinference/core/model.py", line 670, in _call_wrapper_binary
return await self._call_wrapper("binary", fn, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 146, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 692, in _call_wrapper
ret = await asyncio.to_thread(fn, *args, **kwargs)
File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/inference/xinference/model/audio/cosyvoice.py", line 202, in speech
return self._speech_handle(
File "/opt/inference/xinference/model/audio/cosyvoice.py", line 151, in _speech_handle
tensor=torch.cat([o["tts_speech"] for o in output], dim=1),
File "/opt/inference/xinference/model/audio/cosyvoice.py", line 151, in <listcomp>
tensor=torch.cat([o["tts_speech"] for o in output], dim=1),
File "/opt/inference/xinference/model/audio/../../thirdparty/cosyvoice/cli/cosyvoice.py", line 85, in inference_sft
for model_output in self.model.tts(**model_input, stream=stream, speed=speed):
File "/opt/inference/xinference/model/audio/../../thirdparty/cosyvoice/cli/model.py", line 392, in tts
assert self.use_flow_cache is False, "set use_flow_cache=False for nonstream inference"
AssertionError: set use_flow_cache=False for nonstream inference
```
You can see that the request content is:
```
[request e5b52f74-6fbe-11f0-97ed-0690527bddda] Enter speech, args: ModelActor(CosyVoice2-0.5B-0), kwargs: input=你好!,voice=中文女,response_format=mp3,speed=1.0,stream=False
```
The stream value actually sent does not match what was configured at all. If the stream or streaming parameter is removed, the same error as above occurs, so it looks as if the stream parameter is never sent in the request.
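To isolate whether the problem is on the MaxKB side, the same model can be called directly with stream=True through the xinference Python client. This is only a sketch: the host/port and the exact client usage are assumptions based on the kwargs visible in the server log above.

```python
# Sketch: call the model directly, bypassing MaxKB, to check that the server
# accepts stream=True. Host/port and the return type handling are assumptions.
from xinference.client import Client

client = Client("http://localhost:9997")
model = client.get_model("CosyVoice2-0.5B")
audio = model.speech(
    "你好!",
    voice="中文女",
    response_format="mp3",
    speed=1.0,
    stream=True,  # if this works here, the missing parameter points to MaxKB's request building
)
with open("out_stream.mp3", "wb") as f:
    if isinstance(audio, bytes):
        f.write(audio)
    else:
        for chunk in audio:  # streaming responses are assumed to come back as an iterator of chunks
            f.write(chunk)
```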
### Steps to Reproduce
See the problem description above.
### The expected correct result
Investigate and fix the Xinference API call so the configured parameters are sent correctly.
### Related log output
```shell
```
### Additional Information
_No response_