Yougu Speech · Developer Documentation

优谷雅言 Developer Documentation

Chinese-English bilingual speech assessment · compatible with the Shengtong (声通) API protocol · v1

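Overview

Yougu Speech scores a user's read-aloud audio along multiple dimensions and at multiple levels, producing structured scores, a natural-language report, and a standard demonstration audio. Base URL: https://open.shengzhiai.com. All endpoints return JSON.

Each evaluation produces four outputs: ① structured scores result (Shengtong-compatible field names); ② the AI assessment report report; ③ the recognized text asrText (with omitted / misread / inserted annotations); ④ the standard pronunciation standardAudio. Field-by-field meanings are in Return Value Reference.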

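Authentication

Three methods; choose by scenario:

Method | Usage | Applies to
API Key (recommended) | Request header X-App-Key: {appKey} (or query parameter ?api_key={appKey}) | Server-side direct calls to the evaluation / TTS / report endpoints; create the appKey on the console's "API Key" page
Bearer JWT | Authorization: Bearer {token}, where token is issued by POST /api/v1/auth/login | Calls made with a console / browser login session (the online evaluation page uses this method)
Shengtong sig signature | Request headers X-App-Key / X-Timestamp (seconds, within ±300s) / X-Nonce (optional, replay protection) / X-Signature. Signature: take the business parameters (query/form, dropping empty values), sort by key ascending, and join them as k1=v1&k2=v2; X-Signature = Base64( HMAC-SHA256( payload, secretKey ) ). During the WS handshake the credentials move to the query string (?appKey=&timestamp=&signature=&nonce=). | The Shengtong protocol compatibility layer (see coreType Reference) and the Shengtong-compatible WS; existing Shengtong integrations migrate with zero changes
Note: the Authorization: Bearer header accepts only a JWT issued at login; placing an appKey in Bearer returns {"code":2002,"message":"token无效"}. Send the appKey in the X-App-Key header.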
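The sig rule above can be sketched in Python as follows. This is a minimal sketch rather than the platform's reference signer: it assumes UTF-8 encoding, no URL-encoding of values, and that "empty" means None or the empty string. The names build_payload and sign_params, the secret, and the parameter values are illustrative only.

```python
import base64
import hashlib
import hmac


def build_payload(params: dict) -> str:
    """Drop empty values, sort by key ascending, join as k1=v1&k2=v2."""
    kept = {k: v for k, v in params.items() if v not in (None, "")}
    return "&".join(f"{k}={kept[k]}" for k in sorted(kept))


def sign_params(params: dict, secret_key: str) -> str:
    """Return Base64(HMAC-SHA256(payload, secret_key)) as an ASCII string."""
    payload = build_payload(params).encode("utf-8")   # assumption: UTF-8 bytes
    digest = hmac.new(secret_key.encode("utf-8"), payload, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical business parameters; "language" is empty and is therefore dropped.
demo = {"refText": "hello", "coreType": "sent.eval", "language": ""}
print(build_payload(demo))
print(sign_params(demo, "demo-secret"))
```

Check the result against the SDKs' unified test vectors before relying on it, since the exact normalization of values is assumed here.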

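Quick Start

Four steps to your first evaluation:

  1. Get credentials: create an API Key in the open platform console to obtain an appKey (sent as the X-App-Key request header) and a secretKey.
  2. Prepare audio: wav / mp3 / ogg; 16kHz, 16-bit, mono wav is recommended (≤10MB, ≤5 minutes). The engine resamples other sample rates automatically, but very low sample rates reduce scoring accuracy.
  3. Call the evaluation: multipart form POST /api/v1/evaluate with the fields audio (file) + config (JSON string; this part's Content-Type must be application/json).
  4. Parse the result: read result.overall for the total score along with the per-dimension scores; save recordId so that you can query it later with GET /api/v1/report/{recordId}.
Trial: test_app_key in the examples below is a trial key (shared, and it may be rotated at any time; for production, create your own appKey in the console), so the examples can be copied and run as-is. For a no-code trial, open the /eval online evaluation page.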
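Step 2 can be checked locally before uploading. The sketch below uses only Python's standard wave module and handles wav input only; check_wav is an illustrative name, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption, since the unit is not spelled out here.

```python
import io
import wave

MAX_BYTES = 10 * 1024 * 1024   # assumption: "10MB" read as MiB
MAX_SECONDS = 5 * 60           # 5 minutes


def check_wav(data: bytes) -> list:
    """Return a list of issues; an empty list means the recommended format."""
    issues = []
    if len(data) > MAX_BYTES:
        issues.append("file larger than 10MB")
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        if rate != 16000:
            issues.append(f"sample rate is {rate} Hz (16000 recommended)")
        if w.getsampwidth() != 2:
            issues.append(f"sample width is {8 * w.getsampwidth()} bit (16 recommended)")
        if w.getnchannels() != 1:
            issues.append(f"{w.getnchannels()} channels (mono recommended)")
        if w.getnframes() / rate > MAX_SECONDS:
            issues.append("longer than 5 minutes")
    return issues
```

A non-empty result is only a hint: sample-rate, bit-depth, and channel mismatches are deviations from the recommendation, not hard failures.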

Submit a Chinese read-aloud recording for sentence evaluation:

curl -X POST https://open.shengzhiai.com/api/v1/evaluate \
  -H "X-App-Key: test_app_key" \
  -F "audio=@reading.wav;type=audio/wav" \
  -F 'config={"coreType":"sentence","referenceText":"鹅,鹅,鹅,曲项向天歌。","language":"zh-CN"};type=application/json'
# Python (requests)
import requests

cfg = '{"coreType":"sentence","referenceText":"鹅,鹅,鹅,曲项向天歌。","language":"zh-CN"}'
with open("reading.wav", "rb") as audio:             # close the file once the upload finishes
    r = requests.post("https://open.shengzhiai.com/api/v1/evaluate",
        headers={"X-App-Key": "test_app_key"},
        files={"audio": ("reading.wav", audio, "audio/wav"),
               "config": (None, cfg, "application/json")})   # the config part must be application/json
r.raise_for_status()
print(r.json()["result"]["overall"])
// OkHttp(com.squareup.okhttp3:okhttp)
OkHttpClient client = new OkHttpClient();
String cfg = "{\"coreType\":\"sentence\",\"referenceText\":\"鹅,鹅,鹅,曲项向天歌。\",\"language\":\"zh-CN\"}";
RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
    .addFormDataPart("audio", "reading.wav",
        RequestBody.create(new File("reading.wav"), MediaType.parse("audio/wav")))
    .addFormDataPart("config", null,
        RequestBody.create(cfg, MediaType.parse("application/json")))
    .build();
Request req = new Request.Builder()
    .url("https://open.shengzhiai.com/api/v1/evaluate")
    .header("X-App-Key", "test_app_key")
    .post(body).build();
try (Response resp = client.newCall(req).execute()) {
    System.out.println(resp.body().string());
}
// Browser JavaScript; fileBlob is a File/Blob you already hold (for example from a file input)
const fd = new FormData();
fd.append("audio", fileBlob, "reading.wav");
fd.append("config", new Blob([JSON.stringify(
  {coreType:"sentence", referenceText:"鹅,鹅,鹅,曲项向天歌。", language:"zh-CN"}
)], {type:"application/json"}));                       // the config part must be application/json
const r = await fetch("https://open.shengzhiai.com/api/v1/evaluate",
  {method:"POST", headers:{"X-App-Key":"test_app_key"}, body:fd});
console.log((await r.json()).result.overall);
package main

import (
    "bytes"
    "fmt"
    "io"
    "mime/multipart"
    "net/http"
    "net/textproto"
    "os"
)

func main() {
    buf := &bytes.Buffer{}
    w := multipart.NewWriter(buf)
    fw, _ := w.CreateFormFile("audio", "reading.wav")
    f, _ := os.Open("reading.wav")
    defer f.Close()
    io.Copy(fw, f)
    h := textproto.MIMEHeader{}                      // the config part must be application/json
    h.Set("Content-Disposition", `form-data; name="config"`)
    h.Set("Content-Type", "application/json")
    cw, _ := w.CreatePart(h)
    cw.Write([]byte(`{"coreType":"sentence","referenceText":"鹅,鹅,鹅,曲项向天歌。","language":"zh-CN"}`))
    w.Close()

    req, _ := http.NewRequest("POST",
        "https://open.shengzhiai.com/api/v1/evaluate", buf)
    req.Header.Set("X-App-Key", "test_app_key")
    req.Header.Set("Content-Type", w.FormDataContentType())
    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(string(body))
}
/* gcc evaluate.c -o evaluate -lcurl  (libcurl ≥ 7.56, curl_mime API) */
#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *h = curl_easy_init();
    if (!h) return 1;

    curl_mime *form = curl_mime_init(h);
    curl_mimepart *p = curl_mime_addpart(form);          /* audio file part */
    curl_mime_name(p, "audio");
    curl_mime_filedata(p, "reading.wav");
    curl_mime_type(p, "audio/wav");
    p = curl_mime_addpart(form);                          /* config JSON part */
    curl_mime_name(p, "config");
    curl_mime_data(p,
        "{\"coreType\":\"sentence\",\"referenceText\":\"鹅,鹅,鹅,曲项向天歌。\",\"language\":\"zh-CN\"}",
        CURL_ZERO_TERMINATED);
    curl_mime_type(p, "application/json");                /* required: otherwise the call fails with 50000 */

    struct curl_slist *hdr =
        curl_slist_append(NULL, "X-App-Key: test_app_key");
    curl_easy_setopt(h, CURLOPT_URL,
        "https://open.shengzhiai.com/api/v1/evaluate");
    curl_easy_setopt(h, CURLOPT_HTTPHEADER, hdr);
    curl_easy_setopt(h, CURLOPT_MIMEPOST, form);

    CURLcode rc = curl_easy_perform(h);   /* the response JSON is written to stdout by default */
    if (rc != CURLE_OK)
        fprintf(stderr, "error: %s\n", curl_easy_strerror(rc));

    curl_slist_free_all(hdr);
    curl_mime_free(form);
    curl_easy_cleanup(h);
    curl_global_cleanup();
    return (int)rc;
}
<?php
// PHP ≥ 8.1 (CURLStringFile is used for a string part that carries its own Content-Type)
$cfg = '{"coreType":"sentence","referenceText":"鹅,鹅,鹅,曲项向天歌。","language":"zh-CN"}';
$ch = curl_init("https://open.shengzhiai.com/api/v1/evaluate");
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => ["X-App-Key: test_app_key"],
    CURLOPT_POSTFIELDS     => [   // passing an array makes the body multipart/form-data
        "audio"  => new CURLFile("reading.wav", "audio/wav", "reading.wav"),
        "config" => new CURLStringFile($cfg, "config", "application/json"),
    ],
]);
$raw = curl_exec($ch);
if ($raw === false) { die("curl error: " . curl_error($ch)); }
curl_close($ch);
$res = json_decode($raw, true);
echo $res["result"]["overall"], PHP_EOL;

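Read-Aloud Evaluation POST /api/v1/evaluate

Multi-dimensional scoring of Chinese and English read-aloud speech (integrity / accuracy / tone / fluency / reading skill / emotion), returned at the passage / sentence / character / phoneme levels. Multipart form: audio (file) + config (JSON; its Content-Type must be application/json).

config core fields

Field | Description
coreType | Required: word / sentence / passage / alpha / connected / open; see coreType Reference
referenceText | Required: reference text (the prompt for open-ended questions), ≤1000 characters
language | zh-CN / en-US / en-GB (default en-US)
slack / scale / precision | Scoring leniency / score range / score precision
toneWeight | Share of the tone dimension in the total score (Chinese; default 0.2)

All optional parameters (refPinyin / agegroup / phonemeOutput / includeReport / includeStandardAudio / includeAsrText / taskType, etc.) are listed in Evaluation Parameters.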

Multi-language examples

curl -X POST https://open.shengzhiai.com/api/v1/evaluate \
  -H "X-App-Key: test_app_key" \
  -F "audio=@reading.wav;type=audio/wav" \
  -F 'config={"coreType":"sentence","referenceText":"鹅,鹅,鹅,曲项向天歌。","language":"zh-CN","includeReport":true,"includeStandardAudio":true,"includeAsrText":true};type=application/json'
# Python (requests)
import json
import requests

cfg = {"coreType": "sentence",
       "referenceText": "鹅,鹅,鹅,曲项向天歌。",
       "language": "zh-CN", "includeReport": True}
with open("reading.wav", "rb") as audio:
    r = requests.post("https://open.shengzhiai.com/api/v1/evaluate",
        headers={"X-App-Key": "test_app_key"},
        files={"audio": ("reading.wav", audio, "audio/wav"),
               "config": (None, json.dumps(cfg, ensure_ascii=False), "application/json")})
d = r.json()
print(d["recordId"], d["result"]["overall"], d["result"]["tone"])
print(d["report"]["summary"])
// OkHttp:multipart audio + config(application/json)
String cfg = "{\"coreType\":\"sentence\",\"referenceText\":\"鹅,鹅,鹅,曲项向天歌。\",\"language\":\"zh-CN\"}";
RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
    .addFormDataPart("audio", "reading.wav",
        RequestBody.create(new File("reading.wav"), MediaType.parse("audio/wav")))
    .addFormDataPart("config", null,
        RequestBody.create(cfg, MediaType.parse("application/json")))
    .build();
Request req = new Request.Builder()
    .url("https://open.shengzhiai.com/api/v1/evaluate")
    .header("X-App-Key", "test_app_key").post(body).build();
try (Response resp = new OkHttpClient().newCall(req).execute()) {
    System.out.println(resp.body().string());
}
// See the full Go example in Quick Start. Key point: build the config part with CreatePart and set
// Content-Type: application/json explicitly; use CreateFormFile for the audio.
h := textproto.MIMEHeader{}
h.Set("Content-Disposition", `form-data; name="config"`)
h.Set("Content-Type", "application/json")
cw, _ := w.CreatePart(h)
cw.Write([]byte(`{"coreType":"sentence","referenceText":"鹅,鹅,鹅,曲项向天歌。","language":"zh-CN"}`))
// Node 20+: built-in fetch / FormData / Blob
import { readFile } from "node:fs/promises";

const fd = new FormData();
fd.append("audio", new Blob([await readFile("reading.wav")], {type:"audio/wav"}), "reading.wav");
fd.append("config", new Blob([JSON.stringify({
  coreType: "sentence",
  referenceText: "鹅,鹅,鹅,曲项向天歌。",
  language: "zh-CN"
})], {type:"application/json"}));
const r = await fetch("https://open.shengzhiai.com/api/v1/evaluate",
  {method:"POST", headers:{"X-App-Key":"test_app_key"}, body:fd});
const d = await r.json();
console.log(d.recordId, d.result.overall);

Response (excerpt from a real call, recordId=eval_9aa80c616edb)

{
  "recordId": "eval_9aa80c616edb", "eof": 1,
  "result": {
    "overall": 67, "pronunciation": 69, "tone": 100, "fluency": 71,
    "rhythm": 67, "integrity": 68, "speed": 77, "rear_tone": "rise",
    "duration": "7.320", "warning": [],
    "words": [{"word":"鹅","pinyin":"e","rawpinyin":"e2","symbolpinyin":"é",
      "charType":0,"readType":0,"tone":"tone2",
      "scores":{"overall":100,"pronunciation":100,"tone":100},
      "span":{"start":28,"end":32},
      "phonemes":[{"phoneme":"E","phone":"e","pronunciation":100,"tone_index":"2"}], ...}],
    "sentences": [{"sentence":"鹅,鹅,鹅,曲项向天歌。","index":0,
      "scores":{"overall":67,"pronunciation":69,"fluency":71,"integrity":68},"details":[...]}],
    "compositeReport": {"compositeScore":67,"emotionScore":66,"stopConnScore":70,
      "intonationScore":72,"readSpeedScore":57, ...},
    "yuguScores": {...}
  },
  "report": {"source":"llm","summary":"本次朗读总体表现尚可…","dimensions":{...},"suggestions":[...]},
  "standardAudio": {"url":"https://…/tts/audio/ab3393f2021111ef.wav","format":"wav","duration":"4.280"},
  "asrText": {"text":"鹅 鹅 鹅 曲 项 向 天 歌 …","alignment":[...]},
  "warnings": []
}

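Return Value Reference

The field-by-field descriptions below are based on one real /api/v1/evaluate call (a Chinese reading of 《咏鹅》, recordId eval_9aa80c616edb); the Example column shows the actual values from that call.

Top-level fields

Field | Type | Example (real value) | Description
recordId | string | "eval_9aa80c616edb" | Evaluation record ID, used to query GET /api/v1/report/{recordId}; in connected / open modes the prefix is conn_ / open_
eof | int | 1 | 1 = final result
result | object | - | Structured scores (Shengtong-compatible field names); see the table below
report | object | - | AI assessment report: source ("llm" or template fallback), summary, dimensions (per-dimension comments for integrity / accuracy / fluency / reading_skill / emotion), suggestions[]
standardAudio | object | - | Standard pronunciation: url (full URL, directly playable, GET needs no authentication), format ("wav"), duration ("4.280")
asrText | object | - | Recognized text: text (space-separated), alignment[] (per-character alignment, see below)
warnings | array | [] | Top-level warning summary; merge it with result.warning[] and filter out empty values before use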
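The merge-and-filter step recommended for warnings can be sketched as below. De-duplicating by code is an assumption made for illustration, and merge_warnings is a hypothetical helper name; the sample response is made up.

```python
def merge_warnings(response: dict) -> list:
    """Merge result.warning[] with top-level warnings[], dropping empty entries."""
    inner = (response.get("result") or {}).get("warning") or []
    top = response.get("warnings") or []
    merged, seen = [], set()
    for w in list(inner) + list(top):
        if not w:                                     # skips None, "" and {} placeholders
            continue
        key = w.get("code") if isinstance(w, dict) else w
        if key in seen:                               # assumption: one entry per code is enough
            continue
        seen.add(key)
        merged.append(w)
    return merged


sample = {
    "warnings": [None, "", {"code": 1002, "message": "Audio volume too low!"}],
    "result": {"warning": [{"code": 1002, "message": "Audio volume too low!"},
                           {"code": 1004, "message": "Audio noisy!"}]},
}
print([w["code"] for w in merge_warnings(sample)])    # [1002, 1004]
```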

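result fields

Field | Type | Example (real value) | Description
overall | int | 67 | Total score (0 to scale; 0-100 by default)
integrity | int | 68 | Integrity (driven mainly by reading coverage: omissions and insertions lower it directly)
pronunciation | int | 69 | Pronunciation accuracy (modulated by character-level acoustic GOP)
tone | int | 100 | Tone (Chinese; determined by an acoustic tone classifier)
fluency | int | 71 | Fluency (speech rate, pauses, naturalness)
rhythm | int | 67 | Rhythm / reading skill (pausing and linking, stress, intonation)
speed | int | 77 | Raw speech rate (characters per minute), not a score; the normalized speech-rate score is compositeReport.readSpeedScore (57 in this example)
rear_tone | string | "rise" | Sentence-final intonation: rise / fall
duration | string | "7.320" | Audio duration (seconds, as a string); numeric_duration is the numeric form, 7.32
warning | array | [] | Audio quality warnings [{code,message}]; the code table is in Error Code Reference
words | array | 18 items | Character / word-level details; see the table below
sentences | array | 1 item | Sentence-level details: sentence / index / scores{overall,pronunciation,fluency,integrity} / details[] (same structure as words)
compositeReport | object | - | Composite report scores: compositeScore 67, integrityScore 68, accuracyScore 69, toneScore 100, nasalsScore 100, phonemeScore 100, fluencyScore 71, readSpeedScore 57, readSpeed 77, skillScore 67, stopConnScore 70, stressScore 67, intonationScore 72, emotionScore 66, timbreScore 68, etc., plus five suggestions: integritySuggest / accuracySuggest / fluencySuggest / skillSuggest / emotionSuggest
yuguScores | object | - | Engine-native hierarchical scores: accuracy{tone,nasal,phoneme}, fluency{speed_score,naturalness}, reading_skill{pause,stress,intonation}, prosody, emotion, timbre, etc. (numeric)
kernel_version / resource_version | string | "1.0.0" | Evaluation kernel / resource package version (the basis for DOC-005 version sync)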

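words[] fields (character / word level)

Field | Example (real value) | Description
word | "鹅" | Character / word text
pinyin / rawpinyin / symbolpinyin | "e" / "e2" / "é" | Pinyin (no tone / numeric tone / tone mark)
charType | 0 | 0 = normal character; punctuation and other non-read characters take other values
readType | 0 | 0 normal / 1 misread / 2 omitted / 3 repeated; inserted characters appear as inserted items
tone | "tone2" | Reference tone
scores | {overall:100, pronunciation:100, tone:100, overall_pron:100, prominence:0} | Character-level scores; English additionally has stress
span | {start:28, end:32} | Time interval in 10ms frames (×10 = milliseconds), used to cut audio for playback
pause | {type:0, duration:0} | Type and duration of the pause after this character
phonemes[] | {phoneme:"E", phone:"e", category:0, pronunciation:100, tone_index:"2", span:{…}} | Phoneme-level scores (controlled by phonemeOutput); see Phoneme-Level Scoring
normalized_syllables[] | {syllables:"鹅", pinyin:"e", tone:"tone2", tone_sandhi:"tone2"} | Normalized syllables (including tone sandhi, tone_sandhi)
word_parts[] | {part:"鹅", charType:0, beginIndex:0, endIndex:0} | Character segmentation positions
phonics[] / linkable | [] / false | English phonics chunks / whether this is a linking position (English mode)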
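Because span is expressed in 10ms frames, playback code has to convert it before cutting audio. The sketch below shows the conversion and a byte-range cut; it assumes raw 16kHz / 16-bit mono PCM with any wav header already removed, and word_spans_ms and slice_pcm are illustrative names.

```python
FRAME_MS = 10  # one span unit = 10 ms


def word_spans_ms(words: list) -> list:
    """Return (word, start_ms, end_ms) for every entry that carries a span."""
    out = []
    for w in words:
        span = w.get("span")
        if span:
            out.append((w["word"], span["start"] * FRAME_MS, span["end"] * FRAME_MS))
    return out


def slice_pcm(pcm: bytes, start_ms: int, end_ms: int,
              sample_rate: int = 16000, sample_width: int = 2) -> bytes:
    """Cut [start_ms, end_ms) out of headerless mono PCM (assumed layout)."""
    bytes_per_ms = sample_rate * sample_width // 1000   # 32 bytes per ms at 16 kHz / 16-bit
    return pcm[start_ms * bytes_per_ms:end_ms * bytes_per_ms]


words = [{"word": "鹅", "span": {"start": 28, "end": 32}}]
print(word_spans_ms(words))   # [('鹅', 280, 320)]
```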

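asrText.alignment[] fields (per-character alignment)

Field | Example (real value) | Description
char / index | "鹅" / 0 | Recognized character and its index in the reference text
read_status | "correct" | Reading verdict: correct / wrong / missed, etc.
start_time / end_time | 28 / 32 | Time (10ms frames)
asr_pinyin / asr_tone / asr_final | "e2" / "2" / "e" | Actually recognized pinyin / tone / final
gop_score | 100.0 | Character-level acoustic GOP value (0-100; null when there is no acoustic evidence, never fabricated)
The result structure differs in connected (English connected speech) and open (open-ended) modes; see the real response examples in those sections. In the current version all four outputs (result, plus report / standardAudio / asrText) are returned by default.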

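Open-Ended Questions POST /api/v1/evaluate · config.coreType = "open"

Picture description / situational Q&A / free expression. Same endpoint as read-aloud evaluation: set config coreType:"open" + taskType (picture / situational / free), with referenceText as the prompt or task instruction (the speaker answers freely). Returns three groups of subjective dimensions (content / language use / delivery) plus an objective audio-quality dimension (Distill-MOS).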

curl -X POST https://open.shengzhiai.com/api/v1/evaluate \
  -H "X-App-Key: test_app_key" \
  -F "audio=@answer.wav;type=audio/wav" \
  -F 'config={"coreType":"open","taskType":"free","referenceText":"请谈谈你最喜欢的季节。","language":"zh-CN"};type=application/json'
# Python (requests)
import json
import requests

cfg = {"coreType": "open", "taskType": "free",
       "referenceText": "请谈谈你最喜欢的季节。", "language": "zh-CN"}
with open("answer.wav", "rb") as audio:
    r = requests.post("https://open.shengzhiai.com/api/v1/evaluate",
        headers={"X-App-Key": "test_app_key"},
        files={"audio": ("answer.wav", audio, "audio/wav"),
               "config": (None, json.dumps(cfg, ensure_ascii=False), "application/json")})
res = r.json()["result"]
print(res["overall"], res["content"], res["feedback"]["suggestions"])
String cfg = "{\"coreType\":\"open\",\"taskType\":\"free\","
    + "\"referenceText\":\"请谈谈你最喜欢的季节。\",\"language\":\"zh-CN\"}";
// The multipart assembly is identical to read-aloud evaluation (audio file + config application/json part)
// Scores: result.overall / result.content / result.delivery / result.feedback
cw.Write([]byte(`{"coreType":"open","taskType":"free",` +
    `"referenceText":"请谈谈你最喜欢的季节。","language":"zh-CN"}`))
// The multipart assembly matches read-aloud evaluation; parse result.overall / content / delivery / feedback
fd.append("config", new Blob([JSON.stringify({
  coreType: "open", taskType: "free",
  referenceText: "请谈谈你最喜欢的季节。", language: "zh-CN"
})], {type:"application/json"}));
const d = await (await fetch("https://open.shengzhiai.com/api/v1/evaluate",
  {method:"POST", headers:{"X-App-Key":"test_app_key"}, body:fd})).json();
console.log(d.result.overall, d.result.feedback);

Response (excerpt from a real call, recordId=open_3538888d398c)

{
  "recordId": "open_3538888d398c", "eof": 1,
  "result": {
    "taskType": "free", "language": "zh", "duration_s": 4.36,
    "transcript": "今天天气很好我们一起去公园小鸟在唱歌", "hasSpeech": true,
    "overall": 57,
    "content":     {"overall":50, "relevance":60, "coherence":50, "task_achievement":40},
    "languageUse": {"overall":35, "grammar":30, "vocabulary":40},
    "delivery":    {"overall":91, "fluency":100, "pronunciation":78,
                    "speech_rate_label":"266.0 字/分", "n_pauses":0},
    "feedback": {"strengths":"…", "weaknesses":"回答偏离了题目要求…", "suggestions":["…"]},
    "audioQuality": {"mos":4.51, "quality":88},
    "scoringSource": "asr+llm+vad+sfgop+distillmos"
  }
}

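English Connected Speech POST /api/v1/evaluate · config.coreType = "connected"

Detects linking / elision (loss of plosion) / reduction in English connected speech and analyzes rhythm. Same endpoint as read-aloud evaluation: set config coreType:"connected" + language:"en-US", with referenceText as the target English sentence; each word boundary is annotated with whether the connected-speech feature was realized.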

curl -X POST https://open.shengzhiai.com/api/v1/evaluate \
  -H "X-App-Key: test_app_key" \
  -F "audio=@english.wav;type=audio/wav" \
  -F 'config={"coreType":"connected","referenceText":"The quick brown fox jumps over the lazy dog.","language":"en-US"};type=application/json'
# Python (requests)
import json
import requests

cfg = {"coreType": "connected", "language": "en-US",
       "referenceText": "The quick brown fox jumps over the lazy dog."}
with open("english.wav", "rb") as audio:
    r = requests.post("https://open.shengzhiai.com/api/v1/evaluate",
        headers={"X-App-Key": "test_app_key"},
        files={"audio": ("english.wav", audio, "audio/wav"),
               "config": (None, json.dumps(cfg), "application/json")})
res = r.json()["result"]
print(res["connected_overall"], res["linking"], res["rhythm"], res["boundaries"])
String cfg = "{\"coreType\":\"connected\",\"language\":\"en-US\","
    + "\"referenceText\":\"The quick brown fox jumps over the lazy dog.\"}";
// The multipart assembly is identical to read-aloud evaluation
// Scores: result.connected_overall / linking / rhythm / boundaries[]
cw.Write([]byte(`{"coreType":"connected","language":"en-US",` +
    `"referenceText":"The quick brown fox jumps over the lazy dog."}`))
// Parse result.connected_overall / linking / rhythm / boundaries
fd.append("config", new Blob([JSON.stringify({
  coreType: "connected", language: "en-US",
  referenceText: "The quick brown fox jumps over the lazy dog."
})], {type:"application/json"}));
const d = await (await fetch("https://open.shengzhiai.com/api/v1/evaluate",
  {method:"POST", headers:{"X-App-Key":"test_app_key"}, body:fd})).json();
console.log(d.result.connected_overall, d.result.boundaries);

Response (excerpt from a real call, recordId=conn_7a9b15dc84f0)

{
  "recordId": "conn_7a9b15dc84f0", "eof": 1,
  "result": {
    "connected_overall": 69, "linking": 8, "rhythm": 100,
    "elision": 0, "reduction": 100,
    "raw": {"linking_rate":0.548, "nPVI_V":31.5, "pctV":33.3, ...},
    "n_boundaries": 3,
    "boundaries": [
      {"between":["quick","brown"], "tags":["elision"],    "gap_ms":90.0,
       "continuity":0.25, "realized":0.37, "start_ms":740.0, "end_ms":970.0},
      {"between":["jumps","over"],  "tags":["linking_CV"], "gap_ms":80.0, ...}
    ],
    "calibrated": true, "classifier": true, "posterior_used": true
  }
}
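One way to turn boundaries[] into learner feedback is to list the boundaries whose realized value is low. The sketch below is illustrative only: the 0.5 threshold is an assumption rather than a platform rule, weak_boundaries is a hypothetical helper, and the second sample boundary uses made-up values in place of the fields elided above.

```python
def weak_boundaries(result: dict, threshold: float = 0.5) -> list:
    """Return (word pair, tags, realized) for boundaries below the threshold."""
    out = []
    for b in result.get("boundaries", []):
        realized = b.get("realized")
        if realized is not None and realized < threshold:
            out.append((" + ".join(b["between"]), b.get("tags", []), realized))
    return out


result = {"boundaries": [
    {"between": ["quick", "brown"], "tags": ["elision"], "realized": 0.37},
    {"between": ["jumps", "over"], "tags": ["linking_CV"], "realized": 0.8},  # made-up value
]}
print(weak_boundaries(result))   # [('quick + brown', ['elision'], 0.37)]
```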

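Phoneme-Level Scoring (GOP)

No separate endpoint is needed: the read-aloud config's phonemeOutput (on by default) outputs phoneme-level details. result.words[].phonemes[] gives the pronunciation score (0-100), the phoneme symbols (phoneme / phone), and the time interval of each initial / final; asrText.alignment[].gop_score gives the character-level acoustic GOP value (scored from CTC posteriors plus a discriminative head, with zero pronunciation annotations; null when there is no acoustic evidence).

// Real words[0].phonemes (first character "鹅" of 《咏鹅》)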
"phonemes": [{"phoneme":"E", "phone":"e", "category":0,
              "pronunciation":100, "tone_index":"2",
              "span":{"start":28,"end":32}}]
// Real asrText.alignment[0]
{"char":"鹅","index":0,"read_status":"correct","gop_score":100.0,
 "asr_pinyin":"e2","asr_tone":"2","asr_final":"e"}

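Real-Time WebSocket WS /api/v1/ws/evaluate

Stream while recording; the evaluation runs as soon as the stream ends. Protocol (JSON text frames + binary audio frames):

wss://open.shengzhiai.com/api/v1/ws/evaluate
← {"event":"connected","message":"stream channel ready"}
→ {"cmd":"start","coreType":"sentence","referenceText":"今天天气很好","language":"zh-CN"}
← {"event":"started"}
→ binary audio chunks ×N (or text frames {"cmd":"audio","data":"<base64>"})
→ {"cmd":"end"}
← {"event":"result","recordId":"eval_…","eof":1,"result":{…},"report":{…},"asrText":{…},"warnings":[]}
Send the chunks in the order of the audio file's raw byte stream (16kHz / 16-bit mono wav recommended, so the first frame contains the wav header; frame size is not limited, ≤32KB suggested). The server buffers the whole recording and, after end, returns in one shot a final result with the same structure as the HTTP response; for intermediate streaming captions, see the /eval real-time mode.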
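The two audio frame forms in the protocol carry the same bytes. The sketch below splits a buffer into binary chunks and into the alternative {"cmd":"audio"} text frames; it builds frames only and does not open a connection. The function names and the 3200-byte default (about 100 ms at 16kHz / 16-bit mono) are illustrative choices.

```python
import base64
import json


def iter_binary_frames(audio: bytes, frame_size: int = 3200):
    """Yield raw byte chunks in original file order."""
    for i in range(0, len(audio), frame_size):
        yield audio[i:i + frame_size]


def iter_text_frames(audio: bytes, frame_size: int = 3200):
    """Yield JSON text frames carrying the same chunks as base64."""
    for chunk in iter_binary_frames(audio, frame_size):
        yield json.dumps({"cmd": "audio",
                          "data": base64.b64encode(chunk).decode("ascii")})


audio = bytes(range(256)) * 30                       # 7680 bytes of dummy data
print([len(c) for c in iter_binary_frames(audio)])   # [3200, 3200, 1280]
```

Base64 text frames are roughly a third larger than binary frames, so binary is the natural choice wherever the client library supports it.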
import asyncio, json, websockets

async def main():
    async with websockets.connect("wss://open.shengzhiai.com/api/v1/ws/evaluate") as ws:
        print(await ws.recv())                       # {"event":"connected",...}
        await ws.send(json.dumps({"cmd": "start", "coreType": "sentence",
            "referenceText": "今天天气很好", "language": "zh-CN"}))
        print(await ws.recv())                       # {"event":"started"}
        data = open("reading.wav", "rb").read()
        for i in range(0, len(data), 3200):          # ~100 ms per frame at 16 kHz / 16-bit mono
            await ws.send(data[i:i+3200])
        await ws.send(json.dumps({"cmd": "end"}))
        final = json.loads(await ws.recv())          # {"event":"result",...}
        print(final["result"]["overall"])

asyncio.run(main())
// JDK 11+ java.net.http.WebSocket, no third-party dependencies
HttpClient client = HttpClient.newHttpClient();
WebSocket ws = client.newWebSocketBuilder()
    .buildAsync(URI.create("wss://open.shengzhiai.com/api/v1/ws/evaluate"),
        new WebSocket.Listener() {
            @Override public CompletionStage<?> onText(WebSocket w, CharSequence data, boolean last) {
                System.out.println(data);            // connected / started / result
                w.request(1);
                return null;
            }
        }).join();
// Each send is joined before the next one: java.net.http.WebSocket allows only one pending send at a time.
ws.sendText("{\"cmd\":\"start\",\"coreType\":\"sentence\","
    + "\"referenceText\":\"今天天气很好\",\"language\":\"zh-CN\"}", true).join();
byte[] audio = Files.readAllBytes(Path.of("reading.wav"));
for (int i = 0; i < audio.length; i += 3200)
    ws.sendBinary(ByteBuffer.wrap(audio, i, Math.min(3200, audio.length - i)), true).join();
ws.sendText("{\"cmd\":\"end\"}", true).join();
// go get github.com/gorilla/websocket
c, _, err := websocket.DefaultDialer.Dial(
    "wss://open.shengzhiai.com/api/v1/ws/evaluate", nil)
if err != nil { panic(err) }
defer c.Close()
c.ReadMessage()                                      // connected
c.WriteJSON(map[string]string{"cmd": "start", "coreType": "sentence",
    "referenceText": "今天天气很好", "language": "zh-CN"})
c.ReadMessage()                                      // started
audio, _ := os.ReadFile("reading.wav")
for i := 0; i < len(audio); i += 3200 {
    end := i + 3200
    if end > len(audio) { end = len(audio) }
    c.WriteMessage(websocket.BinaryMessage, audio[i:end])
}
c.WriteJSON(map[string]string{"cmd": "end"})
_, msg, _ := c.ReadMessage()                         // result
fmt.Println(string(msg))
// npm i ws
import WebSocket from "ws";
import { readFileSync } from "node:fs";

const ws = new WebSocket("wss://open.shengzhiai.com/api/v1/ws/evaluate");
ws.on("message", raw => {
  const d = JSON.parse(raw);
  if (d.event === "connected")
    ws.send(JSON.stringify({cmd:"start", coreType:"sentence",
      referenceText:"今天天气很好", language:"zh-CN"}));
  else if (d.event === "started") {
    const buf = readFileSync("reading.wav");
    for (let i = 0; i < buf.length; i += 3200) ws.send(buf.subarray(i, i + 3200));
    ws.send(JSON.stringify({cmd:"end"}));
  } else if (d.event === "result") {
    console.log(d.result.overall);
    ws.close();
  }
});

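Shengtong-Compatible Real-Time WebSocket WS wss://host/{coreType}

Streaming evaluation for existing Shengtong integrators; the URL path is the Shengtong coreType itself (for example /sent.eval.cn, /word.eval, /para.eval.cn; the full set is in the appendix). It shares the address with the compatible REST endpoint POST /{coreType} of the same name (upgrade requests go to WS, plain POSTs go to REST).

Protocol: after connecting, the server sends {"event":"connected","coreType":"…"} → the client first sends a parameter frame {"refText":"…","language":"zh-CN","realtime_feedback":true} (answered with {"event":"started"}) → the client pushes audio (binary frames, 640B per 20ms at 16k suggested; or {"cmd":"audio","data":"<base64>"}) → {"cmd":"end"} triggers the final result {"recordId":…,"eof":1,"result":{…}}. With realtime_feedback=true, intermediate progress frames {"eof":0,"result":{"bytes":n}} are sent while audio is being received.
Authentication (optional; requests without credentials are allowed, as with the native WS): credentials go in the query string, either Token ?token={jwt} or sig ?appKey=&timestamp=&signature=&nonce= (signing rules in Authentication).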
// Browser: connect to the Shengtong-compatible WS for streaming Chinese sentence evaluation
const ws = new WebSocket("wss://ygyx.dragonai.tech/sent.eval.cn");
ws.binaryType = "arraybuffer";
ws.onmessage = (e) => {
  const d = JSON.parse(e.data);
  if (d.event === "connected") ws.send(JSON.stringify({refText:"北京你好", language:"zh-CN", realtime_feedback:true}));
  else if (d.event === "started") sendAudioFramesThen(() => ws.send(JSON.stringify({cmd:"end"})));
  else if (d.eof === 0) console.log("progress", d.result.bytes);
  else if (d.eof === 1) { console.log("final result", d.result.overall); ws.close(); }
};

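Standard Pronunciation TTS POST /api/v1/tts/generate

JSON body: text (≤1000 characters), language (zh-CN / en-US / en-GB), voice (female / male / xiaoyan / xiaofeng), format (wav / mp3 / ogg), speed / pitch / volume (0-100, default 50), sampleRate (sample rate in Hz, one of 8000 / 16000 / 24000, default 16000), style (a natural-language style instruction of ≤200 characters, such as 「用新闻播报的语气」 (in a news-broadcast tone) or 「温柔地朗读」 (read gently)). Metered by synthesized character count (coreType tts.standard); rate limit 60 requests per minute per key.

Style instruction style: whether it takes effect depends on the underlying model; if it is ignored or falls back, the response's data.warnings carries a notice (normally the empty array []).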
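The body fields and ranges listed above can be checked client-side before the request is sent. The sketch below only builds and validates the JSON body; build_tts_payload is an illustrative name, and because no defaults are stated for language, voice, and format, they are required arguments here.

```python
LANGUAGES = {"zh-CN", "en-US", "en-GB"}
VOICES = {"female", "male", "xiaoyan", "xiaofeng"}
FORMATS = {"wav", "mp3", "ogg"}
SAMPLE_RATES = {8000, 16000, 24000}


def build_tts_payload(text, language, voice, fmt, sample_rate=16000,
                      speed=50, pitch=50, volume=50, style=None) -> dict:
    """Validate the documented ranges and return the JSON body as a dict."""
    if not text or len(text) > 1000:
        raise ValueError("text must be 1-1000 characters")
    if language not in LANGUAGES or voice not in VOICES or fmt not in FORMATS:
        raise ValueError("unsupported language, voice, or format")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError("sampleRate must be 8000, 16000, or 24000")
    for name, value in (("speed", speed), ("pitch", pitch), ("volume", volume)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be within 0-100")
    payload = {"text": text, "language": language, "voice": voice, "format": fmt,
               "sampleRate": sample_rate, "speed": speed, "pitch": pitch,
               "volume": volume}
    if style:
        if len(style) > 200:
            raise ValueError("style must be at most 200 characters")
        payload["style"] = style
    return payload


print(build_tts_payload("今天天气真好", "zh-CN", "female", "wav", style="用新闻播报的语气"))
```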
curl -X POST https://open.shengzhiai.com/api/v1/tts/generate \
  -H "X-App-Key: test_app_key" -H "Content-Type: application/json" \
  -d '{"text":"今天天气真好","language":"zh-CN","voice":"female",
       "format":"wav","sampleRate":16000,"style":"用新闻播报的语气"}'

// Real response
{"code":0,"message":"success",
 "data":{"audioUrl":"https://…/tts/audio/ab3393f2021111ef.wav",
         "duration":"4.280","format":"wav","warnings":[]},
 "timestamp":1781274776737}

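Speech Recognition ASR POST /api/v1/asr/recognize

Standalone speech-to-text (transcription only, no evaluation). multipart/form-data: audio (audio file, wav / mp3, ≥16kHz) + language (zh for Chinese via Paraformer / en for English via WhisperX; default zh). Returns data.text (transcript), data.words (per-character timestamps start_ms / end_ms for Chinese), data.duration (seconds), and data.confidence (confidence, for English). Metered by audio duration in seconds (coreType asr.stream); rate limit 60 requests per minute per key.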

curl -X POST https://open.shengzhiai.com/api/v1/asr/recognize \
  -H "X-App-Key: test_app_key" \
  -F "audio=@speech.wav" -F "language=zh"

// Real response
{"code":0,"message":"success",
 "data":{"text":"今天天气很好我们一起去公园",
         "language":"zh","duration":4.36,
         "words":[{"word":"今","start_ms":290,"end_ms":450}, …],
         "confidence":null},
 "timestamp":1782632…}

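Report Query GET /api/v1/report/{recordId}

Query a past evaluation report by recordId. Returns the unified envelope: data.recordId / data.score (the complete result from evaluation time) / data.report (the AI report). When the recordId does not exist, the response is {"code":40001,"message":"评测记录不存在: …"}.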

curl https://open.shengzhiai.com/api/v1/report/eval_9aa80c616edb \
  -H "X-App-Key: test_app_key"
→ {"code":0,"message":"success","data":{"recordId":"eval_9aa80c616edb","score":{…},"report":{…}}}

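coreType Reference

Native API coreType (required in config)

coreType | Description
word | Word / single-character (pinyin) evaluation
sentence | Sentence evaluation
passage | Paragraph / passage evaluation (includes multi-level sentence details in sentences[])
alpha | English alphabet items (referenceText is space-separated letters, such as "A B C")
connected | English connected-speech evaluation (see English Connected Speech)
open | Open-ended / spontaneous speech (see Open-Ended Questions; used with taskType)
coreType is required and must be one of the values above; it is not chosen automatically from text length. language must match the language of the question type (Chinese zh-CN / English en-US, en-GB).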

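Shengtong Protocol Compatibility Layer

A protocol adapter endpoint for existing Shengtong customers: POST /{coreType} (coreType uses Shengtong naming: word.eval / sent.eval / para.eval, with .cn for Chinese and .pro for the adaptive variants). Multipart form: audio (audio file) + request (JSON string containing refText and the authentication triple appKey / timestamp (seconds) / sig). The response uses Shengtong field names (top level contains applicationId / dtLastResponse / refText / result), so existing integrations migrate with zero changes.

The compatibility layer is mounted at the root path of the engine service and is currently not open on the public platform domain (open.shengzhiai.com); to migrate an existing Shengtong workload, contact the platform for the compatible access address. New public integrations should always use the native /api/v1/evaluate.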

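Evaluation Parameters

All of the following parameters go inside the config JSON (native API field names, semantically aligned with the Shengtong parameters of the same name):

Parameter | Type | Default | Range / values | Description
coreType | string | required | word / sentence / passage / alpha / connected / open | Evaluation kernel; see coreType Reference
referenceText | string | required | ≤1000 characters | Reference text; for open-ended questions (open) it is the prompt / task instruction
language | string | en-US | zh-CN / en-US / en-GB | Evaluation language; must match the language of the question type
slack | float | 0 | [-1, 1] | Scoring leniency: >0 is more lenient, <0 is stricter
scale | int | 100 | (0, 100] | Upper bound of the score range; with scale=10, overall ∈ [0,10]
precision | float | 1 | (0, 1] | Score precision step; 0.1 = keep one decimal place
agegroup | int | 3 | 1 preschool / 2 primary school / 3 over 12 years old | Age-group scoring baseline (affects the speech-rate scoring band)
toneWeight | float | 0.2 | [0, 1] | Weight of the Chinese tone dimension in overall
refPinyin | string | null | Space-separated pinyin string | Pronunciation override for polyphonic characters, such as "chong2 qing4"; takes precedence over automatic G2P
phonemeOutput | bool | true | true / false | Whether to output the phoneme-level details in words[].phonemes
includeReport | bool | - | true / false | AI assessment report switch (returned by default in the current version)
includeStandardAudio | bool | - | true / false | Standard pronunciation switch (returned by default in the current version)
includeAsrText | bool | - | true / false | Recognized text switch (returned by default in the current version)
taskType | string | free | picture / situational / free | coreType=open only: picture description / situational Q&A / free expression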
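The ranges in this table lend themselves to a client-side pre-check that catches most 40001 responses before a request is sent. The sketch below mirrors the table only; validate_config is an illustrative name, and the server's actual validation may be stricter or looser.

```python
CORE_TYPES = {"word", "sentence", "passage", "alpha", "connected", "open"}
LANGUAGES = {"zh-CN", "en-US", "en-GB"}
TASK_TYPES = {"picture", "situational", "free"}


def _in_range(value, low, high, low_open=False) -> bool:
    """True when value is a real number inside [low, high] or (low, high]."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return (low < value if low_open else low <= value) and value <= high


def validate_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the config passes."""
    errors = []
    if cfg.get("coreType") not in CORE_TYPES:
        errors.append("coreType must be one of " + ", ".join(sorted(CORE_TYPES)))
    ref = cfg.get("referenceText")
    if not ref:
        errors.append("referenceText is required")
    elif len(ref) > 1000:
        errors.append("referenceText exceeds 1000 characters")
    if cfg.get("language", "en-US") not in LANGUAGES:
        errors.append("language must be zh-CN, en-US, or en-GB")
    if not _in_range(cfg.get("slack", 0), -1, 1):
        errors.append("slack must be within [-1, 1]")
    if not _in_range(cfg.get("scale", 100), 0, 100, low_open=True):
        errors.append("scale must be within (0, 100]")
    if not _in_range(cfg.get("precision", 1), 0, 1, low_open=True):
        errors.append("precision must be within (0, 1]")
    if cfg.get("agegroup", 3) not in (1, 2, 3):
        errors.append("agegroup must be 1, 2, or 3")
    if not _in_range(cfg.get("toneWeight", 0.2), 0, 1):
        errors.append("toneWeight must be within [0, 1]")
    if cfg.get("coreType") == "open" and cfg.get("taskType", "free") not in TASK_TYPES:
        errors.append("taskType must be picture, situational, or free")
    return errors


print(validate_config({"coreType": "sentence", "referenceText": "鹅,鹅,鹅,曲项向天歌。",
                       "language": "zh-CN"}))   # []
```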

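Error Code Reference

① Platform unified envelope code (evaluation / TTS / report / console endpoints)

Error responses are uniformly {"code", "message", "timestamp"} (success is {"code":0,"message":"success","data":…}; a successful evaluation response returns the result body directly, without the envelope).

code | HTTP | Meaning (observed message examples)
0 | 200 | Success
40001 | 400 | Request parameter validation failed, e.g. "referenceText 不能为空", "评测记录不存在: eval_xxx"
40100 | 401 | Unauthenticated: "认证失败,请提供有效的认证信息" (X-App-Key / JWT missing)
40300 | 403 | Forbidden, e.g. "该 API Key 未授权调用此 coreType: sentence" (the key is bound to a coreType allowlist)
40400 | 404 | Resource not found
40900 | 409 | Resource conflict (duplicate creation)
42900 | 429 | Request rate limit exceeded (gateway rate limiting by IP / user / API Key)
42901 | 429 | Concurrent evaluations exceed the plan tier limit (trial 2 / standard 5 / enterprise 10); retry with exponential backoff or upgrade the plan
42902 | 429 | Trial-tier AI report generation reached its daily cap
50000 | 500 | Internal server error: "system busy, please try again later" (includes the case where the config part lacks the application/json type)
50010 | 500 | Feature not implemented

Authentication additionally has finer-grained business codes (such as 2001 token expired, 2002 token invalid); rely on the response message.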

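② Shengtong compatibility layer errors (observed detail text)

HTTP | detail | Meaning
401 | [2001] missing appKey/timestamp/sig | The request JSON lacks the authentication triple
401 | [2001] sig mismatch | Signature mismatch (check the secretKey and the concatenation order appKey+timestamp+secretKey)
401 | [2002] timestamp out of range ±300s | Timestamp outside the window; note that timestamp is in seconds, so sending milliseconds always triggers this error
401 | [2003] unknown appKey | The appKey does not exist
404 | Unknown coreType: xxx | coreType is not in the supported list (word/sent/para × .eval/.eval.cn/.pro)
422 | [{"type":"missing","loc":["body","audio"],…}] | Form field missing or of the wrong type (audio file or request field missing)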

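③ Audio quality warning codes (result.warning[]; scores are still returned)

code | message | Description
1001 | No valid audio detected! | No valid audio detected (nothing was recorded, or the audio does not match the text at all); scores are not reliable, prompt a re-recording
1002 | Audio volume too low! | Volume too low (too far from the microphone)
1003 | Audio volume too high! | Volume too high (clipping)
1004 | Audio noisy! | Noticeable ambient noise
1005 | Audio not complete! | Audio incomplete (suspected truncation, judged by the proportion of omitted text); scores are for reference only, re-recording is recommended
1009 | scorer degraded | Some scoring components are temporarily degraded; scores are for reference only, retrying is recommended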
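Client code usually needs one action per response rather than a list of codes. The sketch below maps the warning codes to a single suggested action, following the guidance in this document (1001 / 1005 re-record, 1009 retry, 1002-1004 adjust and retry); the priority order between categories and the action labels are assumptions.

```python
RERECORD = {1001, 1005}        # scores not reliable
RETRY = {1009}                 # scorer temporarily degraded
ADJUST = {1002, 1003, 1004}    # volume or noise issues


def advise(warnings: list) -> str:
    """Map result.warning[] entries to one suggested action."""
    codes = {w.get("code") for w in warnings if isinstance(w, dict)}
    if codes & RERECORD:
        return "re-record"
    if codes & RETRY:
        return "retry"
    if codes & ADJUST:
        return "adjust-and-retry"
    return "ok"


print(advise([{"code": 1004, "message": "Audio noisy!"},
              {"code": 1001, "message": "No valid audio detected!"}]))   # re-record
```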

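SDK Guide

Current form: direct REST + WebSocket access, with no SDK installation required. All capabilities are exposed through standard HTTP multipart and WebSocket, and this page provides multi-language call examples that can be copied and run directly:

Language / platform | Example location
curl / Python / Java / JavaScript / Go / C / PHP | Quick Start (full HTTP evaluation flow)
Python / Java / Go / Node | The examples under Read-Aloud Evaluation, Open-Ended Questions, English Connected Speech, and Real-Time WebSocket
Browser (record → evaluate) | The /eval core code in Demo Examples

SDKs for five platforms (released; they correspond to PRD §5.3 SDK-001~005)

Each SDK wraps whole-recording evaluation (REST), TTS, report query, the native real-time WS, and the Shengtong-compatible REST/WS, with a built-in HMAC-SHA256 signer (verified against the unified test vectors) and audio capture (16k / 16-bit / mono), plus a demo and README integration docs. Downloads (tar.gz):

Platform | Form / key technology | Download
PC Web (JavaScript) | ES Module + UMD; Web Crypto signing; AudioWorklet recording | yugu-web-sdk.tgz
WeChat Mini Program | Pure JS (bundled sha256/HMAC, no subtle dependency); wx.* APIs; sample mini program included | yugu-miniprogram-sdk.tgz
Android | Kotlin + OkHttp (REST/WS) + AudioRecord; Gradle module + Demo Activity | yugu-android-sdk.tgz
iOS | Swift Package; URLSession + CryptoKit + AVAudioEngine; SwiftUI Demo | yugu-ios-sdk.tgz
Server-side Java | JDK 17; java.net.http (REST/WS, zero third-party networking dependencies) + Jackson; includes JUnit signature tests | yugu-java-sdk.tgz

Integration contract (the single source of truth for fields / signing / protocol): CONTRACT.md · Checksums: SHA256SUMS. You can still skip the SDKs and integrate directly with the REST/WS examples on this page.

Integration advice: the appKey is a server-side credential; do not bundle it into browser or client code. Web, mini program, and app clients should forward evaluation requests through your backend (the backend holds the X-App-Key and calls the platform, or uses sig signing), or use a console login-session JWT. The Web and Java SDKs additionally support the Shengtong-compatible WS; mobile and mini program clients use the native WS by default.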

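Demo Examples

Two online tools are deployed on the same domain as the platform and work as soon as you open them. Both are single-file pages whose source can be viewed directly as an integration reference. In the snippets below, BASE = "https://open.shengzhiai.com".

/eval: online evaluation entry (Chinese and English fully separated)

Chinese and English evaluation are fully separated: choose the language first, then the capability. Chinese: /eval/zh/{read,open,realtime} (read-aloud character / sentence / passage · open-ended speaking · real-time); English: /eval/en/{read,linking,open,realtime} (reading word/sentence/passage/alphabet · linking · open · realtime). Each page is locked to its language and has no language dropdown. Both browser recording and local upload are supported, with a six-dimension radar chart, per-character four-color annotation, the AI assessment report, comparison against the standard demonstration audio, and streaming captions while recording in real-time mode. Core call: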

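// Browser recording → multipart evaluation; capabilities differ only in config.coreType (word/sentence/passage/connected/open/alpha),
// and config.language is fixed to zh-CN or en-US (Chinese/English split); real-time mode uses WS /api/v1/ws/evaluate. The browser's three audio-processing features are turned off so that the engine pipeline handles noise.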
const stream = await navigator.mediaDevices.getUserMedia({audio:
  {noiseSuppression:false, autoGainControl:false, echoCancellation:false}});
const mr = new MediaRecorder(stream), chunks = [];
mr.ondataavailable = e => chunks.push(e.data);
mr.onstop = async () => {
  const fd = new FormData();
  fd.append("audio", new Blob(chunks, {type:"audio/webm"}), "rec.webm");
  fd.append("config", new Blob([JSON.stringify({coreType:"sentence",
    referenceText:"鹅,鹅,鹅,曲项向天歌。", language:"zh-CN"})],
    {type:"application/json"}));
  const r = await fetch(BASE + "/api/v1/evaluate",
    {method:"POST", headers:{"X-App-Key":"test_app_key"}, body:fd});
  render((await r.json()).result);   // six dimension scores + words[] per-character four-color highlighting
};
mr.start(); setTimeout(() => mr.stop(), 5000);

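/annotate.html: multi-rater annotation tool (consensus gold labels)

Raters load a sample manifest (JSON: sample_id / audio / refText), listen, and score six dimensions (integrity, accuracy, fluency, reading skill, emotion, tone) from 0 to 100, then export a CSV purely client-side (sample_id,rater_id,dimension,score). The tool does not call the evaluation API and is independent of machine scores. After each rater exports a file, annotation_aggregate.py aggregates them: pairwise Pearson, ICC(2,k) agreement, a consensus gold-label CSV, and active-learning flags for low-agreement samples, used for engine calibration and acceptance comparison. Core flow: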

// Pure front end: six-dimension slider scoring (0-100) → CSV export via Blob, with no network requests
let csv = "sample_id,rater_id,dimension,score\n";
for (const sid in store)
  for (const dim in store[sid])   // integrity/accuracy/fluency/reading_skill/emotion/tone
    csv += `${sid},${rater},${dim},${store[sid][dim]}\n`;
download(new Blob([csv], {type:"text/csv"}), `ratings_${rater}.csv`);

// Offline multi-rater aggregation (the script ships with the engine project):
//   python annotation_aggregate.py ratings_A.csv ratings_B.csv ratings_C.csv
//   → pairwise Pearson / ICC(2,k) / consensus gold-label CSV / low-agreement sample flags

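FAQ

1. Which audio formats and sample rates are supported?

wav / mp3 / ogg uploads are supported (≤10MB, ≤5 minutes). 16kHz, 16-bit, mono wav is recommended (the engine processes everything at 16kHz internally); other sample rates are resampled automatically, but very low rates such as 8kHz lose high-frequency information and reduce the accuracy of initial / phoneme judgments. Over WebSocket, send the audio file's raw byte stream in chunks (wav recommended, with the file header in the first frame).

2. How do I choose coreType? How do Chinese and English differ?

On the native API, coreType is a required enum: word / sentence / passage / alpha / connected / open; it is not chosen automatically from text length. The language is set by language (zh-CN / en-US / en-GB) and must match the question type. Chinese evaluation additionally yields the tone dimension and diagnostics for erhua, flat vs. retroflex initials, and front vs. back nasal finals; English additionally yields the stress dimension and can use connected mode for linking / elision / reduction analysis. Shengtong-named coreTypes (sent.eval.cn, etc.) are used only by the Shengtong compatibility layer (see coreType Reference).

3. What do warning codes 1001-1005 and 1009 mean? Should the user re-test?

They are audio quality warnings (not errors; scores are still returned normally): 1001 no valid audio detected, 1002 volume too low, 1003 volume too high (clipping), 1004 noticeable ambient noise, 1005 audio incomplete (suspected truncation, judged by the proportion of omitted text), 1009 some scoring components temporarily degraded. With 1001 / 1005 the scores are not reliable, so prompt the user to re-record; for 1002-1004, suggest adjusting distance or environment and trying again; for 1009, retry once. Warnings are in the result.warning[] array, each item carrying code and message, such as {"code":1001,"message":"No valid audio detected!"}; the top-level warnings[] is a redundant summary that may contain empty placeholders, so merge the two and filter out empty values before use.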

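4. Which authentication method should I choose? How is sig computed?

① X-App-Key: request header X-App-Key: {appKey} (or query parameter ?api_key=); recommended for new server-side integrations, and the simplest. ② Bearer JWT: only for console login sessions (issued by /api/v1/auth/login); putting an appKey in the Bearer header fails with code 2002 token无效. ③ Shengtong sig compatibility (compatibility layer + Shengtong-compatible WS): request headers X-App-Key / X-Timestamp (seconds) / X-Nonce (optional) / X-Signature; to sign, drop empty business parameters, sort by key ascending, join as k1=v1&k2=v2, then compute Base64( HMAC-SHA256( payload, secretKey ) ); a timestamp more than ±300s from server time is rejected as a replay. During the WS handshake the credentials move to the query string (?appKey=&timestamp=&signature=&nonce=). Customers already integrated with Shengtong can migrate with zero changes using method ③.

5. What is billing based on?

Each successful evaluation call is metered on three items: call count + reference text characters + audio duration (seconds), accumulated daily per appKey; actual pricing follows the purchased plan (live usage is on the console's "Usage Statistics" page). TTS is metered by synthesized characters. WebSocket streaming evaluation is metered the same way as HTTP (one session counts as one call). Failed calls (parameter validation failure, evaluation exception) are not billed.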

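6. How do I call English connected-speech evaluation?

POST /api/v1/evaluate, with the same multipart layout as read-aloud evaluation; set config coreType:"connected" + language:"en-US" + referenceText. The response returns connected_overall (overall connected-speech score), linking, rhythm, elision (loss of plosion), and reduction (weak forms), and boundaries[] annotates each word boundary with the connected-speech type (linking_CV / elision, etc.) and the degree of realization, realized. The full real response is in English Connected Speech.

7. How do I call open-ended questions (no reference read-aloud text)?

POST /api/v1/evaluate; set config coreType:"open" + taskType (picture for picture description / situational for situational Q&A / free for free expression), and put the prompt in referenceText (used only as a relevance reference; the speaker answers freely). Returns three groups of dimensions, content (relevance / coherence / task achievement), languageUse (grammar / vocabulary), and delivery (fluency / pronunciation / speech rate), plus feedback (strengths / weaknesses / suggestions) and the objective audio quality audioQuality (Distill-MOS). The full real response is in Open-Ended Questions.

8. How long can evaluation reports be queried?

GET /api/v1/report/{recordId} returns data.score (identical to the result at evaluation time) and data.report. Reports are persisted to the database and the current version does not purge them proactively; even so, business systems are advised to query within 90 days and archive on their own. When the recordId does not exist or has been purged, the response is {"code":40001,"message":"评测记录不存在: …"}.

9. How do I use the standard demonstration audio URL?

standardAudio.url in the evaluation response (and data.audioUrl from the TTS endpoint) is a full URL; GET needs no authentication header, so it can be fed directly to an <audio> tag for playback or downloaded. Its path has the form /tts/audio/{id}.wav, and the same path is also reachable on the platform domain (https://open.shengzhiai.com/tts/audio/{id}.wav); in the browser, rewriting it to a platform-domain relative path is recommended to avoid cross-origin issues.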
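The rewrite to a platform-domain relative path can be sketched as below. It is only meaningful when the page itself is served from the platform domain; to_platform_relative is an illustrative name, and the host in the usage line is hypothetical because the real host is elided in the responses shown earlier.

```python
from urllib.parse import urlsplit


def to_platform_relative(url: str) -> str:
    """Return the /tts/audio/... path of a standard-audio URL, else the URL unchanged."""
    parts = urlsplit(url)
    if not parts.path.startswith("/tts/audio/"):
        return url                      # not a standard-audio path: leave it alone
    return parts.path + ("?" + parts.query if parts.query else "")


# Hypothetical host, used for illustration only.
print(to_platform_relative("https://audio.example.invalid/tts/audio/ab3393f2021111ef.wav"))
```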

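10. What commonly causes 400 (code 40001) / 500 (code 50000)?

The most common cause of 50000 is a config part that does not declare its Content-Type: config must be passed as one part of the multipart body and that part's Content-Type must be application/json (in curl, -F 'config={…};type=application/json'; in JS, new Blob([json],{type:"application/json"})); otherwise it is treated as octet-stream and parsing fails. Common causes of 40001: coreType or referenceText missing (the message states which one); coreType outside the enum (word/sentence/passage/connected/open/alpha); language not one of zh-CN/en-US/en-GB; referenceText longer than 1000 characters. In addition, when an API Key is bound to a coreType allowlist, calling an unauthorized type returns 403 with code 40300.

11. What are the concurrency limits? What should I do when I exceed them?

Evaluation concurrency is capped by customer tier: trial 2, standard 5, enterprise 10. Exceeding the cap immediately returns 429 {"code":42901}, with no queueing and no billing; clients should retry with exponential backoff (for example 1s/2s/4s). Trial-tier AI report generation also has a daily cap (42902 when exceeded). The gateway additionally applies QPS rate limiting by IP / user / key (42900). For higher concurrency, upgrade the plan or contact sales.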
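The 1s/2s/4s backoff suggested for 42901 can be sketched as below. The call argument is a hypothetical zero-argument callable that performs one request and returns (http_status, parsed_json); the retry count and the injectable sleep are illustrative choices that keep the sketch testable without network access.

```python
import time


def call_with_backoff(call, max_retries: int = 3, base_delay: float = 1.0,
                      sleep=time.sleep):
    """Retry call() while it reports 429 / 42901, waiting 1s, 2s, 4s, ..."""
    status, body = call()
    for attempt in range(max_retries):
        if not (status == 429 and body.get("code") == 42901):
            break                                   # success or a different error
        sleep(base_delay * 2 ** attempt)            # 1.0, 2.0, 4.0, ...
        status, body = call()
    return status, body


# Simulated responses: two concurrency rejections, then a success.
responses = iter([(429, {"code": 42901}), (429, {"code": 42901}),
                  (200, {"result": {"overall": 67}})])
waits = []
print(call_with_backoff(lambda: next(responses), sleep=waits.append), waits)
```

Only 42901 is retried here; 42900 and 42902 are returned to the caller unchanged, since whether and when to retry them is a separate decision.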

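12. How do I send messages for WebSocket real-time evaluation?

After connecting to wss://open.shengzhiai.com/api/v1/ws/evaluate, the server first pushes {"event":"connected"}; the client sends {"cmd":"start","coreType":"sentence","referenceText":"…","language":"zh-CN"} (and receives {"event":"started"}) → keeps sending binary audio chunks (or {"cmd":"audio","data":"<base64>"} text frames) → sends {"cmd":"end"}, and the server returns the final result {"event":"result",…} (same structure as the HTTP response). The open platform WS currently works as "stream while recording, evaluate at the end"; for streaming captions and follow-along feedback during recording, see the /eval real-time mode. Complete examples in four languages are in Real-Time WebSocket.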

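Live Endpoint List (auto-synced from OpenAPI)

DOC-005: this list is fetched live from /openapi.json on the platform domain (a reverse proxy of the engine's OpenAPI) when the page loads, so it stays in sync with the live engine API version without manual maintenance. Note: the list covers all engine endpoints; the entry points actually open on the public platform domain are those described in the sections above.
Method | Path | Description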