【Android进阶】Android平台的文字转语音使用记录

【Android进阶】Android平台的文字转语音使用记录

本文介绍了在Android平台上使用TextSpeech实现文字转语音的使用记录。

背景

最近在做AI大模型对接的一些功能,调用完chat接口返回结果之后,发现豆包和Kimi等客户端都有语音播报功能,并且这些大厂经过一系列调优,可以实现很好听的音色和节奏停顿的效果。

那个人开发者可不可以在系统自带的免费语音助手的基础上做一个tts(Text To Speech)的播报呢?

调查发现Google已经有相关的接口了,并且尝试使用魅族20Pro手机成功实现了语音播报效果,记录一下使用过程。

TextSpeech

TextSpeech 是Android平台的文字转语音的接口,可以将文本合成为语音,可以支持立即播放,或者存储为音频文件。

初始化实例

创建实例需要传入两个参数,一个Context,一个连接的监听器,监听器会在初始化完成后回调。

    /**
     * The constructor for the TextToSpeech class, using the default TTS engine.
     * This will also initialize the associated TextToSpeech engine if it isn't already running.
     *
     * @param context
     *            The context this instance is running in.
     * @param listener
     *            The {@link TextToSpeech.OnInitListener} that will be called when the
     *            TextToSpeech engine has initialized. In a case of a failure the listener
     *            may be called immediately, before TextToSpeech instance is fully constructed.
     */
    public TextToSpeech(Context context, OnInitListener listener) {
        this(context, listener, null);
    }

播放停止与释放

就播放功能来说,使用起来非常简单,只需要创建一个TextSpeech对象,然后调用speak方法即可。

方法签名:

    public int speak(final CharSequence text,
                     final int queueMode,
                     final Bundle params,
                     final String utteranceId) {
        return runAction((ITextToSpeechService service) -> {
            Uri utteranceUri = mUtterances.get(text);
            if (utteranceUri != null) {
                return service.playAudio(getCallerIdentity(), utteranceUri, queueMode,
                        getParams(params), utteranceId);
            } else {
                return service.speak(getCallerIdentity(), text, queueMode, getParams(params),
                        utteranceId);
            }
        }, ERROR, "speak");
    }

参数说明: text:要转换的文本 queueMode:播放模式,有三种:QUEUE_ADDQUEUE_FLUSHQUEUE_MODE_DEFAULT params:参数,包括语音的语言、音调、语速等 utteranceId:唯一标识,用于区分不同的语音

停止时调用该对象的 stop() 方法,使用完毕退出时,需要调用 shutdown() 方法来释放引擎所使用的原生资源。我猜会这里会占用系统的多媒体编解码器连接,使用完需要及时释放防止其他app播放多媒体资源出错。

工具类完整代码

使用object实现单例,全局共享,在viewmodel里初始化,给界面提供接口。

object SpeechUtils {

    private lateinit var textToSpeech: TextToSpeech

    private const val TAG = "SpeechUtils"

    private const val TEST_IDENTIFIER = "test"

    private const val TEST_HELLO = "Hi, How are you? I'm fine. Thank you. And you?"

    private var isConnected = false

    val ttsConnectedListener = TextToSpeech.OnInitListener { status ->
        Log.d(TAG, "OnInitListener status: $status")
        isConnected = status == TextToSpeech.SUCCESS
    }

    fun init() {
        textToSpeech = TextToSpeech(appContext, ttsConnectedListener)
    }

    fun speak(text: String = TEST_HELLO, locale: Locale = Locale.US) {
        Log.d(TAG, "==========>speak<=========")
        if (isConnected) {
            textToSpeech.language = locale
            textToSpeech.speak(
                text,
                TextToSpeech.QUEUE_ADD,
                null,
                TEST_IDENTIFIER
            )
        } else {
            Log.d(TAG, "==========>TTS is not connected!<=========")
        }
    }

    fun stop() {
        textToSpeech.stop()
    }

    fun shutdown() {
        textToSpeech.shutdown()
    }
}

Viewmodel代码:

class MainStateHolder(
    private val retroService: RetroService,
    private val ktorClient: KtorClient,
) : ViewModel() {

    companion object {
        const val TAG = "MainStateHolder"
        const val TOKEN_KEY = "token"
        const val USER_NAME_KEY = "user_name"
    }

    init {
        SpeechUtils.init()
    }

    // ...

 
    fun speak(text: String, locale: Locale) {
        SpeechUtils.speak(text, locale)
    }
    
    fun stopSpeech(){
        SpeechUtils.stop()
    }

    override fun onCleared() {
        super.onCleared()
        ktorClient.release()
        SpeechUtils.shutdown()
    }
}

界面使用时,服务器返回值时调用播放,页面取消组合时,调用stop停止。同时,加入LifeCycle感知,在Activity退到后台,也调用停止接口:

@Composable
fun MyServerPage(
    mainStateHolder: MainStateHolder,
    lifecycleOwner: LifecycleOwner = LocalLifecycleOwner.current,
    onBackStack: () -> Unit,
) {
    BasePage("个人服务器测试", onCickBack = onBackStack) {

        LaunchedEffect(Unit) {
            mainStateHolder.getMyServerResponse()
        }

        val myResponse = mainStateHolder.myServerResponseStateFlow.collectAsState().value

        LaunchedEffect(myResponse) {
            if (myResponse.isNotEmpty()) {
                mainStateHolder.speak(myResponse, Locale.US)
            }
        }

        Box(
            modifier = Modifier.fillMaxSize(),
            contentAlignment = androidx.compose.ui.Alignment.Center
        ) {
            Text(text = myResponse)
        }

        DisposableEffect(lifecycleOwner) {
            Log.i("MyServerPage", "MyServerPage ${lifecycleOwner.lifecycle.currentState}")
            val observer = LifecycleEventObserver { _, event ->
                if (event == Lifecycle.Event.ON_STOP) {
                    // 当 Activity 退到后台时,Lifecycle 会触发 ON_STOP 事件
                    mainStateHolder.stopSpeech()
                }
            }
            lifecycleOwner.lifecycle.addObserver(observer)

            onDispose {
                mainStateHolder.stopSpeech()
                lifecycleOwner.lifecycle.removeObserver(observer)
            }
        }
    }
}

后续尝试使用付费版本的本地引擎,集成aar到本地进行调用,达到更好的播放效果。使用方式应该都是按照原生的接口设计。