Introducing a new text-to-speech engine on Wear OS

Posted by Ouiam Koubaa – Product Manager and Yingzhe Li – Software Engineer

Today we're proud to announce the release of a new Text-To-Speech (TTS) engine that's powerful and reliable. Text-to-speech converts text into natural-sounding speech in more than 50 languages, powered by Google's machine learning (ML) technology. The new text-to-speech engine on Wear OS uses smaller and more efficient prosody ML models to enable faster synthesis on Wear OS devices.

Use cases for Wear OS's text-to-speech include accessibility services, coaching prompts for training apps, navigation prompts, and reading incoming alerts aloud through the watch speaker or Bluetooth-connected headphones. The engine is intended for short interactions and should not be used to read a long article or a long podcast summary aloud.

How to use Wear OS's TTS

Text-to-speech has long been supported on Android. Wear OS's new TTS is tuned to be powerful and reliable on low-memory devices. The Android APIs are unchanged, so developers integrate TTS into a Wear OS app the same way as before; for example, TextToSpeech#speak can be used to speak specific text. The new engine is available on devices running Wear OS 4 or later.
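As a rough illustration of the TextToSpeech#speak call mentioned above, the sketch below wraps it in a small helper. The helper name speakPrompt and the utterance ID "prompt" are made up for this example, and it assumes the engine has already reported TextToSpeech.SUCCESS in its OnInitListener.

```kotlin
import android.speech.tts.TextToSpeech

// Hypothetical helper (not part of the Android API): speaks a short prompt
// on an engine that has already finished initializing.
fun speakPrompt(tts: TextToSpeech, prompt: String): Boolean {
    val result = tts.speak(
        prompt,
        TextToSpeech.QUEUE_FLUSH, // replace anything still waiting in the queue
        /* params= */ null,       // no extra parameters for a simple prompt
        /* utteranceId= */ "prompt"
    )
    // speak() only reports whether the request was queued, not whether
    // playback has finished.
    return result == TextToSpeech.SUCCESS
}
```

QUEUE_FLUSH suits short, time-sensitive prompts such as navigation cues; TextToSpeech.QUEUE_ADD would instead append the prompt after whatever is already queued.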

When the user interacts with Wear OS TTS for the first time after the device boots up, the synthesis engine takes about 10 seconds to become ready. For special cases where developers want the watch to speak immediately after opening an app or starting an experience, the following code can be used to warm up the TTS engine before synthesis requests arrive.

import android.speech.tts.TextToSpeech
import android.util.Log
import java.io.File

// Assumes `tts` (a TextToSpeech property) and TAG are declared in the enclosing class.
private fun initTtsEngine() {
    // Callback when TextToSpeech connection is set up
    val callback = TextToSpeech.OnInitListener { status ->
        if (status == TextToSpeech.SUCCESS) {
            Log.i(TAG, "TTS client initialized successfully")

            // Get the default TTS voice
            val defaultVoice = tts.voice
            if (defaultVoice == null) {
                Log.w(TAG, "defaultVoice == null")
                return@OnInitListener
            }

            // Set TTS engine to use the default voice's locale
            tts.language = defaultVoice.locale

            try {
                // Create a temporary file to synthesize sample text
                val tempFile =
                        File.createTempFile("tmpsynthesize", null, applicationContext.cacheDir)

                // Synthesize sample text to our file
                tts.synthesizeToFile(
                        /* text= */ "1 2 3", // Some sample text
                        /* params= */ null, // No params necessary for a sample request
                        /* file= */ tempFile,
                        /* utteranceId= */ "sampletext"
                )

                // Schedule the file for deletion when the VM exits
                tempFile.deleteOnExit()
            } catch (e: Exception) {
                Log.e(TAG, "Unhandled exception: ", e)
            }
        } else {
            Log.w(TAG, "TTS client failed to initialize, status=$status")
        }
    }

    tts = TextToSpeech(applicationContext, callback)
}

When you are done using TTS, release the engine by calling tts.shutdown() in your activity's onDestroy() method. Also call it when closing an app that uses TTS.
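One possible shape for this cleanup is sketched below. The activity name CoachingActivity and its TAG constant are invented for illustration; the sketch also calls stop() before shutdown(), which is a design choice of this example rather than something the post requires.

```kotlin
import android.app.Activity
import android.os.Bundle
import android.speech.tts.TextToSpeech
import android.util.Log

// Hypothetical activity; only the TextToSpeech calls come from the Android API.
class CoachingActivity : Activity() {
    private var tts: TextToSpeech? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        tts = TextToSpeech(applicationContext) { status ->
            if (status != TextToSpeech.SUCCESS) {
                Log.w(TAG, "TTS initialization failed with status $status")
            }
        }
    }

    override fun onDestroy() {
        // Interrupt any utterance in progress, then release the engine.
        tts?.stop()
        tts?.shutdown()
        tts = null
        super.onDestroy()
    }

    private companion object {
        const val TAG = "CoachingActivity"
    }
}
```

Holding the engine in a nullable property keeps the cleanup safe even if onDestroy() runs before initialization completes.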

Languages and locales

By default, Wear OS TTS includes 7 preloaded languages in the system image: English, Spanish, French, Italian, German, Japanese, and Mandarin Chinese. OEMs can choose to preload a different set of languages. You can check which languages are available by using TextToSpeech#getAvailableLanguages(). If, during watch setup, the user selects a system language whose voice file is not preloaded, the watch automatically downloads the corresponding voice file the first time the user connects to Wi-Fi while charging their watch.
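Because the preloaded set can vary by device, an app may want to choose a locale from whatever TextToSpeech#getAvailableLanguages() reports. The sketch below shows one way to do that with plain java.util.Locale values. The function name pickSpeechLocale and its fallback policy (exact match, then same language, then a fallback locale) are assumptions made for this example, not behavior of the engine.

```kotlin
import java.util.Locale

// Hypothetical helper: picks the best locale to speak in from the set the
// engine reports as available. Returns null when nothing suitable exists.
fun pickSpeechLocale(
    available: Set<Locale>,
    preferred: Locale,
    fallback: Locale = Locale.US
): Locale? {
    // 1. Exact match on language and country.
    if (preferred in available) return preferred
    // 2. Same language, different country (for example fr-CA -> fr-FR).
    val sameLanguage = available.firstOrNull { it.language == preferred.language }
    if (sameLanguage != null) return sameLanguage
    // 3. Fall back only if the fallback itself is available.
    return if (fallback in available) fallback else null
}
```

On a device, the first argument could be supplied as tts.availableLanguages.orEmpty() once the engine has initialized, with Locale.getDefault() as the preferred locale.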

There are limited cases where the speech output may differ from the user's system language. For example, in a scenario where a safety app uses TTS to call emergency services, developers might want to synthesize speech in the language of the locale the user is in, rather than the language the user has set their watch to. To synthesize text in a language other than the system setting, use TextToSpeech#setLanguage(java.util.Locale).
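A minimal sketch of switching languages follows. It checks the result code that setLanguage() returns, since the requested voice may not be installed on the watch. The helper name useLanguage is hypothetical, and the engine is assumed to be initialized already.

```kotlin
import android.speech.tts.TextToSpeech
import java.util.Locale

// Hypothetical helper: tries to switch the engine to the given locale and
// reports whether speech in that language can be synthesized.
fun useLanguage(tts: TextToSpeech, locale: Locale): Boolean =
    when (tts.setLanguage(locale)) {
        // The voice data is not installed, or the language is unsupported.
        TextToSpeech.LANG_MISSING_DATA,
        TextToSpeech.LANG_NOT_SUPPORTED -> false
        // LANG_AVAILABLE, LANG_COUNTRY_AVAILABLE, or LANG_COUNTRY_VAR_AVAILABLE.
        else -> true
    }
```

When the helper returns false, an app could keep the current language rather than fail silently, which matters in the emergency scenario described above.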

Conclusion

Your Wear OS apps can now talk directly via the watch's speakers or via Bluetooth-connected headphones. Learn more about using TTS.

We look forward to seeing how you use the text-to-speech engine to create more useful and engaging experiences for your users on Wear OS!

Copyright 2023 Google LLC.
SPDX-License-Identifier: Apache-2.0