How to make iPhone apps talk using text to speech

by DJ Spiess | Nov 18, 2015

Transcript – How to make iPhone apps talk using text to speech

Sometimes you just want to hear another voice. We’re going to make our iOS app talk using the AVSpeechSynthesizer and text to speech.

The AVSpeechSynthesizer is a class that takes text and produces speech. If we’re looking to add speech for maybe a dozen or so things in our app, we’re probably better off recording normal human voices. If we have lots of dynamic things we want the phone to say from text, and we don’t mind it sounding like a robot, AVSpeechSynthesizer is where we want to be.

The voice of the synthesizer is locale specific. That means we can give the voice an Australian accent. The choices for languages to use are determined by the language your phone is set to in your settings. For example, mine is English, so I can choose accents from the US, Ireland, Australia, and South Africa. If the text you want to say and voice aren’t in the same language, speech synthesis will fail.
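As a quick sketch (not from the original transcript), we can see which voices are installed by asking AVSpeechSynthesisVoice for its list and printing each voice’s locale code:

```swift
import AVFoundation

// Sketch: print the locale code of every voice installed on this device.
// The exact list depends on the device's language settings.
for voice in AVSpeechSynthesisVoice.speechVoices() {
    print(voice.language) // e.g. "en-US", "en-AU", "en-IE", "en-ZA"
}
```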

A voice must be downloaded to our phone. None of the code will work until we add a voice. We can check which voices are available in Settings, under General | Accessibility | Speech, and then Voices.

We give the synthesizer an object called an AVSpeechUtterance. This is the bit of text we want the synthesizer to say. This can be a single word, a phrase, a sentence or more. The utterance is also configured with a voice and some parameters to customize how the voice should work for this utterance.

The configuration parameters control the pitch of the voice, how fast the voice speaks, how loud, and any pauses we want between utterances. The reason each utterance gets its own voice and configuration parameters is, we might want to slow down one sentence for emphasis, or maybe we want to bounce between voices to handle a conversation, and so on.

Let’s make our app talk when we push a button. The UI is just a button that says speak. Nothing fancy.

So the first thing we want to do is add a synthesizer to our app. The AVSpeechSynthesizer is in AVFoundation. We’ll import AVFoundation, and then create one synthesizer for the entire app.

import UIKit
import AVFoundation

class ViewController: UIViewController {
    let speechSynthesizer = AVSpeechSynthesizer()

Next we’ll want to handle the button press. The first thing to do is create the text we want our app to say. We’ll just create an AVSpeechUtterance and pass in the text.

The speech utterance needs a voice. Normally we’d just create a voice with AVSpeechSynthesisVoice(language:) and assign it to the utterance. This doesn’t appear to work in iOS 9. So this is a bit of a hack, but we can iterate over the list of available voices on the phone and select one that way. We’ll do this once in viewDidLoad. We’ll search for the Australian accent.
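A minimal sketch of that workaround, assuming we store the result in a speechVoice property on our view controller (the property name is our choice, not fixed by the API):

```swift
var speechVoice: AVSpeechSynthesisVoice?

override func viewDidLoad() {
    super.viewDidLoad()
    // Iterate the installed voices and keep the Australian English one.
    for voice in AVSpeechSynthesisVoice.speechVoices() {
        if voice.language == "en-AU" {
            self.speechVoice = voice
        }
    }
}
```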

Here’s the fun part. We can configure the voice to sound how we want. We can change the rate to control how fast the voice is speaking.

The pitch multiplier controls how high of a voice we want.

The volume controls how loud or soft the voice is. Its default is 1.0.

The AVSpeechSynthesizer is really a queue. We can pass the synthesizer many things to say, and it will say them in order. The pre and post utterance delay controls how much delay we want between utterances. We can think of this as controlling how much of a breath the phone takes between sentences.

Finally we tell the synthesizer to speak, and we’re done. That’s all there is to making it speak.

// This is the action called when the user presses the button.
@IBAction func speak(sender: AnyObject) {
    let speechUtterance = AVSpeechUtterance(string: "How can you tell which one of your friends has the new iPhone 6s Plus?")

    // set the voice
    speechUtterance.voice = self.speechVoice
    // rate is 0.0 to 1.0 (default defined by AVSpeechUtteranceDefaultSpeechRate)
    speechUtterance.rate = 0.1
    // multiplier is between >0.0 and 2.0 (default 1.0)
    speechUtterance.pitchMultiplier = 1.25
    // Volume from 0.0 to 1.0 (default 1.0)
    speechUtterance.volume = 0.75
    // Delays before and after saying the phrase
    speechUtterance.preUtteranceDelay = 0.0
    speechUtterance.postUtteranceDelay = 0.0
    // Give the answer with a second utterance
    let speechUtterance2 = AVSpeechUtterance(string: "Don't worry, they'll tell you.")
    speechUtterance2.voice = self.speechVoice

    // Queue both utterances; the synthesizer speaks them in order
    speechSynthesizer.speakUtterance(speechUtterance)
    speechSynthesizer.speakUtterance(speechUtterance2)
}

We’ll likely also want to control our UI as the phone speaks. For example, we might want to highlight words as our phone speaks them. We can make our view controller a AVSpeechSynthesizerDelegate, and handle different events.
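Wiring that up takes two small steps: conform to AVSpeechSynthesizerDelegate and assign the view controller as the synthesizer’s delegate. A sketch:

```swift
class ViewController: UIViewController, AVSpeechSynthesizerDelegate {
    let speechSynthesizer = AVSpeechSynthesizer()

    override func viewDidLoad() {
        super.viewDidLoad()
        // Register to receive the speech events described below.
        speechSynthesizer.delegate = self
    }
}
```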

The main methods available in the delegate are didStartSpeechUtterance, didFinishSpeechUtterance, and willSpeakRangeOfSpeechString. Did start and did finish are pretty obvious: they’re called when the utterance starts speaking and when it finishes.

willSpeakRangeOfSpeechString is called before each word in your utterance is spoken. It’s a bit tricky, because the method gives us the entire utterance each time, along with an NSRange, so we have to calculate which word is about to be spoken ourselves. It would be nice if they gave us a Swift range, but that’s what we get.

// Called before the synthesizer starts speaking an utterance
func speechSynthesizer(synthesizer: AVSpeechSynthesizer, didStartSpeechUtterance utterance: AVSpeechUtterance) {
    print("About to say '\(utterance.speechString)'")
}

// Called when the synthesizer is finished speaking the utterance
func speechSynthesizer(synthesizer: AVSpeechSynthesizer, didFinishSpeechUtterance utterance: AVSpeechUtterance) {
    print("Finished saying '\(utterance.speechString)'")
}

// Called before speaking each word in the utterance
func speechSynthesizer(synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
    let startIndex = utterance.speechString.startIndex.advancedBy(characterRange.location)
    let endIndex = startIndex.advancedBy(characterRange.length)
    print("Will speak the word '\(utterance.speechString.substringWithRange(startIndex..<endIndex))'")
}

So that’s how we convert text to speech on iOS devices.

If you have any questions let me know in the comments. Or if you just want to say hi. If you liked this video, like and share. New videos come out each week, so make sure you subscribe. You don’t want to miss a video! Thanks for watching, and I’ll see you in the next tutorial!


Tools Used

  • Xcode Version 7.1.1 (7B1005)
  • Swift 2.0

Media Credits

All media created and owned by DJ Spiess unless listed below.

  • No infringement intended

Cold Funk – Funkorama by Kevin MacLeod is licensed under a Creative Commons Attribution license.

Get the code

The source code for “How to make iPhone apps talk using text to speech” can be found on Github. If you have Git installed on your system, you can clone the repository by issuing the following command:

 git clone

Go to the Support > Getting the Code page for more help.

If you find any errors in the code, feel free to let me know or issue a pull request in Git.




DJ Spiess


Your personal instructor

My name is DJ Spiess, and I’m a developer with a Masters degree in Computer Science working in Colorado, USA. I primarily work with Java server applications. I started programming as a kid in the 1980s, and I’ve programmed professionally since 1996. My main focuses are REST APIs, large-scale data, and mobile development. For the last six years I’ve worked on large National Science Foundation projects. You can read more about my development experience on my LinkedIn account.
