SpeechRecognition

The SpeechRecognition interface provides a high-level API for speech recognition. It supports both real-time recognition from the device microphone and recognition of recorded audio files.

Features Overview

Real-Time Recognition: Transcribe live audio from the microphone.
File-Based Recognition: Analyze and transcribe recorded audio files.
Multi-Language Support: Specify the recognition locale for different languages.
Intermediate Results: Access partial and final results for progressive transcription.
Custom Callbacks: Handle transcription results and sound level changes with the onResult and onSoundLevelChanged callbacks.

Type Definitions

`RecognitionTaskHint`

Hints for the type of task for which speech recognition is used:

'confirmation': For commands like "yes," "no," or "maybe."
'dictation': For tasks similar to keyboard dictation.
'search': For identifying search terms.
'unspecified': For general-purpose speech recognition.
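
When a hint comes from user settings or other untyped input, it can help to narrow it to one of the four values above before passing it as taskHint. The following is a minimal sketch; the `TASK_HINTS` constant and the `toTaskHint` helper are hypothetical names introduced for illustration and are not part of the API.

```typescript
// The four hint values listed above, modeled as a string-literal union.
const TASK_HINTS = ["confirmation", "dictation", "search", "unspecified"] as const
type RecognitionTaskHint = (typeof TASK_HINTS)[number]

// Hypothetical helper: narrow an arbitrary string to a valid hint,
// falling back to "unspecified" for anything unrecognized.
function toTaskHint(value: string): RecognitionTaskHint {
  const known = TASK_HINTS as readonly string[]
  return known.indexOf(value) !== -1 ? (value as RecognitionTaskHint) : "unspecified"
}
```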

`SpeechRecognitionResult`

Represents the result of speech recognition:

isFinal: Indicates if the transcription is complete and final.
text: Convenience alias for bestTranscription.formattedString, kept for backward compatibility.
bestTranscription: The transcription with the highest confidence level (SpeechTranscription).
transcriptions: Alternative transcriptions of the audio, sorted in descending order of confidence (SpeechTranscription[]).
metadata: Aggregate speech metrics (SpeechRecognitionMetadata). Only available on final results, and only on iOS 14.5+.

`SpeechTranscription`

A transcription of recognized speech:

formattedString: The entire transcription formatted into a single, user-displayable string.
segments: The individual word/utterance segments that compose this transcription (SpeechTranscriptionSegment[]).

Speaking rate and average pause duration are not part of SpeechTranscription. They are reported on SpeechRecognitionResult.metadata, and only on final results.

`SpeechTranscriptionSegment`

A single segment within a transcription, typically corresponding to a word:

substring: The text content of this segment.
substringRange: The UTF-16 character range of substring within the parent transcription's formattedString ({ location: number, length: number }).
timestamp: The seconds offset, relative to the audio start, at which this segment was spoken.
duration: The duration in seconds of this segment within the audio.
confidence: Confidence in the accuracy of this segment, in [0.0, 1.0]. Only meaningful on final results; partial results typically report 0.
alternativeSubstrings: Alternative substrings the recognizer also considered for this segment.
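
Because JavaScript strings are indexed in UTF-16 code units, a substringRange should map directly onto `String.prototype.slice`. The sketch below illustrates that, plus a simple filter for segments worth double-checking. The `SegmentLike` interface models only the fields used here, and `textForRange`, `lowConfidenceSegments`, and the 0.5 threshold are assumptions made for illustration.

```typescript
// Only the segment fields this sketch needs; the real type has more.
interface SegmentLike {
  substring: string
  substringRange: { location: number; length: number }
  confidence: number
}

// Extract the text a range refers to. Indices are UTF-16 code units,
// which is also how JavaScript indexes strings.
function textForRange(
  formattedString: string,
  range: { location: number; length: number }
): string {
  return formattedString.slice(range.location, range.location + range.length)
}

// Hypothetical helper: segments whose confidence falls below a threshold.
// Only meaningful on final results, since partial results typically report 0.
function lowConfidenceSegments(segments: SegmentLike[], threshold = 0.5): SegmentLike[] {
  return segments.filter((segment) => segment.confidence < threshold)
}
```

For a final result, `textForRange(result.bestTranscription.formattedString, segment.substringRange)` would be expected to equal `segment.substring`.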

`SpeechRecognitionMetadata`

Aggregate speech metrics for a final recognition result:

speakingRate: Speaking rate in words per minute.
averagePauseDuration: Average pause duration in seconds between words.
speechStartTimestamp: Seconds offset within the audio at which the user started speaking.
speechDuration: Duration in seconds of the spoken speech.
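
Since metadata is absent on partial results and on older iOS versions, code that displays it needs a fallback. The following sketch formats the four fields into one line; `MetadataLike` and `describeMetadata` are hypothetical names, and the output wording is only an example.

```typescript
// Mirrors the four fields described above.
interface MetadataLike {
  speakingRate: number          // words per minute
  averagePauseDuration: number  // seconds
  speechStartTimestamp: number  // seconds
  speechDuration: number        // seconds
}

// Hypothetical helper: one-line summary, tolerating missing metadata.
function describeMetadata(metadata?: MetadataLike): string {
  if (!metadata) return "no metadata"
  return (
    `${metadata.speakingRate.toFixed(0)} wpm, ` +
    `avg pause ${metadata.averagePauseDuration.toFixed(2)}s, ` +
    `${metadata.speechDuration.toFixed(2)}s of speech ` +
    `starting at ${metadata.speechStartTimestamp.toFixed(2)}s`
  )
}
```

Inside an onResult callback this could be called as `describeMetadata(result.metadata)`.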

Static Properties

Supported Locales

supportedLocales: Returns a list of locales supported by the speech recognizer, such as "en-US", "fr-FR", or "zh-CN".
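
A common use of this list is choosing a locale before starting recognition. The sketch below picks the first preferred locale that is supported, falling back to another region of the same language. `pickLocale` and `languageOf` are hypothetical helpers, and the code assumes supportedLocales is an array of strings like the examples above.

```typescript
// Language part of a locale identifier, e.g. "en" for "en-US".
function languageOf(locale: string): string {
  return locale.replace(/-.*$/, "").toLowerCase()
}

// Hypothetical helper: the first preferred locale that is supported, or a
// supported locale with the same language, or undefined if nothing matches.
function pickLocale(preferred: string[], supported: string[]): string | undefined {
  for (const wanted of preferred) {
    // An exact match wins.
    if (supported.indexOf(wanted) !== -1) return wanted
    // Otherwise accept any supported region of the same language.
    const sameLanguage = supported.find((s) => languageOf(s) === languageOf(wanted))
    if (sameLanguage !== undefined) return sameLanguage
  }
  return undefined
}
```

In a script, the second argument would be `SpeechRecognition.supportedLocales`, and the result could be passed as the locale option when it is not undefined.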

Recognition State

isRecognizing: Indicates whether a recognition request is currently active.

Methods

Start Real-Time Recognition

start(options: object): Promise<boolean>
Starts speech recognition from the device microphone.

Options

locale: Locale string for the desired language (optional).
partialResults: Return intermediate results (default: true).
addsPunctuation: Automatically add punctuation to results (default: false).
requestOnDeviceRecognition: Keep audio data on the device (default: false).
taskHint: Specify the recognition task type ('confirmation', 'dictation', 'search', 'unspecified').
useDefaultAudioSessionSettings: Use default audio session settings (default: true).
preferredInput: Preferred audio input port. 'auto' (default) lets the system choose. 'builtInMic' forces the device's built-in microphone even when a Bluetooth headset is connected, which is useful for keeping wireless headphones for playback while recording from the built-in microphone for better audio quality. Falls back silently to the system default when the built-in microphone is not available on the device.
onResult: Callback for recognition results (SpeechRecognitionResult).
onSoundLevelChanged: Callback for sound level changes (optional).

Example

await SpeechRecognition.start({
  locale: "en-US",
  partialResults: true,
  addsPunctuation: true,
  onResult: (result) => {
    console.log("Transcription:", result.text)
  },
  onSoundLevelChanged: (level) => {
    console.log("Sound Level:", level)
  }
})

Recognize Speech in Audio Files

recognizeFile(options: object): Promise<boolean>
Starts recognition for a recorded audio file.

Options

filePath: Path to the audio file.
locale: Locale string for the desired language (optional).
partialResults: Return intermediate results (default: false).
addsPunctuation: Automatically add punctuation to results (default: false).
requestOnDeviceRecognition: Keep audio data on the device (default: false).
taskHint: Specify the recognition task type ('confirmation', 'dictation', 'search', 'unspecified').
onResult: Callback for recognition results (SpeechRecognitionResult).

Example

await SpeechRecognition.recognizeFile({
  filePath: FileManager.join(FileManager.documentDirectory, "example.wav"),
  locale: "en-US",
  addsPunctuation: true,
  onResult: (result) => {
    console.log("File Transcription:", result.text)
  }
})
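
When only the final text matters, the callback can be wrapped in a Promise. This is a sketch under stated assumptions: `FileRecognizer` models only the members used here, `transcribeFile` is a hypothetical helper, and a `false` return value is assumed to mean that recognition did not start. If no final result is ever delivered, the returned Promise stays pending.

```typescript
// Minimal structural types for this sketch; only the members used are modeled.
interface FileResult {
  isFinal: boolean
  text: string
}
interface FileRecognizer {
  recognizeFile(options: {
    filePath: string
    partialResults?: boolean
    onResult: (result: FileResult) => void
  }): Promise<boolean>
}

// Hypothetical helper: resolve with the text of the final result.
function transcribeFile(recognizer: FileRecognizer, filePath: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    recognizer
      .recognizeFile({
        filePath,
        partialResults: false,
        onResult: (result) => {
          // Ignore anything that is not the final transcription.
          if (result.isFinal) resolve(result.text)
        },
      })
      .then((started) => {
        // Assumption: false means the recognition request could not be started.
        if (!started) reject(new Error("recognition did not start"))
      })
      .catch(reject)
  })
}
```

Assuming the SpeechRecognition object matches this shape, usage would look like `const text = await transcribeFile(SpeechRecognition, path)`.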

Stop Recognition

stop(): Promise<void>
Stops an active speech recognition session.

Example

await SpeechRecognition.stop()

Examples

Real-Time Recognition with Progress Updates

await SpeechRecognition.start({
  locale: "en-US",
  onResult: (result) => {
    console.log(result.isFinal ? "Final Result:" : "Partial Result:", result.text)
  },
  onSoundLevelChanged: (level) => {
    console.log("Sound Level:", level)
  }
})

Recognize Audio File

await SpeechRecognition.recognizeFile({
  filePath: FileManager.join(FileManager.documentDirectory, "audio.m4a"),
  partialResults: false,
  onResult: (result) => {
    console.log("File recognition completed. Transcription:", result.text)
  }
})

Inspect Word-Level Timing and Alternatives

await SpeechRecognition.recognizeFile({
  filePath: FileManager.join(FileManager.documentDirectory, "audio.m4a"),
  partialResults: false,
  onResult: (result) => {
    if (!result.isFinal) return

    for (const segment of result.bestTranscription.segments) {
      console.log(
        `[${segment.timestamp.toFixed(2)}s + ${segment.duration.toFixed(2)}s] `
          + `${segment.substring} (confidence ${segment.confidence.toFixed(2)})`
      )
      if (segment.alternativeSubstrings.length > 0) {
        console.log("  alternatives:", segment.alternativeSubstrings.join(", "))
      }
    }

    if (result.metadata) {
      console.log("Speaking rate (wpm):", result.metadata.speakingRate)
      console.log("Speech duration (s):", result.metadata.speechDuration)
    }

    if (result.transcriptions.length > 1) {
      console.log("Alternative transcriptions:")
      for (const t of result.transcriptions.slice(1)) {
        console.log(" -", t.formattedString)
      }
    }
  }
})
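
The timestamp and duration fields can also be used to group words into caption-style lines. The sketch below starts a new line whenever the silence between two segments exceeds a threshold. `TimedSegment`, `toCaptionLines`, and the 0.6 second default are assumptions for illustration, and joining words with spaces is a simplification; formattedString remains the authoritative display text.

```typescript
// Only the segment fields needed for timing.
interface TimedSegment {
  substring: string
  timestamp: number // seconds from audio start
  duration: number  // seconds
}

// Hypothetical helper: group segments into lines, breaking on pauses
// longer than maxGap seconds. Each line is prefixed with its time span.
function toCaptionLines(segments: TimedSegment[], maxGap = 0.6): string[] {
  const lines: string[] = []
  let words: string[] = []
  let start = 0
  let end = 0
  const flush = () => {
    lines.push(`[${start.toFixed(2)}-${end.toFixed(2)}] ${words.join(" ")}`)
    words = []
  }
  for (const segment of segments) {
    // A long pause since the previous segment closes the current line.
    if (words.length > 0 && segment.timestamp - end > maxGap) flush()
    if (words.length === 0) start = segment.timestamp
    words.push(segment.substring)
    end = segment.timestamp + segment.duration
  }
  if (words.length > 0) flush()
  return lines
}
```

Inside the onResult callback above, this could be called as `toCaptionLines(result.bestTranscription.segments)` once `result.isFinal` is true.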

Stop Active Recognition

if (await SpeechRecognition.start({
  // ...
})) {
  // Stop after 10 seconds. The callback must be async to use await.
  setTimeout(async () => {
    await SpeechRecognition.stop()
  }, 10 * 1000)
}

Notes

Ensure the necessary microphone or file access permissions are granted before using this API.
Use supportedLocales to determine available languages for recognition.
For optimal performance, use audio files in formats supported by iOS (e.g., .wav, .m4a).
