SpeechRecognition
The SpeechRecognition interface provides a high-level API for performing speech recognition. It supports real-time speech recognition and recognition of audio files, offering flexibility in a variety of use cases.
Features Overview
- Real-Time Recognition: Transcribe live audio from the microphone.
- File-Based Recognition: Analyze and transcribe recorded audio files.
- Multi-Language Support: Specify the recognition locale for different languages.
- Intermediate Results: Access partial and final results for progressive transcription.
- Custom Callbacks: Handle transcription results and sound level changes with event listeners.
Type Definitions
RecognitionTaskHint
Hints for the type of task for which speech recognition is used:
'confirmation': For commands like "yes," "no," or "maybe."'dictation': For tasks similar to keyboard dictation.'search': For identifying search terms.'unspecified': For general-purpose speech recognition.
SpeechRecognitionResult
Represents the result of speech recognition:
isFinal: Indicates if the transcription is complete and final.text: Convenience alias forbestTranscription.formattedString, kept for backward compatibility.bestTranscription: The transcription with the highest confidence level (SpeechTranscription).transcriptions: Alternative transcriptions of the audio, sorted in descending order of confidence (SpeechTranscription[]).metadata: Aggregate speech metrics. Only available on final results (iOS 14.5+) (SpeechRecognitionMetadata).
SpeechTranscription
A transcription of recognized speech:
formattedString: The entire transcription formatted into a single, user-displayable string.segments: The individual word/utterance segments that compose this transcription (SpeechTranscriptionSegment[]).
Speaking rate and average pause duration are reported on
SpeechRecognitionResult.metadatainstead, and only on final results.
SpeechTranscriptionSegment
A single segment within a transcription, typically corresponding to a word:
substring: The text content of this segment.substringRange: The UTF-16 character range ofsubstringwithin the parent transcription'sformattedString({ location: number, length: number }).timestamp: The seconds offset, relative to the audio start, at which this segment was spoken.duration: The duration in seconds of this segment within the audio.confidence: Confidence in the accuracy of this segment, in[0.0, 1.0]. Only meaningful on final results; partial results typically report0.alternativeSubstrings: Alternative substrings the recognizer also considered for this segment.
SpeechRecognitionMetadata
Aggregate speech metrics for a final recognition result:
speakingRate: Speaking rate in words per minute.averagePauseDuration: Average pause duration in seconds between words.speechStartTimestamp: Seconds offset within the audio at which the user started speaking.speechDuration: Duration in seconds of the spoken speech.
Static Properties
Supported Locales
supportedLocales: Returns a list of locales supported by the speech recognizer, such as"en-US","fr-FR", or"zh-CN".
Recognition State
isRecognizing: Indicates whether a recognition request is currently active.
Methods
Start Real-Time Recognition
start(options: object): Promise<boolean>
Starts speech recognition from the device microphone.
Options
locale: Locale string for the desired language (optional).partialResults: Return intermediate results (default:true).addsPunctuation: Automatically add punctuation to results (default:false).requestOnDeviceRecognition: Keep audio data on the device (default:false).taskHint: Specify the recognition task type ('confirmation','dictation','search','unspecified').useDefaultAudioSessionSettings: Use default audio session settings (default:true).preferredInput: Preferred audio input port.'auto'(default) lets the system choose;'builtInMic'forces the device's built-in microphone even when a Bluetooth headset is connected — useful for keeping wireless headphones for playback while recording from the built-in mic for better audio quality. Falls back silently to the system default whenbuiltInMicis not available on the device.onResult: Callback for recognition results (SpeechRecognitionResult).onSoundLevelChanged: Callback for sound level changes (optional).
Example
Recognize Speech in Audio Files
recognizeFile(options: object): Promise<boolean>
Starts recognition for a recorded audio file.
Options
filePath: Path to the audio file.locale: Locale string for the desired language (optional).partialResults: Return intermediate results (default:false).addsPunctuation: Automatically add punctuation to results (default:false).requestOnDeviceRecognition: Keep audio data on the device (default:false).taskHint: Specify the recognition task type ('confirmation','dictation','search','unspecified').onResult: Callback for recognition results (SpeechRecognitionResult).
Example
Stop Recognition
stop(): Promise<void>
Stops an active speech recognition session.
Example
Examples
Real-Time Recognition with Progress Updates
Recognize Audio File
Inspect Word-Level Timing and Alternatives
Stop Active Recognition
Notes
- Ensure the necessary microphone or file access permissions are granted before using this API.
- Use
supportedLocalesto determine available languages for recognition. - For optimal performance, use audio files in formats supported by iOS (e.g.,
.wav,.m4a).
