Latest entry in Apple’s Machine Learning Journal focuses on ‘Hey Siri’ trigger phrase


Apple’s latest entry in its online Machine Learning Journal focuses on the personalization process that users go through when activating “Hey Siri” on iOS devices. Across Apple products, “Hey Siri” invokes the company’s AI assistant and can be followed by requests such as “How’s the weather?” or “Message Dad I’m on my way.”

“Hey Siri” was introduced in iOS 8 on the iPhone 6, and at that time it could only be used while the iPhone was charging. Subsequently, the trigger phrase could be used at any time thanks to a low-power, always-on processor that enabled the iPhone and iPad to listen for “Hey Siri” continuously.

In the new Machine Learning Journal entry, Apple’s Siri team details its technical approach to developing a “speaker recognition system.” The team created deep neural networks and “set the stage for improvements” in future iterations of Siri, all driven by the goal of creating “on-device personalization” for users.

The Apple team said the phrase “Hey Siri” was chosen for its “natural” wording, and described three scenarios in which unintentional activations trouble the feature: “when the primary user says a similar phrase,” “when other users say ‘Hey Siri’,” and “when other users say a similar phrase.” According to the team, the last scenario produces “the most annoying false activation of all.”

To reduce these accidental Siri activations, Apple draws on techniques from the field of speaker recognition. Notably, the Siri team says it is focused on “who is talking” rather than on “what was said”:

The overall goal of speaker recognition (SR) is to determine the identity of a person using their voice. We are interested in “who is speaking”, as opposed to the problem of speech recognition, which aims to determine “what was said”. SR performed using a phrase known a priori, such as “Hey Siri”, is often referred to as text-dependent SR; otherwise, the problem is known as text-independent SR.
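The journal entry doesn’t publish code, but the core idea of text-dependent speaker verification can be sketched in a few lines: compare a speaker embedding of the incoming “Hey Siri” utterance against the enrolled user’s profile vector, and only accept the trigger if the two are similar enough. Everything below (plain float lists as embeddings, the cosine metric, the 0.85 threshold) is an illustrative assumption, not Apple’s actual system.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two speaker-embedding vectors
    (toy representation: plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_primary_user(enrolled, utterance, threshold=0.85):
    """Accept the trigger only if the utterance embedding is close
    enough to the enrolled profile. The threshold is hypothetical."""
    return cosine_similarity(enrolled, utterance) >= threshold
```

A stricter threshold trades missed activations by the real user against fewer false activations by other speakers, which is exactly the tension the Siri team describes.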

The journal entry then explains how users enroll in a personalized “Hey Siri” profile through explicit and implicit enrollment. Explicit enrollment happens the moment users say the trigger phrase several times during setup, while the implicit profile is “created over a period of time” in “real world situations.”
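One simple way to picture the two enrollment stages is a speaker profile that is seeded by averaging the explicit setup utterances and then refined as a running average of embeddings from real-world activations. This is a minimal sketch of that idea, not the journal’s actual update rule; the class name and the list-of-floats embeddings are assumptions.

```python
class SpeakerProfile:
    """Toy speaker profile: explicit enrollment seeds the profile,
    implicit enrollment refines it as accepted utterances arrive."""

    def __init__(self, explicit_embeddings):
        # Explicit enrollment: average the handful of "Hey Siri"
        # utterances recorded during setup.
        n = len(explicit_embeddings)
        dim = len(explicit_embeddings[0])
        self.centroid = [sum(e[i] for e in explicit_embeddings) / n
                         for i in range(dim)]
        self.count = n

    def implicit_update(self, embedding):
        # Implicit enrollment: fold embeddings from later, real-world
        # activations into the profile as a running average.
        self.count += 1
        self.centroid = [c + (x - c) / self.count
                         for c, x in zip(self.centroid, embedding)]
```

The appeal of the implicit stage is that the profile drifts toward how the user actually sounds day to day (rooms, distances, moods), rather than only the controlled setup recordings.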

The Siri team says the remaining challenges with speaker recognition are figuring out how to get quality performance in reverberant (large room) and noisy (car) environments. You can check out the full Machine Learning Journal entry on “Hey Siri” here.

Since its debut last summer, Apple has shared numerous entries in its Machine Learning Journal on complex topics, including “Hey Siri”, face detection, and more. All past entries can be viewed on Apple’s Machine Learning Journal site.
