API: Voice (speaker) enrolment
Introduction
Voice enrolment is the first step in the voice-biometric authentication process. In this step, a mathematical representation of
the main characteristics of a person's voice is created by analysing a voice-sample (utterance) of that person.
This representation is called a "voice-print". It takes the form of a voice-vector (a series/array of
numerical values) that is later compared with the voice-sample of a person trying to authenticate.
As mentioned, such vectors (face-prints and voice-prints) are non-PII (non-Personally Identifiable Information). This means that, on their own,
they cannot be used to reconstruct the face-image or voice-sample from which they were created.
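The comparison itself happens on the MachineSense side, and its scoring method is not described in this section. Purely to illustrate what "comparing voice-vectors" means, the sketch below uses cosine similarity, a common way to score two x-vectors against each other. The function names and the 0.7 threshold are hypothetical, and a real threshold would have to be tuned on labelled data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two voice-vectors of equal length (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("voice-vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare an all-zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical decision threshold; not a MachineSense value.
THRESHOLD = 0.7


def is_same_speaker(enrolled: Sequence[float], candidate: Sequence[float],
                    threshold: float = THRESHOLD) -> bool:
    """Accept the candidate if its similarity to the enrolled voice-print reaches the threshold."""
    return cosine_similarity(enrolled, candidate) >= threshold
```

Cosine similarity ignores vector length and looks only at direction. This is why it is often preferred for embeddings such as x-vectors, whose overall magnitude can vary with recording conditions.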
First, the MachineSense customer/partner sends a captured voice-sample of their end-user from their
website or mobile app to their own servers (this step is completely independent of MachineSense), and then calls the MachineSense
API, passing that voice-sample and some parameters.
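On the server, the received recording has to be turned into the text value of the "audio" parameter (AAC, Base64-encoded). The following minimal sketch assumes that the recording is already AAC, as the API requires. It also assumes that the API expects plain Base64 text without a "data:" URL prefix. `encode_audio` and `encode_audio_file` are hypothetical helper names.

```python
import base64
from pathlib import Path


def encode_audio(raw: bytes) -> str:
    """Base64-encode AAC bytes into text for the "audio" field (assumed: no data-URL prefix)."""
    return base64.b64encode(raw).decode("ascii")


def encode_audio_file(path: str) -> str:
    """Read an AAC file received from your client and Base64-encode it."""
    return encode_audio(Path(path).read_bytes())
```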
To help customers get started quickly with such a client-side (web-based) implementation, MachineSense offers a set of
examples and code, ready to copy/paste into your applications and customize. Basic operations such as capturing the
voice, setting up parameters and formatting the BLOB are already present in those examples.
Examples are written in vanilla JavaScript, and can be used in any web-based application.
You can find them on our Demo page as well as in our GitHub repository.
The customer creates their own client-side page or app, including capture of the user's voice-sample. The exception is when the
customer uses the MachineSense white-label client-side, in which case this is already done for them.
The latter, however, is a two-step process and relies on pre-built / ready-to-use modules.
When capturing the voice-sample, you should choose how you will authenticate your end-user:
MachineSense offers several authentication options (which is not common among voice-auth providers).
Depending on the verification option chosen, you should also choose how you collect the end-user's voice-sample and
give the user instructions for doing so correctly.
Those options are (among others):
- Text-dependent enrolment/verification on a fixed pre-set sentence (such as "My voice is my passport").
- Text-dependent enrolment/verification on a user-defined sentence (for example, the user's date-of-birth or a randomly chosen passphrase).
- Text-independent enrolment/verification, where the user may speak anything, in a language of their choice (as long as the duration is
sufficient, typically 5 seconds). In this case, the user's voice-sample might even be taken in the background.
- Liveness-checked enrolment/verification, where both the voice characteristics and the content of the speech are captured,
forming the basis for verifying randomly presented strings (which cannot be pre-recorded, hence providing anti-spoofing).
For more details on these options, please see this doc about MachineSense speaker recognition.
More details about single-step and two-step processes.
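For the liveness option, the verification prompt consists of a few of the enrolled words in shuffled order. This section does not say whether your app or MachineSense generates that prompt. Assuming your app does, a minimal sketch could use `secrets.SystemRandom`, an OS-backed source of randomness, so that prompts are hard to predict. `make_liveness_challenge` and its default of 4 words are hypothetical choices.

```python
import secrets
from typing import List


def make_liveness_challenge(enrolled_phrase: str, count: int = 4) -> List[str]:
    """Pick `count` words from the enrolled, space-separated word list, in random order."""
    words = enrolled_phrase.split()
    if not 0 < count <= len(words):
        raise ValueError("count must be between 1 and the number of enrolled words")
    rng = secrets.SystemRandom()  # not seedable, so the prompt cannot be predicted from a seed
    # sample() draws without replacement and returns the picks already shuffled
    return rng.sample(words, count)
```

With the digits "zero one two ... nine" enrolled, each call returns a fresh prompt such as four distinct digit-words. A spoofer would then have to produce that exact sequence within the allowed time window.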
Call
(Call from your server to our API.)
POST /voice/v1/enroll_voice
Parameters / body:
{
  "audio": "string",
  "api_key": "string",
  "ref": "string",
  "method": "string",
  "phrase": "string",
  "content": {
    "include": false,
    "language": "string",
    "precision": "string"
  }
}
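A minimal sketch of building this call with the Python standard library follows. The host name is a placeholder, because the base URL is not given in this section. Sending the body as JSON with a `Content-Type: application/json` header is an assumption, and `build_enroll_request` is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder host: the real MachineSense base URL is not given in this section.
BASE_URL = "https://api.example.com"


def build_enroll_request(audio_b64: str, api_key: str, method: str, phrase: str = "",
                         ref: str = "", include_content: bool = False,
                         language: str = "", precision: str = "quick") -> urllib.request.Request:
    """Build (but do not send) a POST /voice/v1/enroll_voice request."""
    body = {
        "audio": audio_b64,          # AAC, Base64-encoded
        "api_key": api_key,
        "ref": ref,                  # echoed back to you later
        "method": method,            # e.g. "text_indep" or "liveness"
        "phrase": phrase,            # only for "text_dep_fixed" / "liveness"
        "content": {
            "include": include_content,
            "language": language,    # e.g. "EN-US" when include_content is True
            "precision": precision,  # "quick" is an arbitrary illustrative default
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/voice/v1/enroll_voice",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed JSON body
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req, timeout=30)` and reading the response body. Keeping construction separate from sending makes the request easy to inspect or test without network access.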
Parameters explained:
- "audio" = (mandatory) Voice-sample BLOB, AAC-encoded and then Base64-encoded.
- "api_key" = (mandatory) Your developer key, found in your Settings.
- "ref" = (optional, default="") Any string you wish to send back to yourself; you will receive it
with the later response to your webhook.
- "method" = (mandatory) MachineSense method required. Can be one of:
  - "text_dep_fixed" - Text-dependent with a fixed phrase (such as "My voice is my passport"). The phrase is specified
    in the "phrase" parameter.
  - "text_dep_dob" - Text-dependent with the end-user's date-of-birth. If content.include = true, the response
    will contain the actual date-of-birth.
  - "text_dep_user" - Text-dependent with a user-defined pass-phrase ("text-dependent" because that
    pass-phrase should be repeated when authenticating later).
  - "text_indep" - Text-independent, meaning any phrase of at least 5 seconds can be used for enrolment (and
    any phrase of at least 5 seconds can be used for verification later).
  - "liveness" - Enrolment with possible liveness detection on verification. For enrolment, the
    end-user is presented with a list of words (for example, the numbers 0-9) to speak. On verification, a few of those words, in
    shuffled order, are presented and must be spoken within n seconds. This assumes that a potential spoofer does not have enough
    time to record/edit/deepfake a spoofed voice-sample (playback).
    The list of words is supplied in the "phrase" parameter as a space-separated string (example: "one two three" or "cat dog house").
- "phrase" = Used with the "text_dep_fixed" and "liveness" methods. Otherwise empty.
- "content" = Object relating to speech recognition (the content of the spoken text):
  - "include" - boolean; whether to include the result of speech recognition (the content of the spoken utterance).
  - "language" - if "include": true, specifies the language in which speech recognition should
    be performed (example: NL-NL or EN-US).
  - "precision" - depth and processing level of the speech-recognition algorithm. Possible values are:
    "full", "quick", "ultraquick".
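The rules above can be checked on your server before a body is sent. The minimal sketch below encodes only what this section states: mandatory keys, the five method values, the phrase rules and the precision values. `validate_enroll_body` is a hypothetical helper, and the API itself may enforce more or different constraints.

```python
from typing import Any, Dict, List

VALID_METHODS = {"text_dep_fixed", "text_dep_dob", "text_dep_user", "text_indep", "liveness"}
PHRASE_METHODS = {"text_dep_fixed", "liveness"}  # methods that use "phrase"
VALID_PRECISIONS = {"full", "quick", "ultraquick"}


def validate_enroll_body(body: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with an enrolment body; an empty list means none found."""
    errors: List[str] = []
    for key in ("audio", "api_key", "method"):  # mandatory parameters
        if not body.get(key):
            errors.append(f'"{key}" is mandatory')
    method = body.get("method")
    if method and method not in VALID_METHODS:
        errors.append(f'unknown method "{method}"')
    phrase = (body.get("phrase") or "").strip()
    if method in PHRASE_METHODS and not phrase:
        errors.append(f'"phrase" is required for method "{method}"')
    elif method in VALID_METHODS and method not in PHRASE_METHODS and phrase:
        errors.append(f'"phrase" should be empty for method "{method}"')
    content = body.get("content") or {}
    if content.get("include"):  # language and precision only matter when content is requested
        if not content.get("language"):
            errors.append('"content.language" is required when "content.include" is true')
        if content.get("precision") not in VALID_PRECISIONS:
            errors.append('"content.precision" must be "full", "quick" or "ultraquick"')
    return errors
```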
Response
Code: 200
Default response:
{
  "result": "Ok",
  "code": 0,
  "message": "string",
  "data": {
    "ref": "string",
    "vector": [
      0
    ],
    "speech": "string"
  }
}
Response explained:
- "result" = "Ok" or "Err" (error)
- "code" = 0 or error code (int)
- "message" = If result is "Err", a textual description of the error (string)
- "data" = JSON object with data:
  - "ref" = The free-form reference string you sent, either in the call (single-step process) or on session init (two-step process).
  - "vector" = Array of numbers: the end-user's voice-print (an x-vector).
  - "speech" = Content of the spoken enrolment utterance, in the language specified in the call. Returned only if content was requested.
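A minimal sketch of handling this response on your server follows. It turns the JSON into a small record holding the voice-print you will store, and raises on "Err". The class and function names are hypothetical. Treating an empty "speech" string as "not requested" and using -1 when "code" is missing are assumptions.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


class EnrollError(Exception):
    """Raised when the API answers with "result": "Err"."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"enrolment failed (code {code}): {message}")
        self.code = code
        self.message = message


@dataclass
class Enrolment:
    ref: str
    vector: List[float]    # the voice-print (x-vector) to store for later verification
    speech: Optional[str]  # recognised text, only when content was requested


def parse_enroll_response(raw: str) -> Enrolment:
    """Turn the JSON response body into an Enrolment or raise EnrollError."""
    payload = json.loads(raw)
    if payload.get("result") != "Ok":
        # -1 is a hypothetical fallback if "code" is absent
        raise EnrollError(payload.get("code", -1), payload.get("message", ""))
    data = payload.get("data") or {}
    vector = [float(x) for x in data.get("vector") or []]
    if not vector:
        raise ValueError("response contains no voice-vector")
    return Enrolment(
        ref=data.get("ref", ""),
        vector=vector,
        speech=data.get("speech") or None,  # treat "" as "not requested"
    )
```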