MachineSense - Voice API Introduction

Voice processing types
     Speaker recognition
     Speech recognition
     Forensic matching
Privacy, PII and GDPR

Voice processing types

The Voice API covers three types of voice processing: speaker recognition, speech recognition, and forensic matching. The key distinction to understand first is the one between speaker recognition and speech recognition.

Speaker recognition

Speaker recognition means recognizing or identifying WHO is currently speaking: it identifies the person in front of a voice-recording device (microphone). It is a biometric identification method, similar to face recognition or fingerprint recognition. In fact, most of the procedures involved in speaker recognition are similar to, or strongly resemble, those of face recognition.
As we will see later, speaker recognition assumes that the person is known to the system and that a voiceprint of that person exists, created during an enrolment procedure. This voiceprint is, again analogous to face recognition, a vector that is passed to our customer/partner after enrolment.
As a result, speaker recognition does not depend on the spoken language. It compares the characteristics of a particular spoken utterance with the voiceprint of the person the user claims to be, and it works in any spoken language.
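
The comparison step described above can be illustrated with a minimal sketch. The actual scoring method used by MachineSense is not documented here; cosine similarity is simply a common way to compare embedding vectors, and the function names and the 0.8 threshold are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voiceprints must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled_voiceprint, utterance_voiceprint, threshold=0.8):
    """Accept the identity claim if the two vectors are similar enough.

    The threshold is a hypothetical value, not a MachineSense default.
    """
    return cosine_similarity(enrolled_voiceprint, utterance_voiceprint) >= threshold
```

Because only numeric vectors are compared, nothing in this step depends on the language being spoken.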

Speaker recognition is used for identity verification, authentication, and identification.

More details about speaker recognition

Speech recognition

Speech recognition, on the other hand, is a process of recognizing WHAT is being said. It is a process of converting speech to text.
This process IS spoken-language-dependent, and MachineSense supports a specific set of languages for speech recognition. Currently 16 languages are supported, and we are continually improving and expanding our language libraries.

Speech recognition is used for transcriptions, Internet bots (automated interactive response systems), and other purposes.

One of the unique features of the MachineSense voice platform is that it can apply both speaker recognition and speech recognition in the same process: speaker recognition identifies the speaker, and speech recognition converts the spoken utterance to text.
Among other things, this allows MachineSense to offer a specific Liveness Detection method, which checks in real time whether the person speaking can respond to a time-limited challenge (such as "Please say the following 4 digits: 1234" or "Please say the following words: Cat, House, Tree"). By presenting a random challenge to the user in this way, we can prevent spoofs in the form of voice recordings.
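
The challenge-response idea can be sketched as follows. This is an illustration only, not the MachineSense implementation: the function names, the dictionary layout, the 10-second limit, and the assumption that the transcript contains the digits as numerals are all hypothetical.

```python
import secrets
import time

DIGITS = "0123456789"


def make_challenge(num_digits=4, time_limit_s=10.0):
    """Create a random digit challenge with an issue timestamp."""
    digits = "".join(secrets.choice(DIGITS) for _ in range(num_digits))
    return {
        "digits": digits,
        "issued_at": time.monotonic(),
        "time_limit_s": time_limit_s,
    }


def check_response(challenge, transcript, speaker_verified, now=None):
    """Pass only if the speaker matched, answered in time, and said the digits.

    `transcript` stands in for the speech recognition result and
    `speaker_verified` for the speaker recognition result.
    """
    now = time.monotonic() if now is None else now
    in_time = (now - challenge["issued_at"]) <= challenge["time_limit_s"]
    # Keep only ASCII digits, so "1 2 3 4" and "1, 2, 3, 4" both compare as "1234".
    spoken = "".join(ch for ch in transcript if ch in DIGITS)
    return speaker_verified and in_time and spoken == challenge["digits"]
```

A pre-recorded sample is unlikely to contain the freshly generated digits, which is why the challenge has to be random and the response window short.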

More details about speech recognition

Forensic matching

The third application type is so-called forensic matching. In this sub-platform, MachineSense matches a current voice sample (utterance) against a large dataset of voice samples in order to find a match. This is a very specific application; it is called "forensic" because it is used to determine whether the current speaker is a member of a "whitelist" or "blacklist" dataset.
After performing such a search, MachineSense returns a list of matches sorted by confidence level, where the top match(es) are assumed to be possible "perpetrators".
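
As a rough sketch of what a ranked match list looks like, the following compares one probe vector against every entry in a dataset and sorts the results by score. The use of cosine similarity as the confidence score, the function names, and the `top_k` and `min_score` parameters are assumptions for illustration; the scoring actually used by MachineSense is not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_matches(probe, dataset, top_k=3, min_score=0.0):
    """Return up to top_k (person_id, score) pairs, best match first.

    `dataset` maps a person identifier to that person's voiceprint vector.
    """
    scored = [(pid, cosine_similarity(probe, vp)) for pid, vp in dataset.items()]
    scored = [(pid, score) for pid, score in scored if score >= min_score]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The caller decides what to do with the ranking, for example treating only matches above a chosen score as candidates for review.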

This application is used in security, law enforcement, and other areas where it is important to identify a person by their voice: the sample is compared against a list of implicitly or explicitly enrolled persons, and their ranking within that dataset is returned.
Typical customers/partners for this application are law enforcement agencies, security companies, insurance companies (checking in databases of known fraudsters or other offenders), etc.

More details about forensic matching

Privacy, PII and GDPR

MachineSense is very much aware of the importance of privacy and the protection of personal data, and of compliance with the GDPR and other privacy regulations.
We have therefore implemented a number of measures to ensure that our customers and partners can use our services in a way that complies with these regulations, and that our own internal processes comply with them as well.

As with face recognition, our Voice API does not store any voice samples; we only process the voice samples (utterances) and work only with the voiceprints. Voiceprints are vectors (arrays of numbers) which in themselves do not contain any personal data. They are non-PII (non-personally-identifiable information).

By default, these voice vectors (like any other data) are NOT stored on our side. Instead, they are returned to you, the customer, so that you can store enrolment and verification data on your side, possibly together with other user-identification data.
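
Since the voiceprint is returned to you, storage happens in your own systems. The following is a minimal sketch of customer-side storage using SQLite from the Python standard library; the table layout, function names, and JSON serialization are assumptions for illustration and are not part of the MachineSense API.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local voiceprint store owned by the customer."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voiceprints ("
        "user_id TEXT PRIMARY KEY, vector TEXT NOT NULL)"
    )
    return conn


def save_voiceprint(conn, user_id, vector):
    """Store the vector returned after enrolment, keyed by your own user ID."""
    conn.execute(
        "INSERT OR REPLACE INTO voiceprints (user_id, vector) VALUES (?, ?)",
        (user_id, json.dumps(list(vector))),
    )
    conn.commit()


def load_voiceprint(conn, user_id):
    """Return the stored vector for a user, or None if not enrolled."""
    row = conn.execute(
        "SELECT vector FROM voiceprints WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping the vector next to your own user records means retention and deletion follow whatever policies you already apply to that data.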

MachineSense only provides you with the biometric and other voice-processing methods, which operate simply as remote real-time functions; we do not insert ourselves between you and your users. This means that our services will not interfere with any privacy- or GDPR-related measures you implement.