Voice processing types
Speaker recognition
Speech recognition
Forensic matching
Privacy, PII and GDPR
To understand the different types of voice processing, we first need to understand the difference between speaker recognition and speech recognition.
Speaker recognition refers to recognizing/identifying WHO is currently speaking. It identifies a person in front of
a voice-recording device (microphone). It is a biometric identification method, similar to face recognition, fingerprint recognition, etc.
In fact, most of the procedures performed for speaker recognition will closely resemble those of face recognition.
As we will see later, speaker recognition assumes that the person is known to the system and that the system holds a voiceprint of that person
(created during an enrolment procedure). This voiceprint is, again analogous to face recognition, a vector that is passed to our customer/partner
after such enrolment.
As such, speaker recognition is not spoken-language-dependent. It compares the characteristics of a particular spoken utterance with the
voiceprint of the person the user claims to be, and it works in any spoken language.
Speaker recognition is used for identity verification, authentication, and identification.
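As an illustration only, the sketch below shows how a customer-side check of a claimed identity could look: the utterance voiceprint is scored against the enrolled voiceprint with cosine similarity and accepted above a threshold. The vector size, the threshold value and the placeholder data are assumptions made for this example, not the actual MachineSense scoring method.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two voiceprint vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def verify_speaker(enrolled: np.ndarray, utterance: np.ndarray, threshold: float = 0.75) -> bool:
        # Accept the identity claim if the two voiceprints are close enough.
        # The threshold is illustrative; a real deployment would calibrate it
        # against the desired false-accept / false-reject trade-off.
        return cosine_similarity(enrolled, utterance) >= threshold

    # Placeholder vectors: one from enrolment, one from the current utterance.
    enrolled_voiceprint = np.random.rand(256)
    utterance_voiceprint = enrolled_voiceprint + 0.01 * np.random.rand(256)
    print(verify_speaker(enrolled_voiceprint, utterance_voiceprint))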
More details about speaker recognition
Speech recognition, on the other hand, is the process of recognizing WHAT is being said, i.e. converting speech to text.
This process IS spoken-language-dependent, and MachineSense supports a number of languages for speech recognition. Currently, this
amounts to 16 languages, and we are improving and expanding our language libraries all the time.
Speech recognition is used for transcriptions, Internet bots (automated interactive response systems), and other purposes.
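To make the language dependence concrete, here is a hedged sketch of what a transcription request could look like. The endpoint URL, field names and response format are hypothetical placeholders, not the actual MachineSense API.

    import requests

    # Hypothetical endpoint and field names, used only to illustrate that a
    # transcription request must specify one of the supported languages.
    API_URL = "https://api.example.com/v1/transcribe"

    def transcribe(audio_path: str, language: str, api_key: str) -> str:
        # Send an audio file and receive the recognized text for one language.
        with open(audio_path, "rb") as audio:
            response = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {api_key}"},
                data={"language": language},   # e.g. "en-US"
                files={"audio": audio},
            )
        response.raise_for_status()
        return response.json()["text"]

    # print(transcribe("utterance.wav", "en-US", "YOUR_API_KEY"))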
One of the unique features of the MachineSense voice platform is that it can apply both speaker recognition and speech recognition in the same
process: speaker recognition is used to identify the speaker, and speech recognition then converts the spoken
utterance to text.
This allows MachineSense to introduce, among other things, a specific Liveness Detection method, which is based on checking in
real time whether the person speaking can respond to a time-limited challenge (such as "Please say the following 4 digits: 1234", or "Please
say the following words: Cat, House, Tree"). In this way, by presenting a random challenge to the user, we can prevent spoofing in the form of
voice recordings.
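A simplified sketch of such a challenge-based liveness check is shown below: the speaker-recognition result and the speech-recognition transcript for the same utterance are combined, and the check passes only if the identity is verified and the random challenge was actually spoken. The function names and the matching logic are illustrative assumptions, not the actual MachineSense implementation.

    import random

    def generate_digit_challenge(n_digits: int = 4) -> str:
        # Produce a random digit challenge such as "4 7 1 9".
        return " ".join(str(random.randint(0, 9)) for _ in range(n_digits))

    def liveness_check(challenge: str, transcript: str, speaker_verified: bool) -> bool:
        # A live user must pass speaker verification AND repeat the challenge.
        # `speaker_verified` would come from the speaker-recognition step,
        # `transcript` from the speech-recognition step on the same utterance.
        expected = challenge.split()
        spoken = transcript.replace(",", " ").split()
        return speaker_verified and all(d in spoken for d in expected)

    challenge = generate_digit_challenge()
    print(f"Please say the following 4 digits: {challenge}")
    # In a real flow, transcript and speaker_verified come from the two recognition calls.
    print(liveness_check(challenge, transcript=challenge, speaker_verified=True))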
More details about speech recognition
The third application type is so-called Forensic matching. In this sub-platform, MachineSense matches a current
voice sample / utterance against a large dataset of voice samples in order to find a match. This is a very specific application, and is called
"forensic" because its purpose is to determine whether the current speaker is a member of a "whitelist" or "blacklist" dataset.
After performing such a search, MachineSense returns a list of matches, sorted by confidence level, where the top match(es) are assumed
to be possible "perpetrators".
This application is used in security, law enforcement, and other areas where it is important to identify a person based on their voice by
comparing it against a list of implicitly or explicitly enrolled persons and returning their ranking within that dataset.
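As a rough illustration of such ranking, the sketch below scores a query voiceprint against a small in-memory watchlist and returns the best matches first. The similarity measure, dataset layout and placeholder data are assumptions made for this example; the actual forensic-matching engine and its confidence levels are not described here.

    import numpy as np

    def rank_matches(query: np.ndarray, dataset: dict[str, np.ndarray], top_k: int = 5) -> list[tuple[str, float]]:
        # Rank enrolled voiceprints by cosine similarity to the query utterance;
        # the most likely matches come first.
        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        scores = [(record_id, cosine(query, vec)) for record_id, vec in dataset.items()]
        scores.sort(key=lambda item: item[1], reverse=True)
        return scores[:top_k]

    # Placeholder data: a small "watchlist" of voiceprints and one query utterance.
    watchlist = {f"record-{i}": np.random.rand(256) for i in range(100)}
    query = np.random.rand(256)
    for record_id, score in rank_matches(query, watchlist, top_k=3):
        print(record_id, round(score, 3))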
Typical customers/partners for this application are law enforcement agencies, security companies, insurance companies (checking against databases
of known fraudsters or other offenders), etc.
More details about forensic matching
MachineSense is a company that is very much aware of the importance of privacy and the protection of personal data, as well as of the
importance of compliance with the GDPR and other privacy regulations.
We have therefore implemented a number of measures to ensure that our customers and partners are able to use our services in a way that is
compliant with the GDPR and other privacy regulations, and that our own internal processes are compliant with these regulations as well.
Similarly to face recognition, our Voice API does not store any voice samples; we only process the voice samples / utterances and work solely on the voiceprints. Voiceprints are vectors (arrays of numbers) which in themselves do not contain any personal data. They are non-PII (non-personally-identifiable information).
By default, these voice vectors are NOT stored on our side (nor is any other data); instead, they are returned to you, the customer, so that you can store enrolment and verification data on your side, possibly together with other user-identification data.
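Because the vectors are handed back rather than retained by us, storing them is entirely a customer-side concern. The sketch below shows one possible way to keep a returned voiceprint next to a user record; the table layout and identifiers are illustrative assumptions, not a prescribed schema.

    import json
    import sqlite3

    # Illustrative customer-side storage: the voiceprint vector returned by the
    # enrolment call is kept in the customer's own database, next to the user record.
    conn = sqlite3.connect("users.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voiceprints (user_id TEXT PRIMARY KEY, vector TEXT)"
    )

    def store_voiceprint(user_id: str, voiceprint: list[float]) -> None:
        # Persist the enrolment vector; on its own it is non-PII, but the customer
        # may store it together with whatever user-identification data they keep.
        conn.execute(
            "INSERT OR REPLACE INTO voiceprints (user_id, vector) VALUES (?, ?)",
            (user_id, json.dumps(voiceprint)),
        )
        conn.commit()

    # The vector here is a placeholder for the one returned by the enrolment call.
    store_voiceprint("user-123", [0.12, -0.08, 0.33])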
MachineSense only provides you with the biometric and other voice-processing methods, which operate simply as remote real-time functions, and we do not inject ourselves between you and your users. This means that any privacy- and GDPR-related measures you might be implementing will not be affected in any way by our services.