Voice API: Forensic matching

Introduction
About data-sets
How does it work in implementations?


Introduction

Forensic matching is about finding a person by their voiceprint. It is a biometric method of identification, but unlike speaker recognition, it is not focused on 1:1 matching (checking a voice against a single claimed identity). Instead, it searches for a person in a database (data-set) of voiceprints, which makes it a 1:N problem.
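The difference between 1:1 and 1:N matching can be sketched in a few lines of Python. This is a minimal illustration, assuming voiceprints are plain lists of floats compared by cosine similarity. The data-set, the `verify`/`identify` helpers and the 0.8 threshold are hypothetical and do not describe the MachineSense API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two voice-vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def verify(probe, claimed_voiceprint, threshold=0.8):
    """1:1 (speaker recognition): is the probe the one person it claims to be?"""
    return cosine_similarity(probe, claimed_voiceprint) >= threshold


def identify(probe, dataset, threshold=0.8):
    """1:N (forensic matching): which enrolled persons does the probe resemble?

    Returns (person_id, score) pairs at or above the threshold, best match first.
    """
    scored = [(pid, cosine_similarity(probe, vp)) for pid, vp in dataset.items()]
    hits = [(pid, score) for pid, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


dataset = {  # hypothetical 3-dimensional voiceprints; real ones have many more dimensions
    "person-a": [0.9, 0.1, 0.0],
    "person-b": [0.0, 1.0, 0.0],
    "person-c": [0.7, 0.3, 0.1],
}
probe = [0.85, 0.15, 0.05]
print(verify(probe, dataset["person-b"]))  # False
print([pid for pid, _ in identify(probe, dataset)])  # ['person-a', 'person-c']
```

Verification answers a yes/no question about one identity, while identification has to score the probe against every entry in the data-set, which is why its cost depends on the data-set size.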

The term "forensic" comes from its frequent use in forensic science (it was initially used by law-enforcement agencies), where the voiceprint from a call, recording or video is compared against a set of voiceprints of "known offenders" in order to identify who is speaking. In commercial use, forensic matching is often deployed in insurance and banking call centers, where fraudsters may try to impersonate someone else in order to gain access to their account, obtain a loan, etc.

About data-sets

The data-sets searched for a match are typically large and can contain millions of voiceprints. This makes forensic matching a complex and resource-intensive process, which is why few companies offer it. MachineSense is one of them. Unlike most other companies, we offer forensic matching as a service, not just as a product. Our main USP is speed: we can search through millions of voiceprints in a matter of seconds, whereas competing solutions typically search through thousands of voiceprints in a matter of minutes.
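Why data-set size matters can be illustrated with a brute-force search sketch. Assuming (hypothetically) that voiceprints are normalised to unit length once, at enrolment time, each query-time comparison reduces to a dot product, and `heapq.nlargest` keeps only the best k candidates instead of sorting the whole data-set. This pure-Python loop still visits every voiceprint, so its cost grows linearly with the data-set; it only shows the shape of the computation, not how MachineSense achieves its search speed. Systems working at the scale of millions of vectors typically rely on vectorised matrix operations or approximate-nearest-neighbour indexes instead.

```python
import heapq
import math
import random


def normalise(vector):
    """Scale a voice-vector to unit length, so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k_matches(probe, normalised_dataset, k=5):
    """Return up to k (person_id, score) pairs, highest cosine similarity first.

    Assumes every stored voiceprint was normalised once, at enrolment time.
    """
    q = normalise(probe)
    scores = (
        (sum(a * b for a, b in zip(q, vp)), pid)
        for pid, vp in normalised_dataset.items()
    )
    # nlargest keeps a heap of at most k items instead of sorting all N scores.
    return [(pid, score) for score, pid in heapq.nlargest(k, scores)]


# Usage: 5,000 synthetic 64-dimensional voiceprints (random, for illustration only).
random.seed(0)
dataset = {
    f"person-{i}": normalise([random.gauss(0, 1) for _ in range(64)])
    for i in range(5_000)
}
# A noisy "re-recording" of person-42 is expected to rank first.
probe = [x + random.gauss(0, 0.05) for x in dataset["person-42"]]
print(top_k_matches(probe, dataset, k=3))
```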

Data-sets are typically of two types: closed and open. A closed data-set is known in advance and fixed; it is not expected to change. An open data-set is expected to change over time, as new voiceprints are added to it.
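One way to model the distinction in code, as a sketch under the assumption that a data-set is simply a mapping from person IDs to voice-vectors: an open data-set is an ordinary `dict`, while a closed one can be exposed as a read-only `types.MappingProxyType` view over a fixed snapshot. The IDs and vectors are illustrative only.

```python
from types import MappingProxyType

# Open data-set: an ordinary dict; voiceprints can be added (or removed) at any time.
open_dataset = {"person-a": [0.9, 0.1, 0.0]}
open_dataset["person-b"] = [0.0, 1.0, 0.0]  # a new voiceprint arrives later

# Closed data-set: a read-only view over a fixed snapshot (the copy means later
# changes to the source dict cannot leak into it).
closed_dataset = MappingProxyType(dict(open_dataset))

try:
    closed_dataset["person-c"] = [0.7, 0.3, 0.1]
except TypeError:
    print("closed data-set cannot be modified")  # this branch runs
```

Because a closed data-set never changes, anything derived from it (such as a search index) can be built once and reused, whereas an open data-set requires such structures to be updated whenever voiceprints are added.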

Data-sets are usually either a "blacklist" (a list of known offenders) or a "whitelist" (a list of known, trusted people who are to be approved).
"Blacklisted" persons are typically those who are not allowed to access a certain service or to perform a certain action. For an insurance company, for example, this would be a known fraudster whose voiceprint is included in such a data-set.
"Whitelisted" persons are typically those who are allowed to access a certain service or to perform a certain action. This is common in physical access control, where a person may enter a building only if their voiceprint matches one in the data-set.

"Whitelist" data-sets are usually the ones that are dynamically expanded and contracted, while "blacklist" data-sets are most commonly prepared in advance, since the "blacklisted" person cannot be expected to cooperate (for example, by enrolling their own voice). Either way, both types can be expanded and contracted dynamically, and MachineSense supports both options.
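The two list types lead to opposite actions on a match. A minimal sketch, assuming matching results arrive as `(person_id, confidence)` pairs sorted best first; `ListType`, `decide`, the action names and the 0.8 threshold are all hypothetical.

```python
from enum import Enum


class ListType(Enum):
    BLACKLIST = "blacklist"  # known offenders, e.g. fraudsters
    WHITELIST = "whitelist"  # known, trusted people, e.g. staff allowed into a building


def decide(list_type, matches, threshold=0.8):
    """Turn forensic-matching results into an action.

    `matches` is a list of (person_id, confidence) pairs, best first.
    `threshold` is a hypothetical operating point; in practice it would be
    tuned per deployment to balance false accepts against false rejects.
    """
    hit = bool(matches) and matches[0][1] >= threshold
    if list_type is ListType.BLACKLIST:
        # A hit means the speaker resembles a known offender: flag for review.
        return "flag_for_review" if hit else "no_alert"
    # Whitelist: only a hit can lead to approval; everything else is denied.
    return "allow" if hit else "deny"


print(decide(ListType.BLACKLIST, [("fraudster-17", 0.93)]))  # flag_for_review
print(decide(ListType.WHITELIST, [("employee-5", 0.41)]))    # deny
```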

Dynamically expanding or contracting such a data-set is a form of "enrolment" (in speaker-recognition terms). For forensic matching, we provide our customers with dedicated tools for such enrolments, whether implicit or explicit.
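Expansion and contraction can be sketched as a small in-memory class. This is a hypothetical illustration, not MachineSense's enrolment tooling: it assumes a person's stored voiceprint is the normalised mean of one or more voice-vectors, which is one common approach but not necessarily the one used in production.

```python
import math


class VoiceprintDataset:
    """Hypothetical in-memory data-set supporting dynamic expansion and contraction."""

    def __init__(self):
        self._voiceprints = {}

    def enrol(self, person_id, samples):
        """Expand: store one voiceprint for person_id, built from one or more voice-vectors.

        Assumption: the stored voiceprint is the normalised mean of the samples.
        Enrolling an existing person_id replaces their previous voiceprint.
        """
        if not samples:
            raise ValueError("at least one sample is required")
        if len({len(s) for s in samples}) != 1:
            raise ValueError("all samples must have the same dimension")
        mean = [sum(values) / len(samples) for values in zip(*samples)]
        norm = math.sqrt(sum(x * x for x in mean))
        if norm == 0:
            raise ValueError("samples average to a zero vector")
        self._voiceprints[person_id] = [x / norm for x in mean]

    def remove(self, person_id):
        """Contract: drop person_id's voiceprint; a no-op if it is not enrolled."""
        self._voiceprints.pop(person_id, None)

    def __len__(self):
        return len(self._voiceprints)

    def __contains__(self, person_id):
        return person_id in self._voiceprints


whitelist = VoiceprintDataset()
whitelist.enrol("employee-5", [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])  # several samples of one person
whitelist.enrol("employee-6", [[0.1, 0.9, 0.2]])
whitelist.remove("employee-6")  # e.g. the employee has left
print(len(whitelist), "employee-5" in whitelist)  # 1 True
```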

How does it work in implementations?

Performing forensic matching involves the following steps:

  1. You prepare or obtain a data-set of voiceprints (voice-vectors) of the persons you want to search for. This is typically done with our speaker recognition API, which lets you enrol persons one by one or in bulk, and also implicitly (by using voiceprints of persons you already have in your data-set).
    In some cases, you may already have a data-set of voiceprints that you want to use for forensic matching, for example a "blacklist" or "whitelist" data-set that you purchased or otherwise obtained.
    MachineSense can also provide you with offline and online tools to create, manage and prepare such data-sets for real-time forensic matching.
  2. During a conversation (call, video, etc.) you obtain a voiceprint of the person you want to search for. This is typically done by your call center or another service that receives calls (or other voice data) from your end-users. This current voiceprint is then sent to our API for forensic matching.
  3. If there are close matches, the response contains a list of them, sorted by confidence level. You can use this information to flag the call or conversation and to alert the operator to a possible fraudster or other type of match.
    For authentication (for example, physical building access), you can use this information to allow or deny access to the person speaking into the microphone, or into a mobile device equipped with GPS and a microphone. In such cases, MachineSense always recommends combining several authentication methods rather than relying on biometric authentication alone. Although biometric matching is very precise, in security terms "trust" should always be built on several methods, and it is fundamentally different from a partial or indicated "distrust" (such as a flagged call).