Introduction
About data-sets
How does it work in implementations?
Forensic matching means finding a person by their voiceprint. It is a biometric identification method, but unlike conventional speaker recognition it does not focus on 1:1 matching, where a voice is compared against a single claimed identity. Instead, the speaker's voiceprint is searched for in a database (data-set) of voiceprints, which is a 1:N comparison.
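The difference between a 1:1 check and a search across a data-set can be illustrated with a minimal sketch. It assumes that a voiceprint is a fixed-length numeric vector and that two voiceprints are compared by cosine similarity against a threshold; the names verify and identify, the threshold of 0.8, and the sample vectors are hypothetical and are not taken from any MachineSense product. A linear scan like this one is only meant to show the idea and says nothing about how a production search over millions of voiceprints is implemented.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumption: a voiceprint is a fixed-length embedding vector.
Voiceprint = List[float]


def cosine_similarity(a: Voiceprint, b: Voiceprint) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(probe: Voiceprint, claimed: Voiceprint, threshold: float = 0.8) -> bool:
    """1:1 matching: does the probe match one claimed identity?"""
    return cosine_similarity(probe, claimed) >= threshold


def identify(
    probe: Voiceprint,
    dataset: Dict[str, Voiceprint],
    threshold: float = 0.8,
) -> Optional[Tuple[str, float]]:
    """1:N matching: return the best-scoring entry, or None if nothing
    in the data-set reaches the threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for person_id, voiceprint in dataset.items():
        score = cosine_similarity(probe, voiceprint)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


# Hypothetical three-entry data-set and an unknown speaker (probe).
dataset = {
    "person-a": [1.0, 0.0, 0.0],
    "person-b": [0.0, 1.0, 0.0],
    "person-c": [0.0, 0.0, 1.0],
}
probe = [0.1, 0.9, 0.1]
match = identify(probe, dataset)
```

Here verify answers "is this the person they claim to be?", while identify answers "who in the data-set, if anyone, is this?". With the sample data, the probe is closest to "person-b".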
The term "forensic" comes from forensic science, where the method was first used by law-enforcement agencies: a voice is compared against a set of voiceprints of "known offenders" in order to identify who is speaking in a call, a recording or a video. In commercial use, forensic matching is common in insurance and banking call-centers, where fraudsters may try to impersonate someone else in order to gain access to that person's account, obtain a loan, and so on.
The data-sets being searched are typically large and can contain millions of voiceprints. This makes forensic matching a complex and resource-intensive process, which is why few companies offer it. MachineSense is one of them. Unlike other companies, we offer forensic matching as a service as well, and not just as a product. Our main USP is speed: we can search through millions of voiceprints in a matter of seconds, whereas competitors typically search through thousands of voiceprints in a matter of minutes.
Data-sets are typically of two types: closed and open. A closed data-set is known and fixed, and is not expected to change. An open data-set is expected to change, with new voiceprints being added to it over time.
Data-sets are usually either a "blacklist" (list of known offenders) or a "whitelist" (list of known good people, to be approved).
"Blacklisted" persons are typically those who are not allowed to access a certain service, or who are not allowed to perform a certain action.
For an insurance company, for example, this would be a known fraudster whose voiceprint is included in such a data-set.
"Whitelisted" persons are typically those who are allowed to access a certain service, or who are allowed to perform a certain action.
This is often seen in physical access control, where a person is allowed to enter a building only if their voiceprint matches one in the data-set.
"Whitelist" data-sets are usually the ones that are dynamically expanded/contracted, while "Blacklist" data-sets are most commonly pre-arranged, since no cooperation can be expected from a "blacklisted" person. Either way, both kinds can be dynamically expanded/contracted, and MachineSense offers both options.
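The two list types turn the same search result into opposite decisions, which a short sketch can make concrete. It assumes the search returns either the identifier of the matched person or nothing; the names ListKind, Decision and decide are hypothetical. Treating a blacklist miss as "allow" is also an assumption made for illustration, since a real deployment would normally combine the result with other checks.

```python
from enum import Enum
from typing import Optional


class ListKind(Enum):
    BLACKLIST = "blacklist"  # list of known offenders
    WHITELIST = "whitelist"  # list of known good people, to be approved


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def decide(list_kind: ListKind, matched_id: Optional[str]) -> Decision:
    """Turn a search result into a decision.

    matched_id is the identifier found by the data-set search, or None
    when no voiceprint in the data-set matched the speaker.
    """
    hit = matched_id is not None
    if list_kind is ListKind.BLACKLIST:
        # A hit means the speaker is a known offender: refuse.
        # Assumption: a miss raises no flag, so the request proceeds.
        return Decision.DENY if hit else Decision.ALLOW
    # Whitelist: only a hit grants access; unknown speakers are refused.
    return Decision.ALLOW if hit else Decision.DENY


# Hypothetical examples: a flagged fraudster and an unknown visitor.
call_center = decide(ListKind.BLACKLIST, "fraudster-17")
building_door = decide(ListKind.WHITELIST, None)
```

In this sketch the call-center request is denied because the caller was found on the blacklist, and the door stays closed because the visitor was not found on the whitelist.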
Dynamic expansion/contraction of such data-sets is a kind of "enrolment" (in speaker recognition terms). For forensic matching purposes, we provide our customers with special tools for such enrolments, whether implicit or explicit.
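Expansion and contraction of a data-set can be pictured as adding and removing entries in a keyed store. The sketch below is an illustration only: the class VoiceprintDataset and its methods enrol and remove are hypothetical names, not the MachineSense tools mentioned above, and a voiceprint is again assumed to be a plain numeric vector. A closed data-set is modelled as one that is pre-arranged at creation and rejects later changes.

```python
from typing import Dict, List, Optional

# Assumption: a voiceprint is a fixed-length embedding vector.
Voiceprint = List[float]


class VoiceprintDataset:
    """Keyed collection of voiceprints that is either open or closed."""

    def __init__(
        self,
        initial: Optional[Dict[str, Voiceprint]] = None,
        is_open: bool = True,
    ) -> None:
        self.is_open = is_open
        # Copy the input so later changes by the caller do not leak in.
        self._voiceprints: Dict[str, Voiceprint] = {
            person_id: list(vp) for person_id, vp in (initial or {}).items()
        }

    def enrol(self, person_id: str, voiceprint: Voiceprint) -> None:
        """Expand the data-set (or replace an existing entry)."""
        self._require_open()
        self._voiceprints[person_id] = list(voiceprint)

    def remove(self, person_id: str) -> None:
        """Contract the data-set; removing an unknown id is a no-op."""
        self._require_open()
        self._voiceprints.pop(person_id, None)

    def _require_open(self) -> None:
        if not self.is_open:
            raise PermissionError("closed data-set cannot be modified")

    def __len__(self) -> int:
        return len(self._voiceprints)

    def __contains__(self, person_id: object) -> bool:
        return person_id in self._voiceprints


# An open whitelist that grows and shrinks over time.
whitelist = VoiceprintDataset(is_open=True)
whitelist.enrol("employee-1", [0.2, 0.7, 0.1])
whitelist.enrol("employee-2", [0.9, 0.1, 0.3])
whitelist.remove("employee-1")

# A pre-arranged blacklist, modelled here as closed. This is a choice
# made for the example; a blacklist could equally be kept open.
blacklist = VoiceprintDataset({"offender-1": [0.5, 0.5, 0.5]}, is_open=False)
```

Keeping the open/closed flag on the data-set itself, rather than on the list type, reflects the point that either kind of list may be dynamic.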
Here are the steps needed to perform forensic matching: