Speech recognition is the process of converting speech into text. It is not a biometric method of identification, but rather a way of "translating" spoken input into textual output. Typical uses include transcription, Internet bots (chatbots), voice assistants, and similar applications.
MachineSense API can handle both recordings of speech (audio files) and live speech (streaming). The primary use case is recordings: in most implementations, the speech is recorded first and then sent to the server for processing. These recordings are mostly very small files/BLOBs (as used, for example, in voice chatbots or IVRs), and the results are always definite, which is why this is the primary use.
(Ad-hoc) recording-based speech recognition is described on this page. It assumes standard REST API calls: you send the recording to the server, specifying the language it was spoken in, and receive a response with the content of that speech in textual form.
Streaming speech recognition is described separately. In short: when performing streaming speech recognition, MachineSense servers expect SIP/RTP or WebRTC streams on our endpoints and send the content of the speech back to web-hooks predefined in your account settings. Two types of results are sent in streaming mode: immediate (approximate) results are sent in near-realtime; final results are sent when the end-user stops speaking or pauses for more than 1 second, at which point the results are "sharpened" and sent as definite.
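To illustrate the streaming flow, here is a minimal sketch of a web-hook receiver that distinguishes immediate from final results. The payload shape (a "final" flag alongside a "speech" field) is purely hypothetical — consult the streaming documentation and your account settings for the real schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify_event(event):
    """Split a streaming callback payload into (kind, text).

    The "final" flag and "speech" field are assumed names, not part of
    the documented API above.
    """
    kind = "final" if event.get("final") else "partial"
    return kind, event.get("speech", "")


class SpeechWebhook(BaseHTTPRequestHandler):
    """Minimal webhook endpoint for streaming results (sketch only)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        kind, text = classify_event(event)
        print(kind.upper() + ":", text)  # e.g. feed partials to a live UI
        self.send_response(204)          # acknowledge receipt
        self.end_headers()


# To run: HTTPServer(("", 8080), SpeechWebhook).serve_forever()
```

Acknowledging each callback quickly (here with an empty 204) keeps the sender from queueing or retrying deliveries while you process the text elsewhere.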
Read more about the features of our speech recognition.
(Call from your server to our API.)
POST /voice/v1/speech
Parameters / body:
{
  "audio": "string",
  "api_key": "string",
  "ref": "string",
  "dictionary": "string",
  "language": "string",
  "precision": "string"
}
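A minimal sketch of assembling and sending this request from Python, using only the standard library. The base64 encoding of the audio payload and the endpoint host are assumptions — the spec above only says the fields are strings; substitute your real values.

```python
import base64
import json
import urllib.request

# Placeholder host -- replace with the real MachineSense API base URL.
API_URL = "https://api.machinesense.example/voice/v1/speech"


def build_speech_request(audio_bytes, api_key, ref, language,
                         dictionary="", precision="default"):
    """Assemble the JSON body for POST /voice/v1/speech.

    Base64-encoding the audio bytes is an assumption; check the API
    reference for the expected audio encoding.
    """
    return {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "api_key": api_key,
        "ref": ref,
        "dictionary": dictionary,
        "language": language,
        "precision": precision,
    }


def recognize(audio_bytes, api_key, ref, language):
    """POST the request and return the parsed JSON response."""
    body = json.dumps(build_speech_request(audio_bytes, api_key, ref, language))
    req = urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The `ref` field lets you correlate each response with the recording you sent, which is useful when many requests are in flight at once.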
Parameters explained:
Code: 200
Default response:
{
  "result": "Ok",
  "code": 0,
  "message": "string",
  "data": {
    "ref": "string",
    "speech": "string"
  }
}
Response explained:
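A small sketch of handling this response on the client side. Treating `"result": "Ok"` together with `"code": 0` as the success signal is an assumption based on the default response shown above; error semantics may differ in practice.

```python
import json


def extract_speech(response):
    """Pull (ref, speech) out of a /voice/v1/speech response.

    Accepts either a parsed dict or a raw JSON string. The success
    check ("Ok" / code 0) mirrors the default response above and is
    an assumption about error semantics.
    """
    payload = json.loads(response) if isinstance(response, str) else response
    if payload.get("result") != "Ok" or payload.get("code") != 0:
        raise RuntimeError("Recognition failed: %s" % payload.get("message"))
    data = payload["data"]
    return data["ref"], data["speech"]
```

Returning `ref` alongside the recognized text lets the caller match the transcript back to the original recording it submitted.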