Cobalt offers a variety of speech and language technologies and specializes in customizing speech technology for your specific needs, with models tailored to your domain. All of our engines can be deployed on premises, in a private cloud, or on an embedded system, so your data never leaves your control. Each engine has a protobuf API, making it easy to integrate with any gRPC-supported language.
Cubic is Cobalt’s state-of-the-art speech recognition system. Cubic uses deep-learning models for fast, accurate speech recognition. Cubic’s technology builds upon Kaldi, the leading ASR research toolkit, and thus stays current with the latest trends in the broader speech recognition community.
Juzu, Cobalt’s diarization engine, differentiates between speakers in a conversation based on distinct characteristics of their voices. Diarization greatly improves the utility of automatic speech recognition when multiple speakers are recorded on a single channel. For example, Juzu can identify speaker transitions in a court transcript, or allow a patient’s speech to be analyzed separately from a therapist’s in a recorded session.
Diatheke, Cobalt’s dialogue manager, lets you define sophisticated conversational flows to create a powerful interactive natural-language system. Diatheke provides a seamless interface between the Cubic speech recognizer, Cobalt’s natural language understanding (NLU) engine Chosun, and Cobalt’s text-to-speech (TTS) engine Luna. The result is a highly configurable spoken-language user interface that can easily integrate with your back-end service solutions.
Chosun, Cobalt’s Natural Language Understanding (NLU) engine, takes in a string of text and extracts two pieces of information: intent and entities (see Glossary). Chosun uses a flexible model pipeline that can chain together exact-match rules, keyword-based rules, and deep-learning statistical models, in any combination, for optimal accuracy. Chosun can be imported directly as a library but is usually called via the Diatheke API.
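To make the idea of a chained pipeline concrete, here is a minimal sketch in Python. It is not Chosun's API or implementation; every name in it (Parse, exact_match_stage, keyword_stage, fallback_stage, run_pipeline) is hypothetical, and the final stage is only a stand-in for where a statistical model might sit. It assumes one simple chaining strategy: stages are tried in order and the first one that produces a result wins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Parse:
    """Hypothetical NLU result: one intent plus named entities."""
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


# A stage returns a Parse if it recognizes the text, otherwise None.
Stage = Callable[[str], Optional[Parse]]


def exact_match_stage(text: str) -> Optional[Parse]:
    """Exact-match rule: the whole normalized utterance must be in the table."""
    table = {"turn on the lights": Parse("lights_on")}
    return table.get(text.strip().lower())


def keyword_stage(text: str) -> Optional[Parse]:
    """Keyword rule: fire on a trigger word and pull a simple entity."""
    words = text.lower().split()
    if "weather" not in words:
        return None
    entities: Dict[str, str] = {}
    # Treat the word following "in" as the city, if there is one.
    if "in" in words[:-1]:
        entities["city"] = words[words.index("in") + 1]
    return Parse("get_weather", entities)


def fallback_stage(text: str) -> Optional[Parse]:
    """Stand-in for a statistical model; here it only reports 'unknown'."""
    return Parse("unknown")


def run_pipeline(text: str, stages: List[Stage]) -> Optional[Parse]:
    """Try each stage in order and return the first non-None result."""
    for stage in stages:
        result = stage(text)
        if result is not None:
            return result
    return None


PIPELINE: List[Stage] = [exact_match_stage, keyword_stage, fallback_stage]

if __name__ == "__main__":
    for utterance in ("Turn on the lights", "what is the weather in boston", "hello"):
        print(utterance, "->", run_pipeline(utterance, PIPELINE))
```

Ordering the cheap, high-precision rules first means the broader stages only see utterances the rules could not handle, which is one common reason to chain rule-based and statistical components.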
Luna, Cobalt’s Text to Speech (TTS) engine, uses parametric synthesis models trained with deep learning to produce natural-sounding, highly intelligible, and customizable speech.
Cobalt’s Telefol engine can spot words or phrases in real time, or in a collection of recorded audio. It operates phonetically, so it recognizes search terms that are not in a dictionary. It can be configured to run with extremely low CPU utilization to allow simultaneous real-time analysis of dozens of audio streams per core.
Our Profound tool compares a user’s pronunciation of a word or phrase to the reference pronunciation. This can be used to provide feedback or fluency scoring for language learners, or to identify pronunciation variants (e.g. for proper names) to add to a lexicon.
Our Dorado voiceprint generator and analyzer can help you know who is speaking, given a collection of voiceprints of known speakers. It can be combined with a password or other credentials for a powerful multi-factor authentication tool.
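As an illustration of matching an unknown speaker against known voiceprints, the sketch below treats each voiceprint as a fixed-length numeric vector and picks the enrolled speaker with the highest cosine similarity. This is a generic, commonly used approach and an assumption on our part, not a description of how Dorado works; the function names, the toy three-dimensional vectors, and the 0.8 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    probe: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.8,  # hypothetical cutoff; a real system would tune this
) -> Tuple[Optional[str], float]:
    """Return (best matching name, score), or (None, score) if no match clears the threshold."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(probe, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    # Toy voiceprints; real ones would have many more dimensions.
    known = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(identify_speaker([0.9, 0.1, 0.0], known))
    print(identify_speaker([1.0, 1.0, 1.0], known))
```

Returning no match when the best score falls below the threshold is what lets a check like this act as one factor in authentication: an unenrolled speaker is rejected rather than mapped to the nearest known voice.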
Our Mano speaker classification tool models various characteristics of a speaker, allowing you to estimate the age, gender, regional accent, or other attributes of an unknown speaker.