Bertin IT introduces MediaSpeech v6, its latest multilingual speech recognition solution – paving the way for the era of augmented communications

MediaSpeech® offers the industry’s best capabilities for the processing and in-depth analysis of media and telecommunications data.

Paris, 4th June 2019 – Bertin IT (CNIM Group) announces the release of the new version of MediaSpeech®, its multilingual speech recognition solution that converts audio tracks to searchable text transcripts, enabling audio and video sources, as well as telecommunications, to be indexed, searched and analysed. MediaSpeech® now also comes in a live version for real-time audio streams, paving the way for new interactive and augmented communications applications.

Thanks to deep neural networks (deep learning), commonly used in artificial intelligence systems, MediaSpeech® builds an extremely fine-grained model of the acoustic space that is robust to differences between speakers (speech rate, accents…) and to varying acoustic conditions, delivering even faster and more accurate transcription.


  • Speech recognition, with each word transcribed with its timecode to the millisecond and assigned a recognition confidence score.
  • Automatic detection of spoken language (LID).
  • Automatic segmentation of speaking turns and speakers, with gender recognition.
  • Identification of the speaker from a biometric database.
  • Automatic and semi-automatic adaptation of vocabularies and domains.

And all this in 17 different languages.

MediaSpeech® comes in several versions. MediaSpeech® Factory, deployed on site or in SaaS mode hosted on Bertin IT’s cloud, can handle large volumes of files with guaranteed performance levels. The new MediaSpeech® Live transcribes audio streams on the fly, opening the door to innovative real-time applications such as voice chatbots, call-bots and enhanced call centres. In an enhanced call centre, the adviser is assisted during the call by a system that performs searches, opens applications or enters data, streamlining the dialogue and improving its quality.

“Originally developed by Vecsys, which was acquired by Bertin in 2011, MediaSpeech® was initially developed for Defence and Security applications,” says Yves Rochereau, General Manager of Bertin IT. “Bertin IT’s R&D team worked to enhance the system and expand its areas of application for several years. MediaSpeech® now addresses demanding customers such as media groups, audiovisual monitoring companies, contact centres and large trading rooms in banks, which use it to index, search and analyse audio and video content for purposes such as monitoring, alerts, reporting and compliance with banking regulations. We are now highly optimistic about the opportunities offered by the new versions of MediaSpeech®, especially as major deployment contracts have already been signed.”

Among the main improvements in the new version of MediaSpeech®:

  • MediaSpeech® Live version for processing audio streams in real time.
  • New neural models make transcription two to three times faster and more accurate.
  • Full transition to neural models across the speech processing modules, including voice activity detection (VAD) and speaker segmentation (diarization), for even greater accuracy.
  • Easy installation process, stronger security and new interfaces.
  • A fully neural language identification (LID) module with increased accuracy, even for relatively short sections of speech.

“This is the first commercial language identification module that is entirely neural. During evaluation, this module not only proved better than previous systems, but also produced results superior to those of laboratories at the forefront of the field,” says Samir Bennacef, Director of speech R&D at Bertin IT.

“MediaSpeech® now combines technological excellence and functional flexibility in a way that is unique in the marketplace,” adds Ariane Nabeth Halber, Speech Solutions Director at Bertin IT. “The ability to run in both SaaS and local modes, and to manage files in ‘batch’ mode and in real time, for all languages in the catalogue, is an undeniable asset. Added to this is the high accuracy of the engine, which has recently won a number of customer benchmark comparisons against solutions from major US companies and local developers.”

Version 6 of MediaSpeech® is already being used by several Bertin IT customers, including a major French investment and finance bank. MediaSpeech® Live has just been delivered to another major banking group for use at its contact centres. Other contracts for the new version of MediaSpeech® are expected to be signed very soon.
