Speech analytics, also known as audio mining, is software that uses a variety of techniques to convert unstructured conversations into structured output, turning them into metadata and transcripts. The output files can then be analyzed, and the enterprise can use the findings to identify customer insights, needs and wants. See the figure below.

Figure: What is Speech Analytics? (DMG Speech Analytics; Copyright DMG Consulting LLC, July 2014)

How Speech Analytics Works

Speech analytics applications use a variety of mathematical algorithms, analytic techniques, and contextual and other call/customer-related metadata (including desktop and text analytics) to structure unstructured conversations. These techniques are applied to recorded audio files and to live (real-time) conversations. The speech analytics process is multi-phased: it starts with a speech engine, either phonetic or large-vocabulary continuous speech recognition (LVCSR, also known as speech-to-text), which converts conversations into system-readable data for further analysis. Next, each speech analytics solution applies its own technology and methodology to create output (metadata), which is indexed before being made available to end users for search and analysis.
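The multi-phase process described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `speech_engine`, `analyze` and `index` functions are hypothetical stand-ins for a real LVCSR engine, a vendor's proprietary analytics, and the indexing layer, respectively.

```python
# Sketch of the three phases: engine -> metadata -> searchable index.

def speech_engine(audio: str) -> str:
    """Phase 1: convert audio into system-readable data.
    A real phonetic or LVCSR engine would decode a waveform; here we
    simply normalize a stand-in string for illustration."""
    return audio.lower()

def analyze(transcript: str) -> dict:
    """Phase 2: apply the solution's own analytics to produce metadata."""
    words = transcript.split()
    return {"transcript": transcript, "word_count": len(words), "words": words}

def index(metadata: dict, store: dict) -> None:
    """Phase 3: index the metadata so end users can search and analyze it."""
    for word in set(metadata["words"]):
        store.setdefault(word, []).append(metadata["transcript"])

store = {}
for call in ["I want to cancel my account",
             "Thanks for resolving my billing issue"]:
    index(analyze(speech_engine(call)), store)

print(store["my"])  # both calls mention "my"
```

The key design point the sketch illustrates is that the phases are decoupled: the same indexing and search layer can sit on top of either a phonetic or an LVCSR engine.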

The audio mining solution sends recorded or real-time conversations through a speech engine that breaks them into phonemes (representations of sounds), or into words, phrases and concepts. To improve accuracy and provide context, each solution applies a variety of techniques to further analyze and index the conversations, using proprietary algorithms to enhance the recognition capabilities. The underlying speech engine determines the format of the output file; it may be a transcript of the conversation, an index of words, phrases and categories, or an index of phonemes. All output files must be searchable so that end users can create queries to parse the data into words, topics or categories for root cause, trend and correlation analysis. Once the data is bucketed into categories, users can run reports or conduct ad hoc searches of the databases or indices populated by the analysis/indexing layer of the application.
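The "bucketing into categories" step above can be illustrated with a toy example. The category names and keyword rules below are invented for illustration; production solutions use far richer proprietary categorization, but the principle of assigning transcripts to query-able buckets is the same.

```python
# Toy categorization: keyword rules assign each transcript to one or
# more buckets that reports and ad hoc searches can then query.

CATEGORY_RULES = {
    "churn_risk": ["cancel", "switch", "competitor"],
    "billing": ["charge", "refund", "invoice"],
}

def categorize(transcript: str) -> list:
    """Return every category whose keywords appear in the transcript."""
    text = transcript.lower()
    return [cat for cat, keywords in CATEGORY_RULES.items()
            if any(k in text for k in keywords)]

calls = [
    "I was double charged and I want a refund",
    "If this isn't fixed I will cancel and switch providers",
]

buckets = {}
for call in calls:
    for cat in categorize(call):
        buckets.setdefault(cat, []).append(call)

print(sorted(buckets))  # ['billing', 'churn_risk']
```

An analyst could then run root cause or trend analysis over each bucket, e.g. counting `churn_risk` calls per week.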

Speech analytics output is delivered via dashboards, reports and alerts. Users can also run queries and view their findings online or in reports. Many solutions come with pre-defined queries related to common business issues or concerns, such as customer satisfaction, experience and retention; first call resolution; operational effectiveness and weaknesses; agent compliance and adherence to scripts; agent compliance with regulatory requirements; sales and marketing effectiveness; competitive intelligence; and product feedback.

Some speech analytics solutions also perform emotion detection, by evaluating linguistic events, acoustic characteristics (volume, pitch and rate of speech), or a combination of the two approaches. When emotion detection is used together with root cause analysis, it is very helpful in identifying positive or negative trends.
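A highly simplified sketch of combining the two approaches follows. The acoustic thresholds, keyword list and scoring rule are all invented for illustration; real solutions extract acoustic features from the audio signal itself and use proprietary models rather than keyword counts.

```python
# Toy emotion detection: combine an acoustic agitation score with a
# linguistic score from negative terms in the transcript.

NEGATIVE_TERMS = {"angry", "frustrated", "unacceptable", "terrible"}

def acoustic_score(volume_db: float, pitch_hz: float, words_per_min: float) -> int:
    # Louder, higher-pitched, faster speech -> higher agitation (0..3).
    # Thresholds here are arbitrary placeholders.
    return int(volume_db > 70) + int(pitch_hz > 220) + int(words_per_min > 180)

def linguistic_score(transcript: str) -> int:
    return sum(1 for w in transcript.lower().split() if w in NEGATIVE_TERMS)

def is_negative(transcript: str, volume_db: float,
                pitch_hz: float, words_per_min: float) -> bool:
    total = acoustic_score(volume_db, pitch_hz, words_per_min) \
            + linguistic_score(transcript)
    return total >= 3  # arbitrary cutoff for this sketch

print(is_negative("this is unacceptable I am angry", 75, 250, 190))  # True
```

Either signal alone can trigger a flag, which mirrors the point in the text that solutions may use linguistic events, acoustics, or both.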

Real-Time Speech Analytics

Real-time speech analytics, which emerged in 2011, will play a very important role in the future of this market. These solutions analyze calls as they are happening and deliver some form of actionable recommendation to managers, supervisors or agents. Real-time speech analytics is being used to identify situations where agents are not in compliance with their script, guidelines or standard operating procedures. It can also help identify when customers are very unhappy or angry, so that a supervisor can intervene to rectify the situation in real time.
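The script-compliance use case above can be sketched as a simple streaming check. The required phrase, fragment deadline and alert text are hypothetical; a real system would consume the engine's live output and route alerts to a supervisor dashboard.

```python
# Sketch of real-time compliance monitoring: as transcript fragments
# stream in, raise an alert if a required disclosure has not been
# heard within the first few fragments of the call.

REQUIRED_PHRASE = "this call may be recorded"

def monitor(stream, deadline_fragments=3):
    """Yield an alert if the required phrase isn't heard in time."""
    heard = False
    for i, fragment in enumerate(stream, start=1):
        if REQUIRED_PHRASE in fragment.lower():
            heard = True
        if i == deadline_fragments and not heard:
            yield "ALERT: missing required disclosure"

alerts = list(monitor([
    "hi thanks for calling",
    "how can i help you today",
    "let me pull up your account",
]))
print(alerts)  # ['ALERT: missing required disclosure']
```

Because the check runs per fragment rather than after the call ends, a supervisor can be notified while intervention is still possible, which is the defining difference from post-call analytics.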

For more information about the contact center speech analytics market, please download the Genesys-sponsored reprint of the report here.