
At this year's SK AI Summit, interest had shifted noticeably from the technology itself to how it can be applied to real services. Audion's structure-first explanation worked well, and many questions focused on signal-based analysis after STT (emotion, emphasis segments, speaking patterns). Attendees showed a strong need for insight layers, quantification of emotional change, real-time applicability, handling of technical terms, and lightweight PoC onboarding. Overall, the event confirmed that voice AI is moving from 'recognition' to 'understanding', and that this shift demands a designable structure.
What stood out most at the SK AI Summit, SK's largest AI event, was that interest has shifted from curiosity about the technology itself to how it can be applied to actual services and products.
This time, I first introduced the structure of Audion, a voice AI middleware platform, and explained that its emotion recognition and voice highlight features are extensions built on top of the core API.
Framed this way, the technology was understood much faster, and what impressed me most was how quickly the conversation shifted to "What scenarios would be possible if we applied this to our service?"
Most Frequently Asked Questions On-Site

| Question Area | Actual Question Examples |
|---|---|
| Handling of Technical Terms | How do you recognize domain-specific terms? |
| Accuracy & Validation Methods | How can you prove the model's performance? |
| Voice Signal Analysis | What features can be extracted after STT? |
| Foundation Models | Can this integrate with our GPUs / datasets / LLMs? |
| Emotion Recognition Performance | How do you validate accuracy? |
| Adoption Methods | What does it take to start a PoC? |
Interest was particularly high in signal analysis that goes deeper than plain transcription (STT).
The fact that Audion extracts features such as emotion, emphasis segments, and speaking patterns directly from the voice signal was readily recognized as a differentiator from text-only analysis AI.
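To make the idea of "features beyond the transcript" concrete, here is a minimal sketch of what per-segment signal features might look like and how a simple indicator could be computed from them. All field names (`emotion`, `emphasized`, `speech_rate_wps`, etc.) are illustrative assumptions, not the actual Audion API schema.

```python
from dataclasses import dataclass

# Hypothetical per-segment analysis result; field names are illustrative,
# not the real Audion API schema.
@dataclass
class VoiceSegment:
    start_s: float          # segment start time (seconds)
    end_s: float            # segment end time (seconds)
    transcript: str         # STT output for this segment
    emotion: str            # e.g. "neutral", "frustrated", "pleased"
    emphasized: bool        # whether the speaker stressed this segment
    speech_rate_wps: float  # speaking rate, words per second

def emphasis_ratio(segments: list[VoiceSegment]) -> float:
    """Fraction of total speaking time that was emphasized."""
    total = sum(s.end_s - s.start_s for s in segments)
    if total == 0:
        return 0.0
    stressed = sum(s.end_s - s.start_s for s in segments if s.emphasized)
    return stressed / total

segments = [
    VoiceSegment(0.0, 2.0, "Hello, thanks for calling.", "neutral", False, 2.5),
    VoiceSegment(2.0, 5.0, "This is the third time I've asked!", "frustrated", True, 3.1),
]
print(emphasis_ratio(segments))  # → 0.6
```

A text-only pipeline has no access to fields like `emphasized` or `speech_rate_wps`; that gap is exactly what attendees recognized as the differentiator.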
‘Potential Needs’ of Attendees - Trends from a Business/Product Perspective
The direction of actual needs summarized based on reactions at the event is as follows.
1. Exploring the ‘Insight Layer’ as the Next Step
Conversation summarization is already perceived as ‘step 1’, and now there is clear consideration of “What can we show next?”
“Would it be effective to show that there was this emotion in this sentence?”
2. Request for ‘Quantification’ in VOC and Customer Response Areas
Questions like "How much dissatisfaction was there?" and "At what moments did emotional changes occur?" signal that thinking is already moving toward quantitative indicators → UI forms → operational standards.
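As a rough illustration of the "quantitative indicators" step, the sketch below turns assumed per-utterance negative-sentiment scores (in [0, 1], not real model output) into the two numbers attendees asked about: an overall dissatisfaction level and the moments where emotion shifted sharply.

```python
# A minimal sketch: per-utterance negative-sentiment scores are assumed
# inputs, not real Audion output. The threshold is an arbitrary choice.
def dissatisfaction_summary(scores: list[float], jump: float = 0.3) -> dict:
    """Average negativity plus indices where the score jumps sharply."""
    avg = sum(scores) / len(scores)
    change_points = [
        i for i in range(1, len(scores))
        if abs(scores[i] - scores[i - 1]) >= jump
    ]
    return {"avg_negative": round(avg, 2), "change_points": change_points}

# One score per utterance in a call; dissatisfaction spikes at utterance 2
# and subsides at utterance 4.
print(dissatisfaction_summary([0.1, 0.15, 0.6, 0.65, 0.2]))
# → {'avg_negative': 0.34, 'change_points': [2, 4]}
```

Indicators of this shape map naturally onto the UI forms and operational standards the questions pointed toward (e.g. flagging calls whose `avg_negative` exceeds a threshold for review).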
3. Exploring Technology for Real-time Environments
There was also high interest in the applicability of real-time environments beyond call centers, such as video content, live commerce, user consultations, and in-car conversations.
4. Domain Conversations with Technical Terms
In certain industries, there was a recognition that “accuracy is more important than readability.”
There were also discussions about whether voice signal analysis could complement points that text-based LLMs miss.
5. Barriers to PoC Entry
"Can we just start by testing a few files?" "How are API usage criteria set for a PoC?" Many responses pointed to the need for a structure that can be tested lightly before committing.
This suggests a need for initial onboarding templates and sample flows.
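The "test a few files" entry point could be sketched as a tiny batch loop like the one below. The `analyze` callable is a placeholder for whatever API call a real PoC would make; here it is stubbed with `fake_analyze` so the flow is runnable, and no real endpoint or schema is implied.

```python
from pathlib import Path

def run_poc(files: list[str], analyze) -> dict[str, dict]:
    """Run a handful of audio files through an analysis callable."""
    results = {}
    for f in files:
        name = Path(f).name
        results[name] = analyze(f)  # in a real PoC, an HTTP call goes here
    return results

def fake_analyze(path: str) -> dict:
    # Stub standing in for the real endpoint; the returned shape is invented.
    return {"transcript": "...", "emotion": "neutral"}

out = run_poc(["calls/a.wav", "calls/b.wav"], fake_analyze)
print(sorted(out))  # → ['a.wav', 'b.wav']
```

An onboarding template would essentially pre-fill this loop with sample files and a working `analyze`, so a prospect can confirm possibilities in minutes rather than negotiating usage criteria first.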
At this SK AI Summit, we confirmed that voice AI is moving from 'recognition' to 'understanding', and that this shift begins with structural design: the kind that lets a Voice Understanding Agent interpret conversations.
Thank you for reading.
