
At this year's SK AI Summit, interest had shifted noticeably from the technology itself to how it can be applied to real services. Audion's structure-first explanation worked well, and many questions focused on signal-based analysis after STT (emotion, emphasis segments, speaking patterns). Attendees showed a strong need for insight layers, quantification of emotional change, real-time applicability, handling of technical terms, and lightweight PoC onboarding. Overall, the event confirmed that voice AI is moving from 'recognition' to 'understanding', and that this shift demands a designable structure.
What stood out most at the SK AI Summit, SK's largest AI event, was that interest has shifted from curiosity about the technology itself to how it can be applied to actual services and products.
This time, I first introduced the structure of Audion, a voice AI middleware platform, and explained that its emotion recognition and voice highlight features are extensions built on top of the core API.
Framed this way, the technology was understood much faster, and what impressed me most was how quickly the conversation shifted to "What scenarios would be possible if we applied this to our service?"
Most Frequently Asked Questions On-Site

| Question Area | Actual Question Examples |
|---|---|
| Handling of Technical Terms | How do you recognize domain-specific terms? |
| Accuracy & Validation Methods | How can you prove the model's performance? |
| Voice Signal Analysis | What features can be extracted after STT? |
| Foundation Models | Can this integrate with our GPUs / datasets / LLMs? |
| Emotion Recognition Performance | How do you validate accuracy? |
| Adoption Methods | What does it take to start a PoC? |
Interest was particularly high in signal analysis that goes deeper than plain transcription (STT).
The fact that Audion extracts features such as emotion, emphasis segments, and speaking patterns directly from the voice signal was readily recognized as a differentiator from text-only analysis AI.
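To make the idea of "features beyond the transcript" concrete, here is a minimal sketch of what per-segment signal features might look like and how a simple indicator could be computed from them. All field names (`emotion`, `emphasized`, `speech_rate_wps`, etc.) are illustrative assumptions, not the actual Audion API schema.

```python
from dataclasses import dataclass

# Hypothetical per-segment analysis result; field names are illustrative,
# not the real Audion API schema.
@dataclass
class VoiceSegment:
    start_s: float          # segment start time (seconds)
    end_s: float            # segment end time (seconds)
    transcript: str         # STT output for this segment
    emotion: str            # e.g. "neutral", "frustrated", "pleased"
    emphasized: bool        # whether the speaker stressed this segment
    speech_rate_wps: float  # speaking rate, words per second

def emphasis_ratio(segments: list[VoiceSegment]) -> float:
    """Fraction of total speaking time that was emphasized."""
    total = sum(s.end_s - s.start_s for s in segments)
    if total == 0:
        return 0.0
    stressed = sum(s.end_s - s.start_s for s in segments if s.emphasized)
    return stressed / total

segments = [
    VoiceSegment(0.0, 2.0, "Hello, thanks for calling.", "neutral", False, 2.5),
    VoiceSegment(2.0, 5.0, "This is the third time I've asked!", "frustrated", True, 3.1),
]
print(emphasis_ratio(segments))  # → 0.6
```

A text-only pipeline has no access to fields like `emphasized` or `speech_rate_wps`; that gap is exactly what attendees recognized as the differentiator.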
‘Potential Needs’ of Attendees - Trends from a Business/Product Perspective
The direction of actual needs summarized based on reactions at the event is as follows.
1. Exploring the ‘Insight Layer’ as the Next Step
Conversation summarization is already perceived as ‘step 1’, and now there is clear consideration of “What can we show next?”
“Would it be effective to show that there was this emotion in this sentence?”
2. Request for ‘Quantification’ in VOC and Customer Response Areas
Questions like "How much dissatisfaction was there?" and "At what moments did emotional changes occur?" signal that thinking is already moving toward quantitative indicators → UI forms → operational standards.
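As a rough illustration of the "quantitative indicators" step, the sketch below turns assumed per-utterance negative-sentiment scores (in [0, 1], not real model output) into the two numbers attendees asked about: an overall dissatisfaction level and the moments where emotion shifted sharply.

```python
# A minimal sketch: per-utterance negative-sentiment scores are assumed
# inputs, not real Audion output. The threshold is an arbitrary choice.
def dissatisfaction_summary(scores: list[float], jump: float = 0.3) -> dict:
    """Average negativity plus indices where the score jumps sharply."""
    avg = sum(scores) / len(scores)
    change_points = [
        i for i in range(1, len(scores))
        if abs(scores[i] - scores[i - 1]) >= jump
    ]
    return {"avg_negative": round(avg, 2), "change_points": change_points}

# One score per utterance in a call; dissatisfaction spikes at utterance 2
# and subsides at utterance 4.
print(dissatisfaction_summary([0.1, 0.15, 0.6, 0.65, 0.2]))
# → {'avg_negative': 0.34, 'change_points': [2, 4]}
```

Indicators of this shape map naturally onto the UI forms and operational standards the questions pointed toward (e.g. flagging calls whose `avg_negative` exceeds a threshold for review).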
3. Exploring Technology for Real-time Environments
There was also high interest in the applicability of real-time environments beyond call centers, such as video content, live commerce, user consultations, and in-car conversations.
4. Domain Conversations with Technical Terms
In certain industries, there was a recognition that “accuracy is more important than readability.”
There were also discussions about whether voice signal analysis could complement points that text-based LLMs miss.
5. Barriers to PoC Entry
"Can we just start by testing a few files?" "How are API usage criteria set for a PoC?" Many responses pointed to the need for a structure that can be tested lightly before committing.
This suggests a need for initial onboarding templates and sample flows.
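The "test a few files" entry point could be sketched as a tiny batch loop like the one below. The `analyze` callable is a placeholder for whatever API call a real PoC would make; here it is stubbed with `fake_analyze` so the flow is runnable, and no real endpoint or schema is implied.

```python
from pathlib import Path

def run_poc(files: list[str], analyze) -> dict[str, dict]:
    """Run a handful of audio files through an analysis callable."""
    results = {}
    for f in files:
        name = Path(f).name
        results[name] = analyze(f)  # in a real PoC, an HTTP call goes here
    return results

def fake_analyze(path: str) -> dict:
    # Stub standing in for the real endpoint; the returned shape is invented.
    return {"transcript": "...", "emotion": "neutral"}

out = run_poc(["calls/a.wav", "calls/b.wav"], fake_analyze)
print(sorted(out))  # → ['a.wav', 'b.wav']
```

An onboarding template would essentially pre-fill this loop with sample files and a working `analyze`, so a prospect can confirm possibilities in minutes rather than negotiating usage criteria first.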
At this SK AI Summit, we confirmed that voice AI is moving from 'recognition' to 'understanding', and that this shift begins with structural design: the kind that lets a Voice Understanding Agent interpret conversations.
Thank you for reading.
