Development of End-to-End Speaker Diarization Method for Detecting Multi-speaker Overlapping Speech

Our method outperformed conventional methods on overlapping speech

February 3, 2020

Hitachi, Ltd. today announced the development of an end-to-end*1 speaker diarization*2 method that accurately detects the speech segments (start and end times) of multiple speakers by using a neural network trained on speech in which speakers overlap. Unlike most other speaker diarization methods, which cannot handle overlapping speech, this method improves speech recognition accuracy for overlapping speech in natural conversation. Evaluation on a telephone speech dataset shows that the method outperforms conventional methods. The technique also achieved excellent diarization error rates*3 on heavily overlapping simulated speech datasets. Hitachi aims to help address labor shortages and improve productivity by applying the method to speech recognition and dialogue services.
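As an illustration of what "detecting speech segments of multiple speakers" can look like in practice, the following minimal sketch converts per-frame speaker activity scores into start and end times. It is not Hitachi's implementation: the assumption that the network emits one activity score per speaker per frame, the function name frames_to_segments, and the frame_shift and threshold parameters are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def frames_to_segments(
    activity: Sequence[Sequence[float]],
    frame_shift: float = 0.1,   # seconds per frame (assumed value)
    threshold: float = 0.5,     # score above which a speaker counts as active (assumed)
) -> Dict[int, List[Tuple[float, float]]]:
    """Turn per-frame, per-speaker activity scores into (start, end) segments.

    activity[t][s] is a hypothetical score in [0, 1] that speaker s is
    talking in frame t. Speakers are handled independently, so two
    speakers can be active in the same frame (overlapping speech).
    """
    if not activity:
        return {}
    n_speakers = len(activity[0])
    segments: Dict[int, List[Tuple[float, float]]] = {s: [] for s in range(n_speakers)}
    for s in range(n_speakers):
        start = None  # frame index where the current segment began
        for t, frame in enumerate(activity):
            active = frame[s] > threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                segments[s].append((round(start * frame_shift, 3), round(t * frame_shift, 3)))
                start = None
        if start is not None:  # close a segment that runs to the last frame
            end = len(activity)
            segments[s].append((round(start * frame_shift, 3), round(end * frame_shift, 3)))
    return segments


# Two speakers, five frames; both are active in the third frame.
example = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.9, 0.7],
    [0.2, 0.8],
    [0.1, 0.9],
]
print(frames_to_segments(example, frame_shift=0.5))
```

Because each speaker is thresholded separately, the two resulting segments share a time span, which is the overlap case that the release says most other methods cannot handle.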

Fig. 1 Developed end-to-end speaker diarization

Fig. 2 Conventional speaker diarization

*1: End-to-end: A learning method using a single neural network that directly outputs the target results, bypassing a complicated pipeline of systems.
*2: Speaker diarization: The process of detecting multiple speaker segments to answer the question "who spoke when?".
*3: Diarization error rate: The most commonly used metric for speaker diarization. It is the ratio of the audio time that is missed, falsely detected, or assigned to the wrong speaker to the total audio time of the correct (reference) speaker segments.
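The diarization error rate defined in *3 can be sketched as a frame-level calculation. The example below is a simplified illustration, not the evaluation tool used by Hitachi: the function name diarization_error_rate is hypothetical, each frame is assumed to have equal duration, and the speaker labels in the hypothesis are assumed to be already matched to those in the reference.

```python
from typing import Sequence, Set


def diarization_error_rate(
    reference: Sequence[Set[str]],
    hypothesis: Sequence[Set[str]],
) -> float:
    """Frame-level diarization error rate.

    reference[t] and hypothesis[t] are the sets of speakers active in
    frame t. Assumes equal-length frames and that hypothesis labels are
    already mapped to reference labels (the mapping step is omitted).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)                   # speakers detected correctly
        missed += max(0, n_ref - n_hyp)              # reference speech not detected
        false_alarm += max(0, n_hyp - n_ref)         # speech detected where none exists
        confusion += min(n_ref, n_hyp) - n_correct   # speech given to the wrong speaker
        total += n_ref                               # total reference speaker time
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Four frames: one overlap frame where speaker B is missed, one frame
# assigned to the wrong speaker, and one false detection in silence.
reference = [{"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"A"}, {"A"}, {"B"}]
print(diarization_error_rate(reference, hypothesis))  # 0.75
```

Counting reference time per speaker means an overlap frame contributes twice to the denominator, so a system that ignores overlapping speech is penalized with a missed-speech error in every such frame.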

For more information, use the enquiry form below to contact the Research & Development Group, Hitachi, Ltd. Please make sure to include the title of the article.

https://www8.hitachi.co.jp/inquiry/hitachi-ltd/hqrd/news/en/form.jsp
