Microsoft's speech recognition team has hit a major milestone: a system that recognizes spoken words as accurately as people do. In a paper published Monday, researchers and engineers in Microsoft's Artificial Intelligence and Research group reported a speech recognition system with a word error rate of 5.9 percent, which is on par with humans who professionally transcribe conversations. “We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist, in a Microsoft blog post. “This is an historic achievement.”
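For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; the function name and sample sentences are made up for illustration, and it does not reproduce the scoring tools or text normalization used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not the paper's scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Made-up sample sentences: one wrong word in a six-word reference.
rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{rate:.1%}")
```

In this toy case one of six reference words is wrong, so the rate is 1/6, or roughly 16.7 percent. A rate of 5.9 percent corresponds to about six errors per 100 reference words.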
“Our progress is a result of the careful engineering and optimization of convolutional and recurrent neural networks,” the paper states. “These acoustic models have the ability to model a large amount of acoustic context.”
While the system can’t hear as well as humans in all situations or environments, it’s still a major improvement that may show up in future Microsoft products.