A confusion network (sometimes called a word confusion network or, informally, a sausage) is a natural language processing method that combines outputs from multiple automatic speech recognition or machine translation systems.[1][2] Confusion networks are simple linear directed acyclic graphs with the property that every path from the start node to the end node goes through all the other nodes. The set of words represented by the edges between two consecutive nodes is called a confusion set. In machine translation, the defining characteristic of confusion networks is that they allow multiple ambiguous inputs, deferring committal translation decisions until later stages of processing.[3][4] This approach is used in the open-source machine translation software Moses[5] and in the proprietary Speech to Text API of IBM Bluemix Watson.[6]
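The structure described above can be sketched in Python as an ordered list of confusion sets. This is a minimal illustration under stated assumptions, not the representation used by Moses or IBM. The type alias `ConfusionNetwork`, the functions `best_path` and `count_paths`, and the probabilities are all hypothetical. The sketch also assumes that a path's score is the product of its edge probabilities. Because every start-to-end path visits every node, choosing the top word in each confusion set independently yields the best-scoring path, and the number of paths is the product of the confusion-set sizes.

```python
from typing import Dict, List

# Hypothetical representation: slot i holds the confusion set between
# node i and node i+1, mapping each word to an assumed probability.
ConfusionNetwork = List[Dict[str, float]]

def best_path(cn: ConfusionNetwork) -> List[str]:
    """Pick the highest-probability word in every confusion set.

    Assuming a path scores as the product of its edge probabilities,
    the slots are independent, so a per-slot argmax is the best path.
    """
    return [max(slot, key=slot.get) for slot in cn]

def count_paths(cn: ConfusionNetwork) -> int:
    """Count start-to-end paths: the product of the confusion-set sizes."""
    total = 1
    for slot in cn:
        total *= len(slot)
    return total

# Illustrative values only.
cn: ConfusionNetwork = [
    {"the": 0.9, "a": 0.1},
    {"cat": 0.6, "cap": 0.3, "hat": 0.1},
    {"sat": 1.0},
]

print(" ".join(best_path(cn)))  # the cat sat
print(count_paths(cn))          # 6
```

As slots are added, the number of paths grows multiplicatively while the number of edges grows only additively. This is what lets a confusion network carry many ambiguous hypotheses compactly.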
References
- Rosti, Antti-Veikko I.; Zhang, Bing; Matsoukas, Spyros; Schwartz, Richard (2008). "Incremental Hypothesis Alignment for Building Confusion Networks with Application to Machine Translation System Combination". Proceedings of the Third Workshop on Statistical Machine Translation. StatMT '08. Stroudsburg, PA, USA: Association for Computational Linguistics: 183–186. ISBN 9781932432091.
- Matusov, Evgeny; Ueffing, Nicola; Ney, Hermann (2006). "Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment". In Proc. EACL. CiteSeerX 10.1.1.483.5417.
- Hoang, Hieu (2007). "Factored translation models". In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL): 868–876. CiteSeerX 10.1.1.80.3572.
- Koehn, Philipp; Hoang, Hieu; Birch, Alexandra; Callison-Burch, Chris; Federico, Marcello; Bertoldi, Nicola; Cowan, Brooke; Shen, Wade; Moran, Christine (2007). "Moses: Open Source Toolkit for Statistical Machine Translation". Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. ACL '07. Stroudsburg, PA, USA: Association for Computational Linguistics: 177–180. doi:10.3115/1557769.1557821. S2CID 794019.
- "Moses - Moses/ConfusionNetworks". www.statmt.org. Retrieved 2017-11-09.
- "IBM® Speech to Text service provides an API Reference | IBM Watson Developer Cloud". www.ibm.com. Archived from the original on 2017-11-09. Retrieved 2017-11-09.
In the IBM Watson Speech to Text API, word alternatives (also known as "Confusion Networks") are controlled by a confidence threshold parameter. This parameter is the lower bound for identifying a hypothesis as a possible word alternative: an alternative word is considered if its confidence is greater than or equal to the threshold. The threshold must be a probability between 0 and 1 inclusive. If the parameter is omitted, no alternative words are computed.
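The threshold semantics described above can be sketched as a filter over the confusion sets. This is a hypothetical illustration, not IBM's implementation. The function `word_alternatives` and the example confidences are invented for the sketch. Returning an empty list when no threshold is given is an assumed reading of "no alternative words are computed".

```python
from typing import Dict, List, Optional

# Hypothetical confusion sets: word -> confidence (illustrative values only).
cn: List[Dict[str, float]] = [
    {"the": 0.9, "a": 0.1},
    {"cat": 0.6, "cap": 0.3, "hat": 0.1},
    {"sat": 1.0},
]

def word_alternatives(
    network: List[Dict[str, float]], threshold: Optional[float] = None
) -> List[List[str]]:
    """Keep, per confusion set, the words whose confidence is >= threshold."""
    # Assumed behavior: omitting the threshold computes no alternatives.
    if threshold is None:
        return []
    # The threshold is a probability between 0 and 1 inclusive.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1 inclusive")
    return [
        # Highest-confidence words first; the stable sort keeps ties in order.
        [w for w, p in sorted(slot.items(), key=lambda kv: kv[1], reverse=True)
         if p >= threshold]
        for slot in network
    ]

print(word_alternatives(cn, 0.3))  # [['the'], ['cat', 'cap'], ['sat']]
print(word_alternatives(cn))       # []
```

Because the comparison is inclusive, "cap" at exactly 0.3 is kept, while "hat" and "a" fall below the bound.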