"We show that carefully designed sparse attention can be as expressive and flexible as the original full attention model. Along with theoretical guarantees, we provide a very efficient implementation which allows us to scale to much longer inputs. As a consequence, we achieve state-of-the-art results...
Source: googleblog.com
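To make the sparse-attention idea concrete, here is a minimal NumPy sketch of the kind of attention pattern such models use: a sliding window plus a few global and random connections per query, in the spirit of sparse-attention models like BigBird. All names here are illustrative, and the dense boolean mask is for clarity only; the efficiency gains the excerpt describes come from block-sparse kernels that never materialize the full score matrix.

```python
import numpy as np

def sparse_attention_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    """Boolean mask: True where query i may attend to key j.

    Combines a local sliding window, a few global tokens, and a few
    random links per row (illustrative, not any library's exact pattern).
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True                          # local sliding window
        mask[i, rng.choice(seq_len, n_random)] = True  # random long-range links
    mask[:n_global, :] = True                          # global tokens attend everywhere
    mask[:, :n_global] = True                          # everyone attends to them
    return mask

def masked_attention(Q, K, V, mask):
    """Standard scaled dot-product attention, restricted by `mask`."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)           # disallowed pairs get -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

seq_len, d = 16, 8
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
out = masked_attention(Q, K, V, sparse_attention_mask(seq_len))
print(out.shape)  # (16, 8)
```

Since each query attends to only O(window + n_global + n_random) keys rather than all of them, the number of nonzero attention weights grows linearly with sequence length, which is what lets such models scale to much longer inputs.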
Robust Neural Machine Translation: In recent years, neural machine translation (NMT) using Transformer models has experienced tremendous success. Based on deep neural networks, NMT models are usually trained end-to-end on very large parallel corpora (input/output text pairs) in an entirely data-driven...
Source: googleblog.com
Improving Language Understanding with Unsupervised Learning: We've obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we're also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training...
Source: openai.com
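The recipe in that last excerpt, generative pre-training on unlabeled text followed by supervised fine-tuning of the same network, can be sketched in a few lines of PyTorch. This is a toy under stated assumptions, not OpenAI's released code: the model size, data, and head names are hypothetical, and a causal-masked encoder stack stands in for a decoder-only transformer.

```python
import torch
import torch.nn as nn

# Tiny causal transformer LM: pre-train on next-token prediction,
# then reuse the same trunk for a downstream classification task.
class TinyTransformerLM(nn.Module):
    def __init__(self, vocab=1000, d=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.pos = nn.Embedding(max_len, d)
        layer = nn.TransformerEncoderLayer(d, n_heads, dim_feedforward=4 * d,
                                           batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d, vocab)   # used during pre-training
        self.cls_head = nn.Linear(d, 2)      # added for fine-tuning

    def features(self, tokens):
        T = tokens.size(1)
        h = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        return self.trunk(h, mask=causal)

    def lm_loss(self, tokens):               # unsupervised objective
        logits = self.lm_head(self.features(tokens[:, :-1]))
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))

    def classify(self, tokens):              # supervised fine-tuning head
        return self.cls_head(self.features(tokens)[:, -1])

model = TinyTransformerLM()
tokens = torch.randint(0, 1000, (4, 32))     # stand-in for an unlabeled batch
model.lm_loss(tokens).backward()             # 1) pre-training step
labels = torch.randint(0, 2, (4,))           # stand-in for a labeled batch
loss = nn.functional.cross_entropy(model.classify(tokens), labels)
loss.backward()                              # 2) fine-tuning step
```

The point of the sketch is the shared `trunk`: the unsupervised language-modeling step trains it on plentiful raw text, and the task-specific head then adapts those same weights with comparatively little labeled data, which is why the approach is described as scalable and task-agnostic.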