Thinking Allowed

medical / technology / education / art / flub


Constructing Transformers For Longer Sequences with Sparse Attention Methods

"We show that carefully designed sparse attention can be as expressive and flexible as the original full attention model. Along with theoretical guarantees, we provide a very efficient implementation which allows us to scale to much longer inputs. As a consequence, we achieve state-of-the-art results...
Source: googleblog.com
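
The post refers to the BigBird-style attention pattern, where each token attends to a local sliding window, a few global tokens, and a handful of random positions, so the number of attended pairs grows roughly linearly with sequence length instead of quadratically. Below is a minimal sketch of that kind of mask; the function name and parameters are illustrative, not taken from the source or its implementation.

```python
# Rough sketch (not the authors' code) of a BigBird-style sparse attention
# mask: local window + global tokens + random positions per query.
import numpy as np

def sparse_attention_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    """Return a boolean (seq_len, seq_len) mask; True = attention allowed."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    for i in range(seq_len):
        # Sliding window: neighbors within `window` positions of token i.
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
        # Random attention: a few arbitrary key positions for this query.
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True

    # Global tokens attend to everything and are attended to by everything.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

if __name__ == "__main__":
    m = sparse_attention_mask(16)
    # The fraction of allowed pairs stays small as seq_len grows, which is
    # what keeps memory roughly linear rather than quadratic.
    print(f"density: {m.mean():.2f}")
```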