Thinking Allowed

medical / technology / education / art / flub

Constructing Transformers For Longer Sequences with Sparse Attention Methods

"We show that carefully designed sparse attention can be as expressive and flexible as the original full attention model. Along with theoretical guarantees, we provide a very efficient implementation which allows us to scale to much longer inputs. As a consequence, we achieve state-of-the-art results for question answering, document summarization and genome fragment classification."

Source: ai.googleblog.com
