medical / technology / education / art / flub
"We show that carefully designed sparse attention can be as expressive and flexible as the original full attention model. Along with theoretical guarantees, we provide a very efficient implementation which allows us to scale to much longer inputs. As a consequence, we achieve state-of-the-art results for question answering, document summarization and genome fragment classification."
Source: ai.googleblog.com
attention, sparse, longer, fragment, answering, guarantees, sequences, constructing
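The passage quoted above contrasts a carefully designed sparse attention pattern with full quadratic attention. As an illustration only, not the implementation the post describes, the sketch below builds the kind of pattern that post discusses (a local sliding window plus a few global tokens and a few random connections) and applies ordinary masked attention; the function names and parameters are assumptions for this example, and an efficient implementation would compute only the permitted blocks rather than masking a dense score matrix.

```python
# Minimal sketch of a sparse attention pattern (illustrative assumption,
# not the implementation referenced above): each token attends to a local
# window, a few global tokens, and a few random positions, so the number
# of attended positions grows linearly with sequence length.
import numpy as np

def sparse_attention_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    """Return a boolean (seq_len, seq_len) mask; True means 'may attend'."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True                                   # sliding window
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True  # random links
    mask[:n_global, :] = True                                   # global tokens attend everywhere
    mask[:, :n_global] = True                                   # and everyone attends to them
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the allowed positions."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)                       # block disallowed positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    seq_len, d = 16, 8
    x = np.random.default_rng(1).normal(size=(seq_len, d))
    mask = sparse_attention_mask(seq_len)
    out = masked_attention(x, x, x, mask)
    print(out.shape, f"{mask.mean():.0%} of position pairs attended")
```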