From the viewpoint of language modeling (as opposed to reasoning) transformers are absolute genius compared to the CNN and RNN solutions we were trying before. Ultimately they are sensitive to the graph structure which is the truth about language (in two parts of the text we are talking about the same thing) and not the tree structure which is an almost-trust.
PaulHoule•1h ago