Weight tying is a beloved trick: share one matrix between the input embedding and the output projection, and you halve the parameters those two layers would otherwise need.
Can orthogonalizing the embedding matrix make weight tying work better?
Medium AI · 1 min read
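To make the tying trick concrete, here is a minimal sketch in plain Python. It is not taken from the article: `TiedLM`, `make_embedding`, and `vocab_param_count` are hypothetical names chosen for illustration, and the "model" has no layers between embedding and output, so it only shows the sharing itself.

```python
import random


def make_embedding(vocab_size, d_model, seed=0):
    """Build a vocab_size x d_model matrix of small random weights (illustrative init)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(vocab_size)]


class TiedLM:
    """Toy model (hypothetical): one matrix E is both input embedding and output projection."""

    def __init__(self, vocab_size, d_model, seed=0):
        self.E = make_embedding(vocab_size, d_model, seed)

    def embed(self, token_id):
        # Input side: look up the row of E for this token.
        return self.E[token_id]

    def logits(self, hidden):
        # Output side: score every token by the dot product of its row of E
        # with the hidden state, i.e. multiply by E instead of a separate matrix.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.E]


def vocab_param_count(vocab_size, d_model, tied):
    """Parameters spent on the embedding and output projection together."""
    per_matrix = vocab_size * d_model
    return per_matrix if tied else 2 * per_matrix


model = TiedLM(vocab_size=5, d_model=3)
scores = model.logits(model.embed(2))  # one score per vocabulary entry
```

With a vocabulary of 50,000 and a model width of 512, `vocab_param_count` gives 25,600,000 parameters when tied and 51,200,000 when untied. The saving applies to these two matrices only, not to the rest of the network.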
This article was sourced from Medium AI's RSS feed. Visit the original for the complete story.