Toy Transformer

Step

Task:

Learning Rate = 0.0005

Embedding Dimension = 48