Conversation
@meiertgrootes the part related to
@rogerkuou I added you as a reviewer because in this PR I changed the dataset modules, making them faster. I still have to update the tests. Also, if you are working on #25, it is better to start from this branch.
embed_dim: Dimension of the embedding. The default is 128.
    Many vision transformers use embedding dimensions that are multiples
    of 64 (e.g., 64, 128, 256). This can be tuned.
max_len: Maximum length of the temporal dimension to precompute
It makes sense to implement temporal encodings of the same embedding dimension for architectural purposes. A simple sine and cosine would likely suffice, though. The underlying assumption behind this use is a cyclical nature of the temporal variable w.r.t. the modelled process. That may be debatable. Nevertheless, I believe it makes sense to use/investigate this approach here.
The approach of using sin/cos to encode the temporal position is based on "Attention Is All You Need", section 3.5 Positional Encoding, page 6. The reason given there is: "the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than the ones encountered during training", which I think is useful in our case! In that section, they also compare this with another approach from Convolutional Sequence to Sequence Learning. We might explore this in the future. 🤔
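For reference, the sinusoidal scheme from section 3.5 can be sketched as below. This is a minimal NumPy illustration, not the PR's actual code; the function name `sinusoidal_encoding` and the use of the `max_len` and `embed_dim` parameters from the docstring above are my assumptions:

```python
import numpy as np

def sinusoidal_encoding(max_len: int, embed_dim: int) -> np.ndarray:
    """Precompute sin/cos positional encodings ("Attention Is All You Need", sec. 3.5).

    Returns an array of shape (max_len, embed_dim): even columns hold
    sin(pos / 10000^(2i/embed_dim)), odd columns the matching cos term.
    Assumes embed_dim is even (e.g. the default of 128).
    """
    positions = np.arange(max_len)[:, np.newaxis]  # (max_len, 1)
    # Geometric progression of frequencies across the embedding dimensions.
    div_terms = np.exp(-np.log(10000.0) * np.arange(0, embed_dim, 2) / embed_dim)
    encoding = np.zeros((max_len, embed_dim))
    encoding[:, 0::2] = np.sin(positions * div_terms)  # even dims: sine
    encoding[:, 1::2] = np.cos(positions * div_terms)  # odd dims: cosine
    return encoding

# Each dimension is a fixed-frequency wave, so positions beyond those seen
# during training still map onto the same smooth curves -- the extrapolation
# property quoted above.
pe = sinusoidal_encoding(max_len=36, embed_dim=128)
```

Since the table is precomputed once up to `max_len`, it can be registered as a constant buffer and simply added to the token embeddings at each forward pass.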
meiertgrootes left a comment
The implementation looks sound, with the adaptation of the final convolutional smoothing and mixing of monthly aggregated information.
See the comment about underlying assumptions on encoding time (and by extension spatial position) w.r.t. year(/global reference), but that is for further exploration.
Thanks @SarahAlidoost. I think it is ready to merge. I tested this PR in #27. I still need to subset the data and shrink the patch size to make the training executable on my local machine. The training process runs smoothly.
closes #23
🔴 This branch is based on #20 and is waiting for #19 to modify the example notebook
In this PR:
Todo: