
Add months mixing #24

Merged
SarahAlidoost merged 20 commits into main from add_months_attn
Feb 27, 2026

Conversation

SarahAlidoost (Member) commented Feb 19, 2026

closes #23

🔴 This branch is based on #20 and is waiting for #19 to modify example notebook

In this PR:

  • I added monthly mixing
  • I improved the dataset
  • I fixed the documentation GitHub action
  • I updated the notebook

Todo:

@SarahAlidoost SarahAlidoost marked this pull request as ready for review February 26, 2026 09:15
SarahAlidoost (Member, Author) commented:

@meiertgrootes the part related to st_encoder_decoder is ready for your review. The example notebook in this branch shows how to run the model. Also, most of the code is documented, but I still have to update the documentation to include the month mixing.

SarahAlidoost (Member, Author) commented:

@rogerkuou I added you as a reviewer because in this PR I changed the dataset modules to make them faster, but I still have to update the tests. Also, if you are working on #25, it is better to start from this branch.

embed_dim: Dimension of the embedding. The default is 128.
    Many vision transformers use embedding dimensions that are multiples
    of 64 (e.g., 64, 128, 256). This can be tuned.
max_len: Maximum length of the temporal dimension to precompute
A Collaborator commented:

It makes sense to implement temporal encodings of the same embedding dimension for architectural purposes, though a simple sine and cosine would likely suffice. The underlying assumption is that the temporal variable is cyclical with respect to the modelled process, which may be debatable. Nevertheless, I believe it makes sense to use and investigate this approach here.
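The "simple sine and cosine" the reviewer mentions could look like the sketch below: the month index is mapped onto the unit circle so that December and January end up adjacent, which a raw 1–12 integer feature cannot express. This is an illustrative sketch, not code from this PR; the function name is hypothetical.

```python
import math

def month_cyclic_features(month: int) -> tuple[float, float]:
    """Encode a month (1-12) as a point on the unit circle.

    Illustrative only: December (12) and January (1) become
    neighbours, capturing the cyclical nature of the year.
    """
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

dec = month_cyclic_features(12)
jan = month_cyclic_features(1)
jul = month_cyclic_features(7)
```

With this encoding, the distance between December and January is small, while December and July sit roughly opposite each other on the circle.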

SarahAlidoost (Member, Author) replied:

The approach of using sin/cos to encode the temporal position is based on "Attention Is All You Need", section 3.5 (Positional Encoding, page 6). The stated reason is: "the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than the ones encountered during training", which I think is useful in our case! In that section, they also compare this with the learned positional embeddings of Convolutional Sequence to Sequence Learning. We might explore that in the future. 🤔

meiertgrootes (Collaborator) left a comment:

The implementation looks sound, with the adaptation of the final convolutional smoothing and the mixing of monthly aggregated information.
See my comment about the underlying assumptions on encoding time (and, by extension, spatial position) with respect to the year (or a global reference), but that is for further exploration.

rogerkuou (Collaborator) left a comment:

Thanks @SarahAlidoost. I think it is ready to merge. I tested this PR in #27. I still need to subset the data and shrink the patch size to make the training executable on my machine, but the training process runs smoothly.

SarahAlidoost merged commit b13ef4b into main Feb 27, 2026
3 checks passed
SarahAlidoost deleted the add_months_attn branch February 27, 2026 09:50


Development

Successfully merging this pull request may close these issues.

Add support for seasonal transitions or multi-month patterns

3 participants