@intermittentnrg
Contributor

@intermittentnrg intermittentnrg commented Jan 4, 2026

URL https://www.cenace.gob.mx/Paginas/SIM/Reportes/EnergiaGeneradaTipoTec.aspx

Each date has multiple settlement CSVs. I assume the latest one should always be used? The current parser always uses the first.

Also, reports can now be downloaded without a proxy. A welcome change; it didn't work the last time I tried it.

This seems correct to me, tho I am not an expert on CENACE data source and have to make do with auto translate from Spanish.

Double check

  • I have tested my parser changes locally with poetry run test_parser --target_datetime 2025-11-01 MX
  • I have run poetry run format in the top level directory to format my changes.

@intermittentnrg intermittentnrg requested a review from a team as a code owner January 4, 2026 11:58
@github-actions github-actions bot added the python Pull requests that update Python code label Jan 4, 2026
@VIKTORVAV99 VIKTORVAV99 self-assigned this Jan 4, 2026
@VIKTORVAV99 VIKTORVAV99 self-requested a review January 4, 2026 14:16
@intermittentnrg
Contributor Author

Wait. I see a problem. We need to submit the date to reload the HTML table with correct entries for the month.

Currently it's always looking at the settlement buttons for Noviembre 2025.
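
For illustration, a minimal Python sketch of turning a month label such as "Noviembre 2025" into a date that can be compared with the requested month. It assumes the label is a Spanish month name followed by a four-digit year; `SPANISH_MONTHS` and `parse_spanish_month` are hypothetical names, not taken from the parser in this PR.

```python
from datetime import date

# Spanish month names, lower-cased, mapped to month numbers.
SPANISH_MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4,
    "mayo": 5, "junio": 6, "julio": 7, "agosto": 8,
    "septiembre": 9, "octubre": 10, "noviembre": 11, "diciembre": 12,
}


def parse_spanish_month(label: str) -> date:
    """Turn a label like 'Noviembre 2025' into the first day of that month."""
    name, year = label.strip().split()
    try:
        month = SPANISH_MONTHS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown Spanish month name: {name!r}") from None
    return date(int(year), month, 1)
```

With a helper like this, "is the table showing the month I asked for?" becomes a plain date comparison rather than a string match.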

@intermittentnrg intermittentnrg marked this pull request as draft January 5, 2026 03:53
GET URL returns HTML table for latest available date
table has multiple rows for settlement, latest should likely be used
there are 3 code paths for downloading CSV:

Current:
GET
POST - download latest settlement CSV

Historical
GET
POST - refresh with correct date
POST - download latest settlement CSV

future/not available
GET
@intermittentnrg
Contributor Author

Ok, this ended up a bigger change than I expected. And I should mention I vibe coded this; I normally don't code in Python. The result was sloppy, so I spent a lot of effort cleaning it up; in particular I tried to reduce the diff in the Files changed tab.
Maybe I should take a step back and just consider readability, but any advice from more experienced Python devs will be appreciated and is probably the most valuable.

I implemented a CENACE production importer in Ruby in my own project, and thought I would improve your implementation as well. It's also an opportunity to learn more about vibe coding.

Right, so there are 3 code paths, and URL=https://www.cenace.gob.mx/Paginas/SIM/Reportes/EnergiaGeneradaTipoTec.aspx

Latest/default month - 2 requests

poetry run test_parser --target_datetime 2025-11-01 MX

  1. GET URL - requested month matches month in table (Noviembre 2025)
  2. POST URL - click CSV image button in last row

Previous/historical month - 3 requests

poetry run test_parser --target_datetime 2025-01-01 MX

  1. GET URL - requested month before month in table
  2. POST URL - refresh HTML table, month in table Enero 2025
  3. POST URL - click CSV image button in last row

Not yet available month

poetry run test_parser --target_datetime 2026-01-01 MX

  1. GET URL - requested month is after month in table - raise ParserException data not yet available.
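
The three paths above boil down to one comparison between the requested month and the month shown in the table. A small Python sketch of that decision follows; `FetchPlan`, `plan_requests` and `REQUEST_COUNT` are hypothetical names used only for illustration, not code from this PR.

```python
from datetime import date
from enum import Enum


class FetchPlan(Enum):
    LATEST = "latest"            # GET, then POST to download the CSV
    HISTORICAL = "historical"    # GET, POST to switch month, POST to download
    UNAVAILABLE = "unavailable"  # GET only, then report that data is missing


# Number of HTTP requests each plan needs, as listed above.
REQUEST_COUNT = {
    FetchPlan.LATEST: 2,
    FetchPlan.HISTORICAL: 3,
    FetchPlan.UNAVAILABLE: 1,
}


def plan_requests(requested: date, shown: date) -> FetchPlan:
    """Compare months only; the day of the month is ignored."""
    req = (requested.year, requested.month)
    cur = (shown.year, shown.month)
    if req > cur:
        return FetchPlan.UNAVAILABLE
    if req < cur:
        return FetchPlan.HISTORICAL
    return FetchPlan.LATEST
```

With the table showing Noviembre 2025, the three test commands above map to `LATEST`, `HISTORICAL` and `UNAVAILABLE` respectively.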

So this should use the least number of requests when run periodically to retrieve fresh data.
Also, the previous implementation iterated up to 6 months backwards if no data was found. Strange. I'm not sure whether raising ParserException instead is fine when no data is found?

I think everything is good. Would like any feedback as next step.

@intermittentnrg
Contributor Author

Also fun fact: some of the first months in 2016 have gaps in the data, gaps spanning a couple days. But on previous settlements the gaps are filled. No idea what this means.

In my Ruby project I use VCR to automatically record API responses. There is VCR.py that could be nice to have.

It was amazing how AI just figured out how to submit forms with this ASP.NET __VIEWSTATE crap after it initially strongly recommended against it. And I was pulling my hair out for a while when the 2nd POST request to change date and reload the HTML table didn't work. It would download the correct date CSV, but the HTML table was stuck on Noviembre and so it wouldn't find the latest settlement. Managed to coerce the AI to troubleshoot this and come up with parameters that worked after giving it post data from network inspector in my browser. What a time to be alive aye!
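
For readers unfamiliar with ASP.NET WebForms postbacks: the server expects the hidden form fields from the previous response (`__VIEWSTATE`, `__EVENTVALIDATION` and similar) to be echoed back in the next POST. Below is a minimal Python sketch of collecting those fields with the standard library. The sample HTML and the button name `ctl00$example$btnCsv` are invented for illustration and are not the real CENACE field names; the `.x`/`.y` keys reflect how browsers submit image buttons.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html: str, extra: dict) -> dict:
    """Merge the page's hidden fields with the fields for the clicked control."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return {**parser.fields, **extra}


# Invented sample page; real pages carry much longer state values.
SAMPLE = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="visible" value="ignored" />
</form>
"""

payload = build_postback(
    SAMPLE,
    {"ctl00$example$btnCsv.x": "1", "ctl00$example$btnCsv.y": "1"},
)
```

Because the state changes with every response, the payload has to be rebuilt from the most recent HTML before each POST, which may be why a date-switch request built from stale fields leaves the table stuck on the old month.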

Hope this helps. This repo is a great resource for my https://intermittent.energy project. I probably should have just used your parsers instead of writing my own, but you didn't focus on historical data when I started. And it's been a great excuse for me to code in Ruby.

@VIKTORVAV99
Member

I'll take a look at the code a bit later but if I remember correctly this part:

So this should use the least number of requests when run periodically to retrieve fresh data.
Also, the previous implementation iterated up to 6 months backwards if no data was found. Strange. I'm not sure whether raising ParserException instead is fine when no data is found?

Was because sometimes the data would be delayed and it would catch it as soon as it became available very aggressively. But I'm not a huge fan of brute forcing it like that so let's see what we can come up with instead.

@intermittentnrg
Contributor Author

Hmm, it seems to me the latest available month is shown by default. Judging by the settlement datetimes for previous months, new data seems to arrive between day 10 and 14 of each month, maybe depending on how workdays align with the month.

You're right there was probably some good reason for the existing behaviour, I'll try to find some time to dig thru the history later.

@intermittentnrg
Contributor Author

Code to fall back to previous months was added in #7334 by @tuxity

@intermittentnrg
Contributor Author

intermittentnrg commented Jan 6, 2026

The changes in this PR are best explained by pseudocode. I wish my diff was this clear.

def fetch_production(requested_month)
  get_initial_page()
  actual_month = parse_date_from_html()

  if requested_month > actual_month
    raise 'data not yet available'
  elsif requested_month < actual_month
    submit_form(requested_month) # reload the HTML table for the requested month
    actual_month = parse_date_from_html()
  end

  # assert that the HTML now shows the requested month
  raise 'HTML shows wrong month' if requested_month != actual_month

  last_button_name = find_last_csv_button() # last row of the table listing all settlements for the month
  submit_form(last_button_name) # download the CSV file for the last settlement
end
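
The `find_last_csv_button` step in the pseudocode could look roughly like the Python sketch below. It rests on two assumptions that are not confirmed anywhere in this thread: that the CSV buttons are `<input type="image">` elements whose `name` contains a recognisable marker (here the made-up substring "csv"), and that the latest settlement is the last such button in document order. The sample table and its button names are invented.

```python
from html.parser import HTMLParser


class ImageButtonParser(HTMLParser):
    """Record the name of every <input type="image"> in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_image = (attr.get("type") or "").lower() == "image"
        if tag == "input" and is_image and attr.get("name"):
            self.names.append(attr["name"])


def find_last_csv_button(html: str, marker: str = "csv") -> str:
    """Return the name of the last image button whose name contains marker."""
    parser = ImageButtonParser()
    parser.feed(html)
    candidates = [n for n in parser.names if marker.lower() in n.lower()]
    if not candidates:
        raise ValueError("no CSV button found in HTML")
    return candidates[-1]


# Invented two-row settlement table with hypothetical button names.
SAMPLE_TABLE = """
<table>
  <tr><td><input type="image" name="grid$row0$csv" /></td>
      <td><input type="image" name="grid$row0$zip" /></td></tr>
  <tr><td><input type="image" name="grid$row1$csv" /></td>
      <td><input type="image" name="grid$row1$zip" /></td></tr>
</table>
"""
```

Calling `find_last_csv_button(SAMPLE_TABLE)` picks the CSV button from the second row and ignores the ZIP buttons.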

And I also split out 2 separate commits: remove proxying and remove unused parsed_date function.

I'm not sure how the backend (re)fetches historical data, which may be an issue with this update. I suspect that logic is not part of this electricitymaps-contrib repo?

@tuxity
Contributor

tuxity commented Jan 6, 2026

Hi!

I'm not sure, what is the original issue with the script that you are trying to solve with this PR?

The proxy was needed because access was restricted to connections originating from Mexico. Not sure if that's still the case; the proxy doesn't hurt anyway, I guess?

The fallback logic in the fetch exists because at the start of a month, until the CSVs are published, the script didn't work, so we came up with a workaround. I remember discussing this with @VIKTORVAV99 on a PR or Slack (a while back).

Hope it gives more context

@intermittentnrg
Contributor Author

intermittentnrg commented Jan 6, 2026

No specific issue prompted this PR. In my own project I tried to add CENACE ~1.5 years ago, ended up frustrated (by the language barrier, geoblock, __VIEWSTATE, multiple settlements, etc.) and paused it. Now with AI tooling I've been spurred to retry old tasks.
I think very highly of this repo and want to contribute improvements back.

I recall not being able to download CSV or ZIP files; now I can, so there's no geoblock anymore. The proxy doesn't really hurt, no.

Seems to me that the latest settlement per month should be used, not the 1st/oldest as existing.

Idk how your backend calls the importers; maybe the latest month's CSV (shown in the HTML on initial page load) should be the default behaviour.

@VIKTORVAV99
Member

The backend calls the parsers in two ways:

  • Continuous mode: The parsers are called without a target datetime and it's up to the parsers to return the most recent data possible. This is run every 20 min.

  • Refetch mode: Called with target datetimes and expects the parsers to return that target + all data in the listed refetch frequency if it has one (usually target datetime - 2 days). This schedule runs less often but looks back at multiple different horizons like -1 week, -1 month, -3 months etc. to automatically correct faulty data if it has been updated.

@intermittentnrg
Contributor Author

intermittentnrg commented Jan 11, 2026

Ah, continuous mode must be the if target_datetime is None branch that I see in every parser.

In this mode it should just download latest settlement CSV on https://www.cenace.gob.mx/Paginas/SIM/Reportes/EnergiaGeneradaTipoTec.aspx (no date switching needed). I think.
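
A small Python sketch of how the two modes might be told apart. The `target_datetime is None` check is the one mentioned above; the helper name `month_to_request` and its `shown_month` parameter are hypothetical and not part of the parser in this PR.

```python
from datetime import date, datetime
from typing import Optional


def month_to_request(target_datetime: Optional[datetime], shown_month: date) -> date:
    """Return the month whose latest settlement CSV should be downloaded."""
    if target_datetime is None:
        # Continuous mode: use whatever month the initial page already shows,
        # so no date-switching POST is needed.
        return shown_month
    # Refetch mode: normalise the target to the first day of its month.
    return date(target_datetime.year, target_datetime.month, 1)
```

In continuous mode the returned month always equals the month in the table, so the fetch would take the two-request path.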

New files seem to be published on the 14th, so I'm looking forward to seeing the December data; it's the 11th today.


3 participants