MX, CENACE use latest settlement CSV, remove proxying #8574
base: master
Wait, I see a problem. We need to submit the date to reload the HTML table with the correct entries for the month. Currently it always looks at the settlement buttons for November 2025.
Force-pushed b48afec to 7f645b4.
- GET URL returns an HTML table for the latest available date.
- The table has multiple settlement rows; the latest should likely be used.
- There are 3 code paths for downloading the CSV:
  - Current: GET, then POST to download the latest settlement CSV
  - Historical: GET, POST to refresh the table with the correct date, then POST to download the latest settlement CSV
  - Future/not available: GET only
Force-pushed 7f645b4 to fd9a91b.
OK, this ended up a bigger change than I expected. I should mention that I vibe coded this; I normally don't code in Python. The result was sloppy, so I spent a lot of effort cleaning it up; in particular, I tried to reduce the diff in the "Files changed" tab. I implemented a CENACE production importer in Ruby in my own project and thought I would improve your implementation as well. It's also an opportunity to learn more about vibe coding.

Right, so there are 3 code paths, all against URL=https://www.cenace.gob.mx/Paginas/SIM/Reportes/EnergiaGeneradaTipoTec.aspx:
- Latest/default month: 2 requests
- Previous/historical month: 3 requests
- Not yet available month: 1 request

So this should use the fewest requests when run periodically to retrieve fresh data. I think everything is good. I'd welcome any feedback as a next step.
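The three paths can be written down as a small decision function. This is a sketch under the assumptions stated in the thread: `latest` is the month the page shows on a plain GET, and months are compared by their first day. The function names and step labels are illustrative only, not real CENACE form actions.

```python
from datetime import date


def month_start(d: date) -> date:
    """Normalize any date to the first day of its month."""
    return d.replace(day=1)


def plan_requests(requested: date, latest: date) -> list[str]:
    """Return the sequence of HTTP requests needed to fetch `requested`.

    `latest` is the month the report page shows by default on a plain GET.
    Step labels are illustrative, not real CENACE form actions.
    """
    requested, latest = month_start(requested), month_start(latest)
    steps = ["GET report page"]  # always needed: yields the form state and the latest month
    if requested > latest:
        return steps  # not yet available: stop after the GET
    if requested < latest:
        steps.append("POST change month")  # reload the settlement table for `requested`
    steps.append("POST download latest settlement CSV")
    return steps
```

Counting the returned steps gives 2 for the default month, 3 for an earlier month and 1 for a month that has not been published yet.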
Also, fun fact: some of the first months in 2016 have gaps in the data, spanning a couple of days, but in previous settlements the gaps are filled. No idea what this means.

In my Ruby project I use VCR to automatically record API responses. There is VCR.py, which could be nice to have here.

It was amazing how the AI just figured out how to submit forms with this ASP.NET __VIEWSTATE crap after it initially strongly recommended against it. And I was pulling my hair out for a while when the second POST request, which changes the date and reloads the HTML table, didn't work: it would download the CSV for the correct date, but the HTML table was stuck on November, so it wouldn't find the latest settlement. I managed to coerce the AI into troubleshooting this, and it came up with parameters that worked after I gave it the POST data from my browser's network inspector. What a time to be alive aye!

Hope this helps; this repo is a great resource for my https://intermittent.energy project. I probably should have just used your parsers instead of writing my own, but you didn't focus on historical data when I started. And it's been a great excuse for me to code in Ruby.
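For readers unfamiliar with ASP.NET WebForms: each postback has to echo back the hidden fields the server rendered (`__VIEWSTATE`, `__VIEWSTATEGENERATOR`, `__EVENTVALIDATION` and similar) together with the control being triggered. Below is a minimal, stdlib-only sketch of collecting those fields and merging in overrides. The actual CENACE control names are not given in the thread (they have to be copied from the browser's network inspector, as described above), so the `ctl00$...` name in the usage is hypothetical, as are the function and class names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are routed here too by the default
        # handle_startendtag implementation.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html: str, overrides: dict[str, str]) -> dict[str, str]:
    """Echo the page's hidden state back, with our own fields layered on top."""
    parser = HiddenInputParser()
    parser.feed(html)
    form = dict(parser.fields)
    form.update(overrides)  # e.g. __EVENTTARGET or the month selector (hypothetical names)
    return form


page = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc" />'
    '<input type="hidden" name="__EVENTVALIDATION" value="xyz">'
    '<input type="text" name="q" value="ignored"></form>'
)
print(build_postback(page, {"__EVENTTARGET": "ctl00$Hypothetical$Month"}))
```

Since the server renders fresh hidden-field values in every response, the form for the next POST should presumably be rebuilt from the HTML of the most recent response rather than reused from the initial GET.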
I'll take a look at the code a bit later, but if I remember correctly this part:
Was there because sometimes the data would be delayed, and it would catch it very aggressively as soon as it became available. But I'm not a huge fan of brute-forcing it like that, so let's see what we can come up with instead.
Hmm, it seems to me that the latest available month is shown by default. Judging by the settlement datetimes for previous months, new data seems to arrive between the 10th and 14th of each month, maybe depending on how workdays align with the month. You're right that there was probably a good reason for the existing behaviour; I'll try to find some time to dig through the history later.
The changes in this PR are best explained by pseudocode. I wish my diff were this clear.

```python
def fetch_production(requested_month):
    get_initial_page()
    actual_month = parse_date_from_html()
    if requested_month > actual_month:
        raise ValueError("data not yet available")
    elif requested_month < actual_month:
        submit_form(requested_month)  # reload the HTML table for the requested month
        actual_month = parse_date_from_html()
        if requested_month != actual_month:  # assert the HTML shows the requested month
            raise RuntimeError("HTML table did not switch to the requested month")
    last_button_name = find_last_csv_button()  # in the table listing all settlements for the month
    return submit_form(last_button_name)  # download the CSV file for the last settlement
```

I also split out 2 separate commits: remove proxying, and remove the unused parsed_date function. I'm not sure how the backend (re)fetches historical data, which may be an issue with this update. I suspect that logic is not part of this electricitymaps-contrib repo?
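The pseudocode above can be made runnable and testable by hiding the HTTP requests and HTML parsing behind a small interface. `ReportPage`, its method names, `DataNotAvailable`, `fetch_settlement_csv` and `FakePage` are all hypothetical names for this sketch; a real implementation would back the interface with HTTP requests and HTML parsing. Like the pseudocode, it assumes the last button in the settlement table is the latest settlement.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Protocol


class ReportPage(Protocol):
    """Hypothetical wrapper around the report page: HTTP requests plus HTML parsing."""

    def load(self) -> None: ...  # initial GET of the report page
    def shown_month(self) -> date: ...  # month of the settlement table currently rendered
    def select_month(self, month: date) -> None: ...  # POST that reloads the table
    def settlement_buttons(self) -> list[str]: ...  # CSV download buttons, in table order
    def download(self, button: str) -> bytes: ...  # POST that returns the CSV body


class DataNotAvailable(Exception):
    """Raised when the requested month has not been published yet."""


def fetch_settlement_csv(page: ReportPage, requested: date) -> bytes:
    requested = requested.replace(day=1)
    page.load()
    shown = page.shown_month()
    if requested > shown:
        raise DataNotAvailable(f"{requested:%Y-%m} not published yet (latest: {shown:%Y-%m})")
    if requested < shown:
        page.select_month(requested)  # reload the HTML table for the requested month
        if page.shown_month() != requested:
            raise RuntimeError("settlement table did not switch to the requested month")
    buttons = page.settlement_buttons()
    if not buttons:
        raise DataNotAvailable(f"no settlement CSVs listed for {requested:%Y-%m}")
    return page.download(buttons[-1])  # assumes the last row is the latest settlement


@dataclass
class FakePage:
    """In-memory stand-in that records requests instead of hitting the network."""

    latest: date
    settlements: dict[date, list[str]]
    requests: list[str] = field(default_factory=list)
    month: Optional[date] = None

    def load(self) -> None:
        self.requests.append("GET")
        self.month = self.latest

    def shown_month(self) -> date:
        if self.month is None:
            raise RuntimeError("page not loaded")
        return self.month

    def select_month(self, month: date) -> None:
        self.requests.append("POST month")
        self.month = month

    def settlement_buttons(self) -> list[str]:
        return self.settlements.get(self.shown_month(), [])

    def download(self, button: str) -> bytes:
        self.requests.append("POST csv")
        return f"csv for {button}".encode()


page = FakePage(date(2025, 11, 1), {date(2025, 10, 1): ["s0", "s1", "s2"]})
print(fetch_settlement_csv(page, date(2025, 10, 15)), page.requests)
# b'csv for s2' ['GET', 'POST month', 'POST csv']
```

The fake makes it easy to check the request count per path without touching the network; recorded real responses (for example with VCR.py) could later replace it.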
Hi! I'm not sure: what is the original issue with the script that you are trying to solve with this PR? The proxy was needed because access was restricted to connections originating from Mexico. I'm not sure if that's still the case; the proxy doesn't hurt anyway, I guess? The fetch logic exists because at the start of a month, until the CSVs were published, the script wasn't working, so we came up with a workaround. I remember discussing this with @VIKTORVAV99 on a PR or on Slack (a while back). Hope this gives more context.
No specific issue prompted this PR. I tried to add CENACE to my own project ~1.5 years ago, ended up frustrated (by the language barrier, the geoblock, __VIEWSTATE, multiple settlements, etc.) and paused it. Now, with AI tooling, I've been spurred to retry old tasks. I recall not being able to download the CSV or ZIP files; now I can, so there's no geoblock anymore. The proxy doesn't really hurt, no. It seems to me that the latest settlement for each month should be used, not the first/oldest as in the existing code. I don't know how your backend calls the importers; maybe the latest month's CSV (shown in the HTML on the initial page load) should be the default behaviour.
The backend calls the parsers in two ways:
Ah, continuous mode must be
In this mode, it should just download the latest settlement CSV from https://www.cenace.gob.mx/Paginas/SIM/Reportes/EnergiaGeneradaTipoTec.aspx (no date switching needed), I think. New files seem to be published on the 14th, so I'm looking forward to seeing December data; it's the 11th today.
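A sketch of how the two backend modes could map onto the request paths, assuming (as with other parsers in this repo) that continuous mode calls the parser with `target_datetime=None`. The function name `month_to_request` is hypothetical, and it takes the calendar month of `target_datetime` as-is; a UTC timestamp near a month boundary would need converting to Mexican local time first.

```python
from datetime import date, datetime
from typing import Optional


def month_to_request(target_datetime: Optional[datetime]) -> Optional[date]:
    """Return the first day of the report month to request, or None for the default page."""
    if target_datetime is None:
        # Continuous mode (assumed): take the latest settlement CSV of the month the
        # page shows by default, so no date-switching POST is needed.
        return None
    # Historical mode: the calendar month of target_datetime, assumed to already be
    # expressed in Mexican local time (converting from UTC is left to the caller).
    return target_datetime.date().replace(day=1)
```

Returning `None` instead of "today's month" keeps continuous mode on the cheapest path, since the page already shows the latest published month by default.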
URL https://www.cenace.gob.mx/Paginas/SIM/Reportes/EnergiaGeneradaTipoTec.aspx
Each date has multiple settlement CSVs. I assume the latest one should always be used? The current parser always uses the first.
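If the table order ever turns out to be unreliable, picking the latest settlement by its timestamp rather than its position is a small, defensive alternative to taking the first (or last) row. The sketch assumes each row can be parsed into a label, a settlement datetime and the name of its download button; `SettlementRow`, its fields and `latest_settlement` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SettlementRow:
    """One row of the per-month settlement table (field names are hypothetical)."""

    label: str  # text shown in the table
    published: datetime  # settlement timestamp parsed from the row
    button_name: str  # form control that triggers the CSV download


def latest_settlement(rows: list[SettlementRow]) -> SettlementRow:
    """Pick the most recently published settlement, regardless of table order."""
    if not rows:
        raise ValueError("no settlements listed for this month")
    return max(rows, key=lambda row: row.published)
```

Selecting by timestamp means a reordered table still yields the newest settlement, at the cost of having to parse a date from each row.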
Also, reports can now be downloaded without the proxy. A welcome change; it didn't work the last time I tried.
This seems correct to me, though I am not an expert on the CENACE data source and have to make do with auto-translation from Spanish.
Double check
- `poetry run test_parser --target_datetime 2025-11-01 MX`
- `poetry run format` in the top level directory to format my changes.