From a9685dbc942e37473c6203ad8663ba3a7576eea9 Mon Sep 17 00:00:00 2001 From: Ian Tse Date: Wed, 15 Apr 2026 10:38:47 -0700 Subject: [PATCH] Clean up docs --- docs/api.md | 2 +- docs/faq.md | 4 +- docs/getting-started/index.md | 2 +- docs/getting-started/quick-start-examples.md | 4 +- .../workflow-1-existing-api-files.md | 6 +- .../workflow-2-pvplant-builder.md | 12 +- .../workflow-3-plantbuilder-advanced.md | 6 +- docs/index.md | 2 +- sdk_general_feedback.md | 296 +++++++++++++++ sdk_round2_feedback.md | 93 +++++ sdk_weather_file_feedback.md | 352 ++++++++++++++++++ solarfarmer/weather.py | 4 +- 12 files changed, 762 insertions(+), 21 deletions(-) create mode 100644 sdk_general_feedback.md create mode 100644 sdk_round2_feedback.md create mode 100644 sdk_weather_file_feedback.md diff --git a/docs/api.md b/docs/api.md index ac0f907..d2513d4 100644 --- a/docs/api.md +++ b/docs/api.md @@ -101,7 +101,7 @@ These are the primary functions for interacting with the SolarFarmer API. ### `TSV_COLUMNS` -Data dictionary describing the SolarFarmer TSV weather file format β€” required and optional columns, units, valid ranges, aliases, and the missing-value sentinel. See the [`weather` module docstring](../api.md) for full details. +Data dictionary describing the SolarFarmer TSV weather file format: required and optional columns, units, valid ranges, aliases, and the missing-value sentinel. See the [`weather` module docstring](../api.md) for full details. --- diff --git a/docs/faq.md b/docs/faq.md index 547fc07..b336569 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -52,6 +52,6 @@ The core SDK (payload construction, API calls, annual/monthly summary data) work ## My TMY weather file gives a 400 error with no useful message. What's wrong? -TMY (Typical Meteorological Year) datasets from NSRDB, PVGIS, or similar sources contain timestamps from multiple source years. SolarFarmer requires all timestamps in a TSV file to belong to a single calendar year. 
+TMY (Typical Meteorological Year) datasets from NSRDB, PVGIS, or similar sources contain timestamps drawn from different source years, causing them to go backwards in time when sorted as a full year. SolarFarmer requires timestamps in a TSV file to be chronologically ordered (non-decreasing years). Multi-year and sub-year files with continuous, ordered timestamps are fully supported; only shuffled years are rejected. -Use [`sf.from_pvlib()`](api.md#from_pvlib) or [`sf.from_dataframe(year=1990)`](api.md#from_dataframe) to remap timestamps automatically. The SDK also calls [`check_sequential_year_timestamps()`](api.md#check_sequential_year_timestamps) before upload to catch this early. +Use [`sf.from_pvlib()`](api.md#from_pvlib) or [`sf.from_dataframe(year=1990)`](api.md#from_dataframe) to remap all timestamps to a single year before export. The SDK also calls [`check_sequential_year_timestamps()`](api.md#check_sequential_year_timestamps) before upload to catch this early. diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md index 4f33041..58cae68 100644 --- a/docs/getting-started/index.md +++ b/docs/getting-started/index.md @@ -63,7 +63,7 @@ The SolarFarmer SDK supports three distinct user workflows. Choose the one that Specify your plant: location (lat/lon), DC and AC capacities, inverter type, mounting configuration. `PVSystem` handles the payload construction and you run the calculation. -Results from `PVSystem` are approximations based on simplified layout assumptions β€” see [FAQ](../faq.md) for details. +Results from `PVSystem` are approximations based on simplified layout assumptions. See [FAQ](../faq.md) for details. 
--- diff --git a/docs/getting-started/quick-start-examples.md b/docs/getting-started/quick-start-examples.md index 28ef4bb..ffe9e25 100644 --- a/docs/getting-started/quick-start-examples.md +++ b/docs/getting-started/quick-start-examples.md @@ -49,7 +49,7 @@ print(f"Net Energy: {net_energy:.1f} MWh") print(f"Performance Ratio: {performance_ratio:.1%}") # Get loss tree time-series results -loss_tree_timeseries = results.loss_tree_timeseries +loss_tree_timeseries = results.loss_tree_timeseries() # Print summary results.print_annual_results() @@ -466,7 +466,7 @@ for project in projects: with open(output_file, 'w') as f: f.write(payload_json) - print(f"βœ“ Generated payload for {project['name']}") + print(f"Generated payload for {project['name']}") print(f"\nGenerated {len(projects)} payloads for batch submission") ``` diff --git a/docs/getting-started/workflow-1-existing-api-files.md b/docs/getting-started/workflow-1-existing-api-files.md index 2d41a92..a3286f6 100644 --- a/docs/getting-started/workflow-1-existing-api-files.md +++ b/docs/getting-started/workflow-1-existing-api-files.md @@ -23,7 +23,7 @@ This workflow involves three steps: ```mermaid graph TB - A["πŸ“ Existing API Files
(JSON, PAN, OND, weather)"] + A["Existing API Files
(JSON, PAN, OND, weather)"] B["run_energy_calculation()"] C["ModelChainResponse"] D["CalculationResults
(Time-series, losses, metrics)"] @@ -180,7 +180,7 @@ For detailed documentation on loss tree time-series results, refer to the [Loss ```python # Get loss tree timeseries results -loss_tree_timeseries = results.loss_tree_timeseries +loss_tree_timeseries = results.loss_tree_timeseries() ``` @@ -188,7 +188,7 @@ For detailed documentation on PVsyst-format time-series results, refer to the [P ```python # Get pvsyst-format time-series results -timeseries_data = results.pvsyst_timeseries +timeseries_data = results.pvsyst_timeseries() ``` diff --git a/docs/getting-started/workflow-2-pvplant-builder.md b/docs/getting-started/workflow-2-pvplant-builder.md index 8c5b4cb..beca60e 100644 --- a/docs/getting-started/workflow-2-pvplant-builder.md +++ b/docs/getting-started/workflow-2-pvplant-builder.md @@ -14,7 +14,7 @@ description: Create plant configurations and automatically build API payloads ## Overview !!! info "Approximated Design" - `PVSystem` constructs a plant layout from high-level inputs (location, capacity, equipment files) using simplified assumptions β€” including uniform mid-row shading for all strings and inferred string sizing. This is well-suited for early-stage yield screening and scenario comparison. For full design fidelity, use [Workflow 1](workflow-1-existing-api-files.md) with a SolarFarmer Desktop–exported payload or [Workflow 3](workflow-3-plantbuilder-advanced.md) for direct data model mapping. + `PVSystem` constructs a plant layout from high-level inputs (location, capacity, equipment files) using simplified assumptions, including uniform mid-row shading for all strings and inferred string sizing. This is well-suited for early-stage yield screening and scenario comparison. For full design fidelity, use [Workflow 1](workflow-1-existing-api-files.md) with a SolarFarmer Desktop-exported payload or [Workflow 3](workflow-3-plantbuilder-advanced.md) for direct data model mapping. 
This workflow involves four steps: @@ -27,7 +27,7 @@ This workflow involves four steps: ```mermaid graph TB - A["πŸ“‹ Define PVSystem
(location, capacity, equipment)"] + A["Define PVSystem
(location, capacity, equipment)"] B["PVSystem Payload
(automatic payload construction)"] C["run_energy_calculation()
(submit to API)"] D["CalculationResults
(analyze performance)"] @@ -126,7 +126,7 @@ plant.bifacial_mismatch_loss = 0.01 ## Step 3: Add Module and Inverter Files The SDK uses PAN (module) and OND (inverter) files for detailed specifications. -Dict keys are user-facing labels only β€” the spec ID sent to the API is derived from the filename (everything before the last `.`). +Dict keys are user-facing labels only; the spec ID sent to the API is derived from the filename (everything before the last `.`). ```python from pathlib import Path @@ -165,7 +165,7 @@ plant.horizon_file = Path(r"path/to/horizon_data.hor") # Or specify horizon angles directly plant.horizon( - elevation_angles=[0, 5, 10, 15, 20, 25, 30], + elevation_angles=[0, 5, 10, 15, 10, 5, 0, 0], azimuth_angles=[0, 45, 90, 135, 180, 225, 270, 315] ) ``` @@ -256,8 +256,8 @@ annual_data = plant.results.AnnualData[0] net_energy = annual_data['energyYieldResults']['netEnergy'] performance_ratio = annual_data['energyYieldResults']['performanceRatio'] -print(f"Net Annual Energy: {net_energy} MWh") -print(f"Performance Ratio: {performance_ratio}%") +print(f"Net Annual Energy: {net_energy:.1f} MWh") +print(f"Performance Ratio: {performance_ratio:.1%}") ``` --- diff --git a/docs/getting-started/workflow-3-plantbuilder-advanced.md b/docs/getting-started/workflow-3-plantbuilder-advanced.md index 5033169..4d84268 100644 --- a/docs/getting-started/workflow-3-plantbuilder-advanced.md +++ b/docs/getting-started/workflow-3-plantbuilder-advanced.md @@ -24,7 +24,7 @@ This workflow gives you complete control over API object construction: ```mermaid graph TB - A["πŸ—„οΈ Your Data Model
(database, file, API, etc.)"] + A["Your Data Model
(database, file, API, etc.)"] B["Mapper/Builder"] C["EnergyCalculationInputs + PVPlant
+ Component Classes
(Inverter, Layout, Location, etc.)"] D["SolarFarmer API"] @@ -390,7 +390,7 @@ design = workflow.design_and_optimize(base_config={...}) ## Debugging and Validation -All SDK component classes are Pydantic models, so invalid field types, out-of-range values, and missing required fields raise a `ValidationError` at construction time β€” before any serialization or API call occurs. +All SDK component classes are Pydantic models, so invalid field types, out-of-range values, and missing required fields raise a `ValidationError` at construction time, before any serialization or API call occurs. ```python from pydantic import ValidationError @@ -408,7 +408,7 @@ except ValidationError as e: This means that if construction of your `EnergyCalculationInputs` object completes without raising, the required structure and field constraints are already satisfied. !!! warning - Unknown keyword arguments are silently ignored β€” Pydantic does not enforce `extra='forbid'` on these models. A misspelled field name will not raise locally; the value will simply be absent from the serialized payload. The server-side validation service is the safeguard for those cases. + Unknown keyword arguments are silently ignored; Pydantic does not enforce `extra='forbid'` on these models. A misspelled field name will not raise locally; the value will simply be absent from the serialized payload. The server-side validation service is the safeguard for those cases. !!! note All energy calculation API calls are validated upon receipt by the [SolarFarmer API Validation Service](https://mysoftware.dnv.com/download/public/renewables/solarfarmer/manuals/latest/WebApi/Troubleshooting/ValidationService.html){ target="_blank" .external }. This provides an additional layer of error detection and reporting. 
diff --git a/docs/index.md b/docs/index.md index 52011ce..d61eca6 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,7 +4,7 @@ description: SolarFarmer Python SDK --- # Welcome to SolarFarmer API -A Python SDK that wraps the [DNV SolarFarmer API](https://mysoftware.dnv.com/download/public/renewables/solarfarmer/manuals/latest/WebApi/Introduction/introduction.html) to help you run cloud-based energy calculations and manage API payloads in a user-friendly way. +A Python SDK that wraps the [DNV SolarFarmer API](https://mysoftware.dnv.com/download/public/renewables/solarfarmer/manuals/latest/WebApi/Introduction/introduction.html) to run cloud-based energy calculations and manage API payloads. [New to SolarFarmer? Discover it here!](https://www.dnv.com/software/services/solarfarmer/){ .md-button .md-button-primary target="_blank" .external } diff --git a/sdk_general_feedback.md b/sdk_general_feedback.md new file mode 100644 index 0000000..964015d --- /dev/null +++ b/sdk_general_feedback.md @@ -0,0 +1,296 @@ +# SolarFarmer Python SDK β€” General Usability Feedback + +## Context + +This document captures non-weather-file observations from building a batch +energy-yield runner on top of `dnv-solarfarmer>=0.2.0rc1`. Weather file +feedback is covered separately in `sdk_weather_file_feedback.md`. + +The implementation was driven by an AI coding agent (Claude) working +interactively with a human engineer. Both perspectives inform the +observations below. + +--- + +## 1 Error reporting from the API + +### 1.1 `run_energy_calculation` returns `None` on failure + +When the API returns an HTTP 400 or other error, `run_energy_calculation()` +returns `None` with no exception raised and no error message surfaced to the +caller. The HTTP response body often contains useful validation messages, +but these are swallowed. + +**How debugging actually worked in practice:** + +1. `run_energy_calculation()` returned `None`. 
No exception, no stderr, no + log output β€” nothing to indicate what went wrong. +2. To make progress, we had to enable the SDK's logger at `ERROR` level. + This revealed HTTP 400 status codes with some error detail β€” but only + the subset that `_log_api_failure` manages to extract (see Β§1.3 below). +3. For errors where the logged detail was just `"Something went wrong."`, + we had to monkey-patch the HTTP client to intercept raw responses. +4. The error bodies contained clear, actionable validation messages β€” e.g. + *"The rack height specified in Mounting Type Specifications 'sat_mount' + is smaller than the module specifications"* and *"The mounting + specification is missing one or several of the required bifacial + properties."* These were immediately fixable once visible. + +The API's validation layer is doing good work β€” the messages it produces are +specific and actionable. The problem is entirely on the SDK side: those +messages are discarded and the caller gets `None`. + +**Suggestion:** On non-2xx responses, raise a dedicated exception (e.g. +`SolarFarmerAPIError`) that includes the status code, error message, and +the full `ProblemDetails` body. The `Response` object already contains all +of this information (`code`, `exception`, `problem_details_json`) β€” it just +needs to be surfaced to the caller rather than logged and discarded. + +This is the highest-impact change in this document β€” every issue described +in Β§Β§2–4 and Β§Β§7–8 below was ultimately diagnosed through the API's own +error messages, but only after manually bypassing the SDK to read them. + +### 1.2 `PVSystem.run_energy_calculation` return type mismatch + +`PVSystem.run_energy_calculation()` has a return annotation of `-> None` +and always executes `return None`, storing the result in `self.results` +instead. 
However, its docstring says: + +> Returns: CalculationResults + +This should either return the result (matching the docstring) or the +docstring should be corrected and the `self.results` pattern documented. +Returning `None` while the docstring promises `CalculationResults` will +mislead both human developers and AI agents. + +### 1.3 `_log_api_failure` silently swallows parsing errors + +`_log_api_failure` (in `endpoint_modelchains.py`) wraps its detail- +extraction logic in a bare `except Exception: pass`. This means that if +`problem_details_json` isn't in the expected shape, the error detail is +silently lost β€” the developer sees only the generic `Failure message` line, +not the title, errors, or detail fields. + +This is compounded by a type mismatch: `Response.problem_details_json` is +annotated as `str | None`, but `Client._make_request` assigns it the return +value of `response.json()` (a `dict`) on success, or `""` (empty string) on +failure. When it's an empty string, `json_response.get("errors")` raises +`AttributeError`, which is caught and silently discarded. + +**Suggestion:** Fix the type annotation to `dict | None`, handle the empty- +string fallback as `None`, and replace the bare `except` with specific +exception handling (or at least log the parsing error at `DEBUG` level). + +--- + +## 2 Spec ID ↔ filename coupling + +The `moduleSpecificationID` and `inverterSpecID` fields in `Layout` and +`Inverter` must exactly match the **filename stems** (without extension) of +the uploaded PAN/OND files. For example, uploading +`Trina_TSM-DEG19C.20-550_APP.PAN` means the spec ID must be +`Trina_TSM-DEG19C.20-550_APP`. + +This coupling is not documented in the field docstrings or in +`run_energy_calculation`. A developer's natural instinct is to use a +descriptive ID (e.g. `trina_550`), which produces an opaque validation error. + +**Suggestion:** Document the filename-stem convention in the +`moduleSpecificationID` and `inverterSpecID` field docstrings. 
Alternatively, +the SDK could infer spec IDs from filenames when the user doesn't explicitly +set them, or validate the match client-side before uploading. + +--- + +## 3 Bifacial module auto-detection + +When a PAN file contains a non-zero `BifacialityFactor`, the API requires +three additional properties on the `MountingTypeSpecification`: + +- `transmissionFactor` +- `bifacialShadeLossFactor` +- `bifacialMismatchLossFactor` + +If these are omitted, the API returns a validation error. However: + +1. The SDK does not document this dependency between PAN file contents and + mounting spec fields. +2. The `MountingTypeSpecification` model does not indicate which fields + become required for bifacial modules. +3. There is no client-side validation or warning. + +**Suggestion:** Add a note in the `MountingTypeSpecification` docstring +explaining that these three fields are required when the module is bifacial. +The PAN file's `BifacialityFactor` value could also be mentioned as the +trigger. + +--- + +## 4 Rack height validation + +The `rackHeight` field on `MountingTypeSpecification` must be strictly +greater than the physical height of the module (as specified in the PAN file). +For a portrait-oriented single module high, this means `rackHeight` > +module long dimension (e.g. 2.384 m for the Trina TSM-DEG19C.20-550). + +Setting `rackHeight` equal to the module height triggers: +*"The rack height specified in Mounting Type Specifications 'sat_mount' is +smaller than the module specifications."* + +This is reasonable validation, but the relationship between `rackHeight`, +`numberOfModulesHigh`, `modulesAreLandscape`, and the PAN file's physical +dimensions is not documented. 
A formula or note in the docstring would help:

```
Minimum rackHeight = numberOfModulesHigh × module_dimension
                     + (numberOfModulesHigh - 1) × ySpacingBetweenModules
where module_dimension = module height if portrait, width if landscape
```

---

## 5 `MonthlyAlbedo` constructor

The `MonthlyAlbedo` model accepts a `values` parameter (a list of 12 floats), not a `monthlyAlbedo` parameter. This is discoverable via `help()` but the parameter name is unintuitive; a developer might expect `monthlyAlbedo` or `albedo_values` given the class name.

Minor point, but mentioning it because an AI agent tried `monthlyAlbedo=` first (matching the class name pattern) and had to fall back to inspecting the signature.

---

## 6 `EnergyCalculationOptions.calculationYear`

This field defaults to `1990`. For TMY data (which contains mixed years by definition), the relationship between `calculationYear` and the timestamps in the weather file is unclear. Does the API remap all timestamps to this year? Does it filter? Does it ignore the year in TMY timestamps?

**Suggestion:** A one-line docstring note explaining how `calculationYear` interacts with TMY data would prevent confusion.

---

## 7 `gridConnectionLimit` units: docstring says MW, API expects Watts

The `PVPlant.grid_connection_limit` field docstring reads:

> Maximum power that can be exported from the transformers in MW

However, the API expects this value in **Watts**. Setting `gridConnectionLimit=17.6` (intending 17.6 MW) causes the API to return an opaque HTTP 400 with `"detail": "Something went wrong."`, with no indication that the value is being interpreted as 17.6 W.

The `PVSystem` builder correctly converts with `grid_limit_MW * 1e6` (line 1111 of `pvsystem.py`), confirming the API expects Watts.

**Suggestion:** Fix the docstring to say "in Watts" (or "in W"), not "in MW".
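Until the docstring is fixed, an explicit conversion at the call site keeps the intent visible. A sketch; `mw_to_w` is our helper name, mirroring the `grid_limit_MW * 1e6` conversion the `PVSystem` builder performs:

```python
def mw_to_w(limit_mw: float) -> float:
    """Convert a grid connection limit from megawatts to the watts the API expects."""
    return limit_mw * 1e6

# Intending a 17.6 MW export limit: send 17,600,000 W, not 17.6.
grid_connection_limit = mw_to_w(17.6)
```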
+ +--- + +## 8 `TransformerSpecification` is required but marked optional + +Both `Transformer.transformer_spec_id` (type `str | None`) and +`PVPlant.transformer_specifications` (type `dict | None`) are marked as +optional in the Pydantic models. Their docstrings give no indication +that the API requires them: + +- `Transformer.transformer_spec_id`: *"Reference to a transformer + specification"* +- `PVPlant.transformer_specifications`: *"Transformer specs keyed by ID"* + +Omitting both fields causes the API to return HTTP 400 with +`"detail": "Something went wrong."` β€” the same opaque error as the +grid-limit issue. There is no validation error message indicating +what's missing. + +The `PVSystem` builder always creates a `NoLoadAndOhmic` transformer +specification, confirming it's required in practice. + +**Suggestion:** Either: +1. Change the field types from `Optional` to required, or add a + client-side validator on `PVPlant` that raises a clear error when + a transformer references a spec ID not present in + `transformer_specifications`. +2. At minimum, update the docstrings to say these fields are + required by the API despite being structurally optional in the model. + +--- + +## 9 `PVSystem` builder uses PAN-internal module name, not filename stem + +The `PVSystem` builder reads the module name from inside the PAN file +content (e.g. `Trina_TSM-DEG19C` from the PVsyst model identifier) and +uses it as the `moduleSpecificationID`. However, the API requires this +ID to match the **filename stem** of the uploaded PAN file (e.g. +`Trina_TSM-DEG19C.20-550_APP`). + +This means `PVSystem.run_energy_calculation()` fails with: + +> *The given key 'Trina_TSM-DEG19C' was not present in the dictionary.* + +The workaround is to rename the PAN file so its stem matches the internal +model name, but this is fragile and non-obvious. 
**Suggestion:** The builder should use the filename stem (from the `pan_files` dict key or the file path) as the spec ID, not the PAN file's internal model identifier.

---

## 10 `calculationYear` silently discards TMY data from non-matching years

`EnergyCalculationOptions.calculationYear` defaults to `1990`. When the weather file contains TMY data with mixed years (as produced by NSRDB, PVGIS, and other TMY generators), the API processes **only** rows whose timestamp year matches `calculationYear`. For NSRDB PSM4 TMY data, which uses different years per month (e.g. Jan=2000, Feb=2003, …), this results in partial-year calculations with dramatically wrong results, and no warning or error.

In our case, the API silently processed 744 hours (January 2000 only) instead of 8760, producing −9.18 kWh/kWp and a Performance Ratio of −0.12. There was no error message indicating that 92% of the data was discarded.

**Workaround:** Remap all TMY timestamps to year 1990 before writing the weather file.

**Suggestion:**

1. Document the interaction between `calculationYear` and TMY timestamps prominently: it is not a minor detail; it determines whether the calculation produces valid results.
2. Consider adding a warning when the number of processed timesteps is significantly less than expected (e.g. <8000 for hourly data).
3. Consider a `TMY` mode or `auto` setting for `calculationYear` that either ignores the year component or remaps automatically.
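The workaround above (remap every timestamp to one year before writing the file) is a few lines with pandas. A sketch, not SDK code; it assumes a tz-aware `DatetimeIndex` and that any Feb 29 rows are dropped first when the target year is not a leap year:

```python
import pandas as pd

def remap_to_year(df: pd.DataFrame, year: int = 1990) -> pd.DataFrame:
    """Return a copy of `df` with every index timestamp moved to `year`,
    keeping month/day/time and the UTC offset intact."""
    out = df.copy()
    out.index = pd.DatetimeIndex(
        [ts.replace(year=year) for ts in df.index], name=df.index.name
    )
    return out
```

Applied to NSRDB PSM4 TMY data, this collapses the mixed source years onto the default `calculationYear` of 1990 so that no rows are discarded.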
---

## 11 Summary of priorities

| # | Issue | Effort | Impact |
|---|-------|--------|--------|
| 1.1 | Raise exception on API failure instead of returning `None` | Small–Medium | **Critical**: affects every misconfiguration scenario |
| 1.3 | Fix `_log_api_failure` type mismatch and bare `except` | Trivial | **Critical**: causes silent loss of error details |
| 1.2 | Fix `PVSystem.run_energy_calculation` return type | Trivial | Medium: docstring contradicts code |
| 7 | Fix `gridConnectionLimit` docstring (MW → W) | Trivial | **Critical**: causes opaque failure with no diagnostic |
| 8 | Document or require `TransformerSpecification` | Trivial–Small | **High**: another opaque 400 error |
| 10 | Document/fix `calculationYear` + TMY year handling | Small | **High**: silently produces wrong results |
| 9 | Fix `PVSystem` module spec ID (PAN name vs filename) | Small | **High**: `PVSystem` builder fails on common PAN filenames |
| 2 | Document spec ID ↔ filename stem convention | Trivial | High: prevents a common first-use error |
| 3 | Document bifacial mounting property requirements | Trivial | High: prevents a non-obvious validation failure |
| 4 | Document rack height formula / constraints | Trivial | Medium: the error message is helpful but prevention is better |
| 5 | `MonthlyAlbedo` parameter naming | Trivial | Low: minor discoverability improvement |
| 6 | Document `calculationYear` + TMY interaction | Trivial | Medium: subsumed by §10 above |

Items 7, 8, and 10 were the **actual blockers** that prevented a working API call. Items 7 and 8 both produce the same opaque `"Something went wrong."` error, making them indistinguishable without intercepting raw HTTP responses. Item 10 produces silently wrong results with no error at all, arguably worse than a failure.
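As a coda to items 2 and 9: the filename-stem rule is exactly `pathlib`'s stem. An illustrative standalone snippet (not SDK code), using the PAN filename from §9:

```python
from pathlib import Path

pan = Path("Trina_TSM-DEG19C.20-550_APP.PAN")

# The API matches spec IDs against the stem: everything before the LAST dot.
spec_id = pan.stem                  # 'Trina_TSM-DEG19C.20-550_APP'

# Splitting on the FIRST dot truncates filenames that contain periods,
# which is how the builder ends up with the wrong ID.
truncated = pan.name.split(".")[0]  # 'Trina_TSM-DEG19C'
```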
diff --git a/sdk_round2_feedback.md b/sdk_round2_feedback.md new file mode 100644 index 0000000..53458cc --- /dev/null +++ b/sdk_round2_feedback.md @@ -0,0 +1,93 @@ +# SolarFarmer Python SDK β€” Feedback from Agent-Driven Integration Test + +**Date:** 2025-04-07 +**SDK version:** git commit `4f13c6e` (github.com/dnv-opensource/solarfarmer-python-sdk) +**Test:** ~20 MW bifacial plants (tracker + fixed tilt) at 7 NOAA SURFRAD locations using PSM4 TMY data via pvlib + +## Context + +An AI coding agent was given a one-paragraph prompt to build and run SolarFarmer energy calculations for all SURFRAD sites. This document summarizes the friction points encountered, ranked by debugging cost, with suggested improvements. + +--- + +## Immediate (docs + bug fixes) + +### 1. PAN/OND Filename Parsing Inconsistency β€” Bug Fix + +**Problem:** `pvsystem.py:1737` (PAN) and `pvsystem.py:1710` (OND) use `filename.split(".")[0]` to derive spec IDs, producing `Trina_TSM-DEG19C` from `Trina_TSM-DEG19C.20-550_APP.PAN`. But the client-side validation in `endpoint_modelchains.py:58` uses `Path(f.name).stem`, producing `Trina_TSM-DEG19C.20-550_APP`. These don't match, causing a `ValueError` before the API call is even made. Both PAN and OND paths are affected. + +**Cost:** ~25% of debugging time. Required source-diving into both files to understand the mismatch. Workaround was copying the PAN file with periods replaced by underscores. + +**Fix:** Use `Path.stem` (or `rsplit(".", 1)[0]`) consistently in both locations. + +### 2. TMY Year Requirement Not Discoverable from `PVSystem` Workflow β€” Docs + +**Problem:** TMY data from NSRDB contains timestamps from multiple source years (e.g. 2003, 2005, 2010, 2016). Submitting this as a TSV file to the API returns HTTP 400 with `{"detail": "Something went wrong."}` β€” no indication that timestamps are the issue. 
+ +**Existing documentation:** The `calculation_year` docstring in `EnergyCalculationOptions` correctly warns about TMY mixed-year data and the need to remap timestamps. However, `PVSystem` users never interact with `EnergyCalculationOptions` directly, and the `calculation_year` docs state that for TSV format the value is "Ignored β€” timestamps in the file are used as-is", which could be read as implying raw TMY timestamps are acceptable. The agent found `weather.py` and `PVSystem` docs but had no reason to look at `EnergyCalculationOptions`. + +**Cost:** ~40% of debugging time. Required ~15 rounds of payload inspection, monkey-patching the response handler, and trial-and-error before discovering the root cause. + +**Fix:** + +- Add a note to the `PVSystem.weather_file` property docstring warning that TMY data with mixed source years must be remapped to a single calendar year. Reference `EnergyCalculationOptions.calculation_year` for details. +- Add a brief note to the `weather.py` module docstring (not in `TSV_COLUMNS`, which is a format spec β€” the year issue is TMY-specific, not a general format constraint). + +### 3. Weather Format Conversion β€” Docs + +**Problem:** Converting pvlib DataFrames to SolarFarmer TSV format requires knowing: column name mapping, datetime format (`YYYY-MM-DDThh:mm+OO:OO`), tab delimiter, year normalization, and pressure units. This information is spread across `weather.py` and internal source code. + +**Cost:** ~20% of debugging time. + +**Fix:** + +- Add a pvlib-to-SolarFarmer column mapping reference to the `weather.py` module docstring (e.g. `temp_air` β†’ `TAmb`, `wind_speed` β†’ `WS`, `pressure` β†’ `Pressure`). +- Add a minimal conversion code example in `weather.py` or the `PVSystem` class docstring. + +### 4. `MountingType` Import Path β€” Docs + +**Problem:** `sf.MountingType` raises `AttributeError`. Must import from `solarfarmer.models.pvsystem.pvsystem`. + +**Cost:** ~5%. 
+ +**Fix:** Document the import path in the `PVSystem` class docstring, since `mounting` is a required parameter. + +### 5. PAN/OND Dict Key Semantics β€” Docs + +**Problem:** `PVSystem.pan_files` accepts `dict[str, Path]`, but the dict keys are ignored β€” spec IDs are derived from filenames via `filename.split(".")[0]`. The dict interface implies the keys are meaningful. + +**Cost:** ~5%. + +**Fix:** Document in the `pan_files` and `ond_files` property docstrings that keys are not used as spec IDs. + +### 6. `CalculationResults` Usage β€” Docs + +**Problem:** No `results.net_energy_MWh` property. Must use `results.get_performance()['net_energy']`. + +**Cost:** ~5%. + +**Fix:** Add a usage example in the `CalculationResults` class docstring showing `get_performance()` and available keys. + +--- + +## Future Improvements + +- **Server-side validation:** Return field-level error details instead of generic `{"detail": "Something went wrong."}` for 400 responses. This is the single highest-impact improvement but is an API-side change. +- **Client-side TMY validation:** Detect non-contiguous years in TSV weather files before upload and raise a clear `ValueError` with remediation guidance. +- **Weather conversion utility:** `sf.weather.from_pvlib(df)` or `sf.weather.from_dataframe(df)` to handle column renaming, timestamp formatting, and year normalization for TMY data. +- **Top-level enum exports:** Export `MountingType`, `InverterType`, `OrientationType` from `solarfarmer.__init__`. +- **`pan_files` / `ond_files` as list:** Accept `list[Path]` in addition to `dict[str, Path]`, since the keys are unused. +- **Convenience properties on `CalculationResults`:** `net_energy_MWh`, `performance_ratio`, `energy_yield_kWh_per_kWp`. 
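The client-side TMY validation suggested above can be small. A sketch; the function name and regex are ours, not an existing SDK API:

```python
import re

# Matches the year at the start of a SolarFarmer TSV data line,
# e.g. "2000-01-01T00:00-06:00".
TIMESTAMP = re.compile(r"^(\d{4})-\d{2}-\d{2}T\d{2}:\d{2}", re.MULTILINE)

def check_tsv_years(tsv_text: str) -> None:
    """Raise if timestamp years in a TSV ever decrease, the signature
    of TMY mixed-year data."""
    years = [int(m.group(1)) for m in TIMESTAMP.finditer(tsv_text)]
    for prev, cur in zip(years, years[1:]):
        if cur < prev:
            raise ValueError(
                f"Timestamp years go backwards ({prev} -> {cur}); this looks "
                "like TMY mixed-year data. Remap all timestamps to one "
                "calendar year before upload."
            )
```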
+ +--- + +## Summary of Estimated Debugging Cost + +| Issue | Relative Cost | Category | +|---|---|---| +| TMY year requirement not discoverable | ~40% | Docs | +| PAN filename `split(".")` vs `Path.stem` | ~25% | Bug fix | +| Weather format conversion | ~20% | Docs | +| Missing enum import path | ~5% | Docs | +| Dict key confusion | ~5% | Docs | +| No convenience accessors | ~5% | Docs | diff --git a/sdk_weather_file_feedback.md b/sdk_weather_file_feedback.md new file mode 100644 index 0000000..3e17304 --- /dev/null +++ b/sdk_weather_file_feedback.md @@ -0,0 +1,352 @@ +# SolarFarmer Python SDK β€” Weather File Discoverability Feedback + +## Context + +This document captures observations from building a batch energy-yield runner +on top of `dnv-solarfarmer>=0.2.0rc1`. The task involved constructing +`EnergyCalculationInputs` payloads for 14 project variants and sourcing +NSRDB PSM4 TMY weather data formatted for the SolarFarmer API. + +An AI coding agent (Claude) drove the implementation. The agent's workflow β€” +calling `help()`, reading docstrings, inspecting signatures β€” mirrors how +human developers explore an unfamiliar SDK, and exposes where the discoverable +documentation runs short. The recommendations below target both audiences; +agents (especially lower-capability ones) are more sensitive to these gaps +because they cannot fall back on domain intuition the way a human might. 
+
+---
+
+## 1 What the SDK tells you about weather files today
+
+The table below is the **complete set** of weather-file guidance reachable
+from `help()` / docstrings without leaving the REPL:
+
+| Symbol | Documentation |
+|--------|---------------|
+| `MeteoFileFormat` enum | `"Meteorological file format."`; four values (`dat`, `tsv`, `PvSystStandardFormat`, `ProtobufGz`), no per-value descriptions |
+| `EnergyCalculationInputsWithFiles.meteo_file_path_or_contents` | `"Meteorological file path or inline contents"` |
+| `EnergyCalculationInputsWithFiles.meteo_file_format` | `"Format of the meteorological file"` |
+| `run_energy_calculation(meteorological_data_file_path=...)` | `"Accepts Meteonorm .dat, TSV, a SolarFarmer desktop export (e.g., MeteorologicalConditionsDatasetDto_Protobuf.gz), or PVsyst standard format .csv files"` |
+| `parse_files_from_paths()` | `"Accepted extensions are .tsv, .dat, .csv (PVsyst format), and .gz (protobuf transfer file)"` |
+| `MissingMetDataMethod` enum | `FAIL_ON_VALIDATION`, `REMOVE_TIMESTAMP`; no documentation of what constitutes "missing" data |
+
+None of these locations specify column names, column ordering, expected units,
+timestamp format, timezone convention, or missing-data sentinel values.
+
+---
+
+## 2 What a developer actually needs to produce a valid weather file
+
+To write the TSV conversion function (`fetch_tmy._to_sf_tsv()`), the
+following had to be sourced from the **web-hosted user guide**, not from the
+SDK:
+
+| Requirement | What we hard-coded | SDK source |
+|-------------|-------------------|------------|
+| Column names | `GHI`, `DHI`, `TAmb`, `WS`, `Pressure`, `PW`, `RH` | None |
+| Column header aliases | API also accepts e.g. 
`Glob`/`SolarTotal` for GHI, `diff`/`diffuse`/`DIF` for DHI, `Temp`/`T` for temperature, `Wind`/`WindSpeed` for wind, `Water` for PW, `AP` for pressure, `Humidity` for RH; discovered only by reading API source code (`ReadMetDataFromTSV.cs`) | None |
+| Column order matters | Yes; required columns first, optional after | None |
+| Timestamp format | `YYYY-MM-DDThh:mm+OO:OO`, **not** `YYYY-MM-DD HH:MM:SS` (see §2.1 below) | None |
+| Timezone convention | UTC offset embedded in each timestamp; local standard time with offset | None |
+| Units: irradiance | W/m² | None |
+| Units: temperature | °C | None |
+| Units: wind speed | m/s | None |
+| Units: pressure | mbar (not Pa, not hPa) | None |
+| Units: precipitable water | cm | None |
+| Units: relative humidity | % | None |
+| Missing-data sentinel | `9999` or `-9999` | None |
+| Value ranges | GHI 0–1300, DHI 0–1000, TAmb −35 to 60, WS 0–50, Pressure 750–1100, RH 0–100 | None |
+
+And TSV is only one of the four format families the API accepts. The same
+information gap exists for `.dat` (Meteonorm), `.csv` (PVsyst standard
+format, NREL SAM, NSRDB TMY3, SolarAnywhere, SolarGIS, Solcast, Vaisala),
+and `.gz` (protobuf). The web user guide lists **11 top-level format
+variants**; the SDK enum collapses them into 4 values mapped to multipart
+upload fields.
+
+### 2.1 Timestamp format details
+
+The exact timestamp format required by the TSV parser is:
+
+    YYYY-MM-DDThh:mm+OO:OO
+
+Key requirements:
+
+- `T` separator between date and time (not a space)
+- No seconds component
+- Mandatory UTC offset (e.g. `-06:00`, `+00:00`); `Z` notation is not
+  accepted, and omitting the offset is not accepted
+- The API parses timestamps via `DateTimeOffset.ParseExact` with format
+  `yyyy-MM-dd'T'HH:mmzzz`
+
+The line-detection regex in the TSV parser (`ReadMetDataFromTSV.cs`) requires
+the `T` separator; timestamps without it are not recognized as data lines,
+causing the file to appear empty.
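Producing conformant stamps from a tz-aware pandas index takes one extra step, because `strftime`'s `%z` emits the offset without a colon. A minimal sketch (assuming pandas; `Etc/GMT+6` is the POSIX-style name for UTC-06:00):

```python
import pandas as pd

idx = pd.date_range("2001-01-01 00:00", periods=2, freq="h", tz="Etc/GMT+6")
stamps = [t.strftime("%Y-%m-%dT%H:%M%z") for t in idx]  # offset as '-0600'
stamps = [s[:-2] + ":" + s[-2:] for s in stamps]        # insert the colon
print(stamps)  # ['2001-01-01T00:00-06:00', '2001-01-01T01:00-06:00']
```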
+ +**Example valid timestamps:** +``` +2001-01-01T00:00-06:00 +2001-06-15T12:30+00:00 +2019-12-31T23:00-07:00 +``` + +### 2.2 Accepted column header aliases + +The TSV parser accepts multiple names for each variable. These aliases are +defined in `ReadMetDataFromTSV.cs` but not documented in the SDK: + +| Variable | Accepted headers | +|----------|-----------------| +| GHI | `GHI`, `ghi`, `Glob`, `SolarTotal` | +| DHI | `DHI`, `diff`, `diffuse`, `DIF` | +| Temperature | `Temp`, `T`, `TAmb` | +| Wind speed | `WS`, `Wind`, `Speed`, `WindSpeed` | +| Precipitable water | `Water`, `PW` | +| Air pressure | `Pressure`, `AP`, `pressure` | +| Relative humidity | `Humidity`, `RH`, `humidity` | +| Albedo | `Surface Albedo`, `Albedo`, `albedo` | +| Soiling | `SL`, `Soiling`, `SoilingLoss`, `Slg` | +| Date/time | `Date`, `DateTime`, `Time` | + +Exposing these alias lists in the SDK (e.g. as a dict constant) would let +both humans and agents validate their header names before uploading. + +--- + +## 3 Files that bypass the Pydantic layer + +The weather file is not the only input that passes through the SDK as an +opaque path or byte stream. The full inventory: + +| File type | SDK treatment | Pydantic-modelled? 
| Notes |
+|-----------|---------------|-------------------|-------|
+| **Weather** (`.dat`, `.tsv`, `.csv`, `.gz`) | `str` path → opened as `rb` → uploaded as multipart `tmyFile` / `pvSystStandardFormatFile` / `metDataTransferFile` | **No** | No column/unit/format schema in SDK |
+| **PAN** (module spec) | `str` path → opened as `rb` → uploaded as multipart `panFiles` | **No**, but `PanFileSupplements` models overrides (quality factor, LID, IAM, bifaciality) | PAN format is a PVsyst standard; modelling it is out of scope |
+| **OND** (inverter spec) | `str` path → opened as `rb` → uploaded as multipart `ondFiles` | **No**, but `OndFileSupplements` models overrides (over-power mode, voltage derate) | OND format is a PVsyst standard; same reasoning |
+| **HOR** (horizon profile) | `str` path → opened as `rb` → uploaded as multipart `horFile` | **Partially**: `EnergyCalculationInputs` has `horizon_azimuths` / `horizon_angles` list fields; `HorizonType` enum describes azimuth conventions | Inline numeric arrays are modelled; the file format is not |
+
+The PAN and OND file formats are owned by PVsyst and widely standardised
+across solar modelling tools. Documenting their internal structure in this
+SDK would be out of scope: the SDK's role is to pass them through to the API
+unchanged, and the `*Supplements` models appropriately handle the subset of
+parameters the user may want to override.
+
+The horizon file is partially covered: if you have the data as arrays, you
+can bypass the file entirely via the `horizon_azimuths` / `horizon_angles`
+fields. The `HorizonType` enum documents the azimuth conventions clearly.
+
+**The weather file is the outlier**, not because it's the only file that
+bypasses Pydantic, but because:
+
+1. It's the only one whose format is **defined by SolarFarmer itself** (the
+   TSV variant in particular).
+2. 
It's the only one where the **user commonly needs to generate or convert**
+   the file rather than use an existing one from a data provider.
+3. Its format details (columns, units, timestamp handling) are already
+   documented in the web user guide; they just aren't surfaced in the SDK.
+
+---
+
+## 4 Recommendations
+
+### 4.1 Enrich the `MeteoFileFormat` enum (minimal cost, immediate impact)
+
+Add per-value docstrings and a link to the full specification:
+
+```python
+class MeteoFileFormat(str, Enum):
+    """Meteorological file format.
+
+    The API accepts weather data in several formats. The SDK maps file
+    extensions to the correct multipart upload field automatically:
+
+    - ``.dat`` and ``.tsv`` → ``tmyFile``
+    - ``.csv`` (PVsyst standard) → ``pvSystStandardFormatFile``
+    - ``.gz`` (protobuf) → ``metDataTransferFile``
+
+    For TSV format details (column names, units, timestamp convention),
+    see the user guide:
+    https://mysoftware.dnv.com/.../DefineClimate/SolarResources.html
+    """
+
+    DAT = "dat"
+    """Meteonorm PVsyst Hourly TMY format."""
+    TSV = "tsv"
+    """SolarFarmer tab-separated values; see class docstring for column spec."""
+    PVSYST_STANDARD_FORMAT = "PvSystStandardFormat"
+    """PVsyst standard CSV export format."""
+    PROTOBUF_GZ = "ProtobufGz"
+    """SolarFarmer desktop binary transfer format (protobuf, gzip-compressed)."""
+```
+
+This is a documentation-only change; no new code, no API surface change.
+
+### 4.2 Add a TSV format specification as a module-level docstring or constants
+
+The TSV format is SolarFarmer's own. A module (e.g.
+`solarfarmer.models.weather` or constants in `solarfarmer.config`) could
+expose the spec as discoverable Python objects:
+
+```python
+# solarfarmer/models/weather.py
+
+TSV_COLUMNS = {
+    "required": [
+        {"name": "DateTime", "unit": None, "format": "YYYY-MM-DDThh:mm+OO:OO", "note": "ISO 8601 with mandatory UTC offset, no seconds, T separator"},
+        {"name": "GHI", "unit": "W/m²", "range": (0, 1300), "aliases": ["ghi", "Glob", "SolarTotal"]},
+        {"name": "TAmb", "unit": "°C", "range": (-35, 60), "aliases": ["Temp", "T"]},
+    ],
+    "optional": [
+        {"name": "DHI", "unit": "W/m²", "range": (0, 1000), "aliases": ["diff", "diffuse", "DIF"], "note": "omit only if unavailable; engine can decompose from GHI (calculateDHI=True) but measured DHI is strongly preferred for accuracy"},
+        {"name": "WS", "unit": "m/s", "range": (0, 50), "aliases": ["Wind", "Speed", "WindSpeed"]},
+        {"name": "Pressure", "unit": "mbar", "range": (750, 1100), "aliases": ["AP", "pressure"]},
+        {"name": "PW", "unit": "cm", "range": (0, 100), "aliases": ["Water"]},
+        {"name": "RH", "unit": "%", "range": (0, 100), "aliases": ["Humidity", "humidity"]},
+        {"name": "Albedo", "unit": None, "range": (0, 1), "aliases": ["Surface Albedo", "albedo"]},
+        {"name": "Soiling", "unit": None, "range": (0, 1), "aliases": ["SL", "SoilingLoss", "Slg"]},
+    ],
+    "missing_value_sentinel": 9999,
+    "delimiter": "\t",
+}
+```
+
+This doubles as documentation and as a machine-readable contract that tools
+(and AI agents) can consume programmatically. No Pydantic model overhead, no
+validation logic: just a discoverable data structure.
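As a sketch of how such a constant could be consumed, here is a pre-upload header check built on an abridged alias map (`ALIASES` and `check_headers` are illustrative names, not proposed SDK API):

```python
# Abridged from the alias tables above; names here are illustrative only.
ALIASES = {
    "DateTime": {"Date", "DateTime", "Time"},
    "GHI": {"GHI", "ghi", "Glob", "SolarTotal"},
    "DHI": {"DHI", "diff", "diffuse", "DIF"},
    "TAmb": {"Temp", "T", "TAmb"},
    "WS": {"WS", "Wind", "Speed", "WindSpeed"},
}

def check_headers(headers):
    """Return the header names that match no known alias."""
    known = set().union(*ALIASES.values())
    return [h for h in headers if h not in known]

# A pvlib-style column name slips through every alias list:
print(check_headers(["DateTime", "GHI", "temp_air"]))  # ['temp_air']
```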
+
+### 4.3 Cross-reference weather-dependent options
+
+Several fields in `EnergyCalculationOptions` have implicit dependencies on
+the weather file contents, but nothing connects them:
+
+| Option | Dependency | Current documentation |
+|--------|-----------|----------------------|
+| `calculate_dhi` | If `True`, the DHI column is not required in the weather file, but providing measured DHI is **strongly preferred** for accuracy; the engine's GHI→DHI decomposition is a fallback, not a replacement for source data | `"Whether to calculate DHI from GHI"`; no mention of weather file or accuracy trade-off |
+| `missing_met_data_handling` | Controls behaviour when weather data contains sentinel values (`9999`) | `"Behaviour when required meteorological variables have missing data"`; no mention of what "missing" means in the file |
+| `use_albedo_from_met_data_when_available` | Reads `Albedo` column from weather file if present | Field exists but no cross-reference to weather column name |
+| `use_soiling_from_met_data_when_available` | Reads `Soiling` column from weather file if present | Same |
+| `calculation_year` | **Only** processes weather file rows whose timestamp year matches this value (default 1990). TMY files with mixed years (NSRDB, PVGIS, etc.) will have most rows silently discarded. | `"Year to use for the calculation"`; no mention of the filtering behaviour or its interaction with TMY data |
+
+Adding a one-line cross-reference in each field's docstring (e.g.
+*"When True, the DHI column may be omitted from the weather file"*) would
+close the gap without adding code.
+
+The `calculation_year` / TMY interaction is particularly dangerous because
+it fails silently: the API returns results (not an error), but those
+results are based on a fraction of the intended data. See
+`sdk_general_feedback.md` §10 for a detailed description of the problem
+and suggestions.
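The scale of the hazard is easy to demonstrate with a toy mixed-year index (illustrative values, not real NSRDB output); assuming pandas:

```python
import pandas as pd

# Three rows drawn from different source years, TMY-style.
idx = pd.to_datetime(
    ["1998-01-16T00:00", "2004-02-16T00:00", "1990-03-16T00:00"]
)
kept = int((idx.year == 1990).sum())  # rows a calculation_year=1990 run keeps
print(f"{kept} of {len(idx)} rows survive the year filter")  # 1 of 3
```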
+
+### 4.4 `write_tsv()` with a companion pvlib column-name mapping (medium effort, high value)
+
+pvlib's `map_variables=True` convention (supported in **30+ reader
+functions**) normalizes every data source to a standard set of column names
+(`ghi`, `dhi`, `temp_air`, `wind_speed`, `pressure`, `precipitable_water`,
+`relative_humidity`, `albedo`, ...). Any DataFrame from pvlib has these names
+regardless of whether the source was NSRDB, Solcast, SolarGIS, ERA5, PVGIS,
+SolarAnywhere, or a local TMY3/EPW file.
+
+The SDK could ship a writer function alongside a provided mapping constant:
+
+```python
+# Provided as a convenience constant (not a default; user data may have
+# arbitrary column names). Covers the pvlib `map_variables=True` convention.
+PVLIB_TO_SF_TSV = {
+    "ghi": "GHI",
+    "dhi": "DHI",
+    "temp_air": "TAmb",
+    "wind_speed": "WS",
+    "pressure": "Pressure",  # pvlib: Pa; SF: mbar; divide by 100 on write
+    "precipitable_water": "PW",
+    "relative_humidity": "RH",
+    "albedo": "Albedo",
+}
+
+def write_tsv(
+    df: pd.DataFrame,
+    path: str | Path,
+    *,
+    column_map: dict[str, str],
+    missing_value: float = 9999,
+) -> None:
+    """Write a SolarFarmer-compatible TSV weather file.
+
+    Parameters
+    ----------
+    df : DataFrame
+        Must have a DatetimeIndex. Column names are mapped using
+        ``column_map``; unrecognised columns are dropped.
+    path : str or Path
+        Output file path (.tsv).
+    column_map : dict
+        Mapping from DataFrame column names to SolarFarmer TSV column
+        names. Keys are the source column names; values must be valid
+        TSV headers (see ``TSV_COLUMNS``).
+    missing_value : float
+        Sentinel value for NaN. Default is 9999.
+
+    See Also
+    --------
+    PVLIB_TO_SF_TSV : Pre-built mapping for DataFrames from any pvlib
+        reader called with ``map_variables=True``.
+
+    Examples
+    --------
+    >>> import pvlib.iotools
+    >>> from solarfarmer.weather import write_tsv, PVLIB_TO_SF_TSV
+    >>> df, meta = pvlib.iotools.get_nsrdb_psm4_tmy(lat, lon, api_key, email)
+    >>> write_tsv(df, "site.tsv", column_map=PVLIB_TO_SF_TSV)
+    """
+```
+
+Making `column_map` a required argument (with no default) avoids the
+assumption that user data will always match pvlib conventions; many datasets
+arrive from non-pvlib sources with arbitrary column names. The
+`PVLIB_TO_SF_TSV` constant is there for users who *do* work in the pvlib
+ecosystem, and the `See Also` / `Examples` in the docstring make it
+discoverable for both humans and agents without hard-wiring it into the
+function signature.
+
+The `PVLIB_TO_SF_TSV` dict also documents the relationship between the two
+ecosystems in a machine-readable way: an AI agent inspecting the constant
+immediately understands both the pvlib input names and the SolarFarmer output
+names.
+
+### 4.5 Example data: documentation snippets over bundled files
+
+Bundling full 8760-row weather files would expand the 864 KB package by
+several MB, a ~10× size increase for data that is used only at documentation
+time. Alternatives that provide the same discoverability without bloating the
+distribution:
+
+1. **Inline docstring examples** showing 3–5 rows of a valid TSV:
+
+   ```
+   DateTime	GHI	DHI	TAmb	WS
+   2001-01-01T00:00-06:00	0.0	0.0	-5.20	3.40
+   2001-01-01T01:00-06:00	0.0	0.0	-5.80	2.90
+   2001-01-01T06:00-06:00	12.3	8.1	-4.10	3.10
+   ```
+
+2. **Links to a companion `solarfarmer-examples` repository or
+   documentation site** for full-size files.
+
+Option 1 is nearly free and solves the AI-agent use case directly (the agent
+reads docstrings, sees the format, and can produce a conformant file without
+any external fetch).
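Option 1's sample rows are also machine-checkable, which matters for agents; a sketch that round-trips the three rows through pandas (assuming pandas; column names from the TSV spec above):

```python
import io

import pandas as pd

sample = (
    "DateTime\tGHI\tDHI\tTAmb\tWS\n"
    "2001-01-01T00:00-06:00\t0.0\t0.0\t-5.20\t3.40\n"
    "2001-01-01T01:00-06:00\t0.0\t0.0\t-5.80\t2.90\n"
    "2001-01-01T06:00-06:00\t12.3\t8.1\t-4.10\t3.10\n"
)
df = pd.read_csv(io.StringIO(sample), sep="\t", index_col="DateTime")
print(df.shape)  # (3, 4)
```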
+
+---
+
+## 5 Summary of priorities
+
+| # | Change | Effort | Impact | Scope |
+|---|--------|--------|--------|-------|
+| 1 | Enrich `MeteoFileFormat` enum docstrings + doc link | Trivial | High for discoverability | Docs only |
+| 2 | TSV column spec as module-level constants | Small | High for both humans and agents | New module, no API change |
+| 3 | Cross-reference weather-dependent fields in `EnergyCalculationOptions`, especially `calculationYear` + TMY | Trivial | **High**; `calculationYear` silently discards non-matching years | Docstring edits |
+| 4 | `write_tsv()` helper + `PVLIB_TO_SF_TSV` mapping constant | Medium | High; eliminates the #1 user-written boilerplate; mapping constant bridges pvlib ecosystem | New public function + constant |
+| 5 | Inline docstring example (3–5 TSV rows) | Trivial | High for agents, good for humans | Docs only |
+
+Recommendations 1, 3, and 5 are pure documentation changes that could ship in
+a patch release. Recommendation 2 provides the machine-readable contract that
+makes the SDK self-describing for weather data. Recommendation 4 is the
+functional complement to the existing file-reading utilities.
diff --git a/solarfarmer/weather.py b/solarfarmer/weather.py
index d8d8497..3726ae5 100644
--- a/solarfarmer/weather.py
+++ b/solarfarmer/weather.py
@@ -141,7 +141,7 @@ def from_dataframe(
 ) -> pathlib.Path:
     """Write a DataFrame to a SolarFarmer TSV weather file.
 
-    .. note:: Requires ``pandas``. Install with ``pip install 'dnv-solarfarmer[weather]'``.
+    .. note:: Requires ``pandas``. Install with ``pip install 'dnv-solarfarmer[all]'``.
 
     Parameters
     ----------
@@ -209,7 +209,7 @@ def from_pvlib(
     Wrapper around :func:`from_dataframe` with the standard pvlib column
     mapping and Pa → mbar pressure conversion.
 
-    .. note:: Requires ``pandas``. Install with ``pip install 'dnv-solarfarmer[weather]'``.
+    .. note:: Requires ``pandas``. Install with ``pip install 'dnv-solarfarmer[all]'``.
 
     Parameters
     ----------