[SPARK-54986][PYSPARK][DOCS] Document return type (DoubleType) for aggregate functions #55075

Open
hudda-ravina wants to merge 1 commit into apache:master from hudda-ravina:Ravina_Hudda_Fix-pyspark

Conversation

@hudda-ravina

What changes were proposed in this pull request?

This PR updates the PySpark API documentation to explicitly mention the return type of aggregate functions such as stddev, stddev_samp, stddev_pop, variance, var_samp, and var_pop.

Based on the underlying implementation in CentralMomentAgg.scala, these functions return DoubleType regardless of the input column type. However, the PySpark function docstrings did not state this explicitly.

This PR adds an explicit mention of DoubleType to the Returns section of the relevant function docstrings in builtin.py.
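For illustration, the kind of docstring change described above might look like the sketch below. It is a standalone stub, not the code in builtin.py: the function name `stddev_docstring_sketch` is hypothetical, and the wording of the Returns section is an assumption about how the DoubleType note could be phrased, not the text of this PR's diff.

```python
def stddev_docstring_sketch(col):
    """
    Aggregate function: alias for stddev_samp.

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or column name
        target column to compute on.

    Returns
    -------
    :class:`~pyspark.sql.Column`
        standard deviation of given column. The result is of
        ``DoubleType`` regardless of the input column type.
    """
    # Hypothetical stub: only the docstring matters here. It does not
    # call Spark and is not the PySpark implementation.
    raise NotImplementedError("docstring sketch only")


# The Returns section now names the result type explicitly.
print("DoubleType" in stddev_docstring_sketch.__doc__)
```

The same note would apply, under the same assumption about wording, to stddev_samp, stddev_pop, variance, var_samp, and var_pop.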


Why are the changes needed?

Currently, the return type of these aggregate functions is not clearly documented, which may confuse users.

Stating the return type explicitly improves documentation clarity and aligns the PySpark docs with the actual implementation.


Does this PR introduce any user-facing change?

No. This is a documentation-only change.


How was this patch tested?

Documentation change only. Verified consistency with implementation in CentralMomentAgg.scala.

…PySpark

Signed-off-by: Ravina Hudda <huddaravina236@gmail.com>
Signed-off-by: Ravina Hudda <ravina01@infosys.com>