Draft
19 changes: 19 additions & 0 deletions doc/syntax/clause/draw.qmd
Expand Up @@ -76,6 +76,25 @@ The `SETTING` clause can be used for two different things:
#### Position
A special setting is `position`, which controls how overlapping objects are repositioned to avoid overlap. Position adjustments have special mapping requirements, so not every position adjustment is relevant for every layer type. Different layers have different defaults, as detailed in their documentation. You can read about each position adjustment at [their own documentation sites](../index.qmd#position-adjustments).

#### Aggregate
Some layers support aggregation of their data through the `aggregate` setting; the documentation for each such layer says so. `aggregate` accepts a single string or an array of strings specifying the aggregations to calculate. Each aggregate is either a simple function or a parameterized band function.

The simple functions can be one of:

* `'count'`: Row count
* `'sum'` and `'prod'`: The sum or product
* `'min'`, `'max'`, and `'range'`: Extremes and their difference (max - min)
* `'mean'` and `'median'`: Central tendency
* `'geomean'`, `'harmean'`, and `'rms'`: Geometric mean, harmonic mean, and root mean square
* `'sdev'`, `'var'`, `'iqr'`, and `'se'`: Standard deviation, variance, interquartile range, and standard error
* `'p05'`, `'p10'`, `'p25'`, `'p50'`, `'p75'`, `'p90'`, and `'p95'`: Percentiles

For band functions you combine an offset with an expansion, optionally scaled by a multiplier. For example, `'mean-1.96sdev'` computes the mean minus 1.96 times the standard deviation. The general form is `<offset>±<multiplier><expansion>`, where `±` is either `+` or `-` and `<multiplier>` is optional (defaults to `1`).

Allowed offsets are: `'mean'`, `'median'`, `'geomean'`, `'harmean'`, `'rms'`, `'sum'`, `'prod'`, `'min'`, `'max'`, and `'p05'`–`'p95'`

Allowed expansions are: `'sdev'`, `'se'`, `'var'`, `'iqr'`, and `'range'`
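
As a rough illustration of the band grammar above, the following Rust sketch splits a spec such as `'mean-1.96sdev'` into its parts. It is not the parser used by ggsql: the `Band` struct and `parse_band` function are hypothetical names, and the restriction of the multiplier to digits and a decimal point is an assumption made for this example.

```rust
/// Parsed form of a band aggregate such as `mean-1.96sdev`.
/// (Hypothetical type for illustration; not part of ggsql.)
#[derive(Debug, PartialEq)]
struct Band {
    offset: String,
    sign: f64,
    multiplier: f64,
    expansion: String,
}

// Offsets and expansions as listed in the documentation above.
const OFFSETS: [&str; 16] = [
    "mean", "median", "geomean", "harmean", "rms", "sum", "prod", "min", "max",
    "p05", "p10", "p25", "p50", "p75", "p90", "p95",
];
const EXPANSIONS: [&str; 5] = ["sdev", "se", "var", "iqr", "range"];

/// Parse `<offset>±<multiplier><expansion>`; returns `None` if the spec
/// does not follow that form.
fn parse_band(spec: &str) -> Option<Band> {
    // The first `+` or `-` separates the offset from the rest.
    let idx = spec.find(|c: char| c == '+' || c == '-')?;
    let (offset, rest) = (&spec[..idx], &spec[idx + 1..]);
    if !OFFSETS.contains(&offset) {
        return None;
    }
    let sign = if spec[idx..].starts_with('-') { -1.0 } else { 1.0 };
    // The expansion is whichever allowed name the remainder ends with.
    let expansion = EXPANSIONS.iter().copied().find(|e| rest.ends_with(*e))?;
    let digits = &rest[..rest.len() - expansion.len()];
    // Assumption: the multiplier is a plain decimal number; empty means 1.
    let multiplier = if digits.is_empty() {
        1.0
    } else if digits.chars().all(|c| c.is_ascii_digit() || c == '.') {
        digits.parse::<f64>().ok()?
    } else {
        return None;
    };
    Some(Band {
        offset: offset.to_string(),
        sign,
        multiplier,
        expansion: expansion.to_string(),
    })
}
```

With this sketch, `parse_band("median+iqr")` yields a multiplier of `1`, while `parse_band("count+sdev")` is rejected because `count` is not an allowed offset.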

### `FILTER`
```ggsql
FILTER <condition>
Expand Down
5 changes: 4 additions & 1 deletion doc/syntax/layer/type/area.qmd
Expand Up @@ -25,9 +25,12 @@ The following aesthetics are recognised by the area layer.
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The area layer sorts the data along its primary axis
This layer supports aggregation through the `aggregate` setting. Aggregates are calculated within each group, where groups are defined by `PARTITION BY`, all discrete mappings, and the primary axis, and are then used as the values to plot. Requesting multiple aggregates gives rise to one separate group per aggregate. These groups can be distinguished through the added `aggregate` column, which you can remap, e.g. `REMAPPING aggregate AS color`.

Further, the area layer sorts the data along its primary axis before returning it.
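
To make the shape of the aggregated data concrete, here is a minimal Rust sketch of the grouping described in this section. It is only an in-memory illustration under simplifying assumptions: `aggregate_long` and `AggRow` are hypothetical names, a single string key stands in for the combination of `PARTITION BY`, discrete mappings, and the primary axis, and only `min`, `max`, and `mean` are handled. It does not reflect how ggsql computes aggregates internally.

```rust
use std::collections::BTreeMap;

/// One output row: (group key, aggregate name, aggregated value).
/// (Hypothetical type for illustration.)
type AggRow = (String, String, f64);

/// Compute every requested function per group and tag each output row
/// with the function name, mirroring the added `aggregate` column.
fn aggregate_long(rows: &[(&str, f64)], functions: &[&str]) -> Vec<AggRow> {
    // Collect the values belonging to each group key.
    let mut groups: BTreeMap<&str, Vec<f64>> = BTreeMap::new();
    for (key, value) in rows {
        groups.entry(*key).or_default().push(*value);
    }

    let mut out = Vec::new();
    for name in functions {
        for (key, values) in &groups {
            let value = match *name {
                "min" => values.iter().copied().fold(f64::INFINITY, f64::min),
                "max" => values.iter().copied().fold(f64::NEG_INFINITY, f64::max),
                "mean" => values.iter().sum::<f64>() / values.len() as f64,
                // Other functions are left out of this sketch.
                _ => continue,
            };
            out.push((key.to_string(), name.to_string(), value));
        }
    }
    out
}
```

Requesting two functions over two groups produces four rows, two per function, which is why the `aggregate` column is needed to tell the resulting series apart.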

## Orientation
Area plots are sorted and connected along their primary axis. Since the primary axis cannot be deduced from the mapping it must be specified using the `orientation` setting. E.g. if you wish to create a vertical area plot you need to set `orientation => 'transposed'` to indicate that the primary layer axis follows the second axis of the coordinate system.
Expand Down
13 changes: 13 additions & 0 deletions doc/syntax/layer/type/bar.qmd
Expand Up @@ -25,10 +25,13 @@ The bar layer has no required aesthetics
## Settings
* `position`: Position adjustment. One of `'identity'`, `'stack'` (default), `'dodge'`, or `'jitter'`
* `width`: The width of the bars as a proportion of the available width (0 to 1)
* `aggregate`: Aggregation functions to apply per group if the secondary position has been mapped. Either a single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
If the secondary axis has not been mapped the layer will calculate counts for you and display these as the secondary axis.

If the secondary axis has been mapped, you can apply aggregation through the `aggregate` setting. Aggregates are calculated within each group, where groups are defined by `PARTITION BY`, all discrete mappings, and the primary axis, and are then used as the values to plot. Requesting multiple aggregates gives rise to one separate group per aggregate. These groups can be distinguished through the added `aggregate` column, which you can remap, e.g. `REMAPPING aggregate AS color`.

### Properties

* `weight`: If mapped, the sum of the weights within each group is calculated instead of the count in each group
Expand Down Expand Up @@ -116,3 +119,13 @@ DRAW bar
MAPPING species AS fill
PROJECT TO polar
```

Use a different type of aggregation for the bars through the `aggregate` setting:

```{ggsql}
VISUALISE species AS x, body_mass AS y FROM ggsql:penguins
DRAW bar
SETTING aggregate => 'mean', fill => 'steelblue'
DRAW range
SETTING aggregate => ('mean-1.96sdev', 'mean+1.96sdev')
```
18 changes: 16 additions & 2 deletions doc/syntax/layer/type/line.qmd
Expand Up @@ -24,11 +24,16 @@ The following aesthetics are recognised by the line layer.
* `orientation`: The orientation of the layer, see the [Orientation section](#orientation). One of the following:
* `'aligned'` to align the layer's primary axis with the coordinate system's first axis.
* `'transposed'` to align the layer's primary axis with the coordinate system's second axis.
* `aggregate`: Aggregation functions to apply per group. Either a single string or an array of strings. See an overview of aggregation functions in [the `DRAW` documentation](../../clause/draw.qmd#aggregate) and more information in the *Data transformation* section below.

## Data transformation
The line layer sorts the data along its primary axis.
This layer supports aggregation through the `aggregate` setting. Aggregates are calculated within each group, where groups are defined by `PARTITION BY`, all discrete mappings, and the primary axis, and are then used as the values to plot. Requesting multiple aggregates gives rise to one separate group per aggregate. These groups can be distinguished through the added `aggregate` column, which you can remap, e.g. `REMAPPING aggregate AS color`.

Further, the line layer sorts the data along its primary axis before returning it.

If the line has a variable `stroke` or `opacity` aesthetic within groups, the line is broken into segments.
Each segment gets the property of the preceding datapoint, so the last datapoint in a group does not transfer these properties.
This behavior is not compatible with aggregation.

## Orientation
Line plots are sorted and connected along their primary axis. Since the primary axis cannot be deduced from the mapping it must be specified using the `orientation` setting. If you wish to create a vertical line plot, you need to set `orientation => 'transposed'` to indicate that the primary layer axis follows the second axis of the coordinate system.
Expand Down Expand Up @@ -89,4 +94,13 @@ VISUALISE x, y FROM data
DRAW line
MAPPING z AS linewidth
SCALE linewidth TO (0, 30)
```
```

Use aggregation to draw min and max lines from a set of observations:

```{ggsql}
VISUALISE Day AS x, Temp AS y FROM ggsql:airquality
DRAW line
SETTING aggregate => ('min', 'max')
DRAW point
```
58 changes: 56 additions & 2 deletions src/execute/layer.rs
Expand Up @@ -187,11 +187,16 @@ pub fn apply_remappings_post_query(df: DataFrame, layer: &Layer) -> Result<DataF
}
}

// Drop any remaining __ggsql_stat_* columns that weren't consumed by remappings.
// Drop any remaining __ggsql_stat_* columns that weren't consumed by
// remappings — except those promoted to `partition_by` post-stat (e.g. the
// Aggregate stat's `__ggsql_stat_aggregate` column when the user didn't
// remap it; downstream renderers need it as a grouping key).
let stat_cols: Vec<String> = df
.get_column_names()
.into_iter()
.filter(|name| naming::is_stat_column(name))
.filter(|name| {
naming::is_stat_column(name) && !layer.partition_by.contains(&name.to_string())
})
.collect();
if !stat_cols.is_empty() {
df = df.drop_many(&stat_cols)?;
Expand All @@ -200,6 +205,18 @@ pub fn apply_remappings_post_query(df: DataFrame, layer: &Layer) -> Result<DataF
Ok(df)
}

/// Count the number of aggregate functions requested in the `aggregate` SETTING.
/// Used to gate auto-promotion of the `aggregate` stat column to `partition_by`:
/// a single function produces a constant column, and partitioning by it adds a
/// useless detail channel.
fn aggregate_param_function_count(parameters: &HashMap<String, ParameterValue>) -> usize {
match parameters.get("aggregate") {
Some(ParameterValue::String(_)) => 1,
Some(ParameterValue::Array(arr)) => arr.len(),
_ => 0,
}
}

/// Convert a literal value to an Arrow ArrayRef with constant values.
///
/// For string literals, attempts to parse as temporal types (date/datetime/time)
Expand Down Expand Up @@ -584,6 +601,19 @@ where
layer.mappings.aesthetics.remove(aes);
}

// Auto-remap stat columns whose names are position aesthetics that were
// consumed by the stat (e.g. Aggregate's `pos1`/`pos2` outputs). The geom
// can't list these in `default_remappings` because the set of position
// aesthetics in play is dynamic per layer.
for stat in &stat_columns {
if final_remappings.contains_key(stat) {
continue;
}
if aesthetic::is_position_aesthetic(stat) && consumed_aesthetics.contains(stat) {
final_remappings.insert(stat.clone(), stat.clone());
}
}

// Apply stat_columns to layer aesthetics using the remappings
for stat in &stat_columns {
if let Some(aesthetic) = final_remappings.get(stat) {
Expand Down Expand Up @@ -621,6 +651,30 @@ where
}
}

// The `aggregate` stat column (produced by stat_aggregate when the
// user requests multiple functions) tags each row with its function
// name. For mark types that connect rows within a group (line, area,
// path, polygon), we need to add this column to `layer.partition_by`
// so that e.g. `aggregate => ('min', 'max')` renders as two separate
// lines rather than one zigzag through both. Resolves to the
// post-rename data-column name: if the user remapped `aggregate AS
// <aes>`, the prefixed aesthetic column; otherwise the stat column.
//
// Only fires when more than one function is requested — a single
// function produces a constant aggregate column, partitioning by
// which would just add a no-op detail channel.
if stat_columns.iter().any(|s| s == "aggregate")
&& aggregate_param_function_count(&layer.parameters) > 1
{
let partition_col = match final_remappings.get("aggregate") {
Some(aes) => naming::aesthetic_column(aes),
None => naming::stat_column("aggregate"),
};
if !layer.partition_by.contains(&partition_col) {
layer.partition_by.push(partition_col);
}
}

// Wrap transformed query to rename stat columns to prefixed aesthetic names
let stat_rename_exprs: Vec<String> = stat_columns
.iter()
Expand Down
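
The auto-promotion rule in the `src/execute/layer.rs` hunks above can be summarised in a standalone sketch. Everything here is a stand-in for illustration: the two-variant `ParameterValue` enum, `function_count`, `promote_aggregate`, and both naming helpers are simplified or hypothetical. The `__ggsql_stat_` prefix is taken from the comment in the diff, whereas the `__ggsql_aes_` prefix is purely an assumption, since the real output of `naming::aesthetic_column` is not shown. The real code additionally checks that the stat actually produced an `aggregate` column.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the crate's `ParameterValue` (assumption:
/// the real enum has more variants and a different array payload).
#[derive(Debug, Clone)]
enum ParameterValue {
    String(String),
    Array(Vec<String>),
}

/// Number of aggregate functions requested via the `aggregate` setting.
fn function_count(parameters: &HashMap<String, ParameterValue>) -> usize {
    match parameters.get("aggregate") {
        Some(ParameterValue::String(_)) => 1,
        Some(ParameterValue::Array(arr)) => arr.len(),
        None => 0,
    }
}

/// Stand-ins for the `naming` helpers; the aesthetic prefix is hypothetical.
fn stat_column(name: &str) -> String {
    format!("__ggsql_stat_{name}")
}
fn aesthetic_column(aes: &str) -> String {
    format!("__ggsql_aes_{aes}")
}

/// Add the aggregate tag column to `partition_by` only when more than one
/// function was requested; a single function would yield a constant column.
fn promote_aggregate(
    partition_by: &mut Vec<String>,
    parameters: &HashMap<String, ParameterValue>,
    remapped_to: Option<&str>,
) {
    if function_count(parameters) <= 1 {
        return;
    }
    // Use the post-rename column: the aesthetic column if the user
    // remapped `aggregate`, otherwise the stat column itself.
    let column = match remapped_to {
        Some(aes) => aesthetic_column(aes),
        None => stat_column("aggregate"),
    };
    if !partition_by.contains(&column) {
        partition_by.push(column);
    }
}
```

Under these assumptions, `aggregate => ('min', 'max')` without a remapping adds `__ggsql_stat_aggregate` to the partition columns exactly once, while `aggregate => 'mean'` leaves them untouched.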
22 changes: 18 additions & 4 deletions src/execute/mod.rs
Expand Up @@ -116,24 +116,38 @@ fn validate(
}
}

// Validate remapping source columns are valid stat columns for this geom
// Validate remapping source columns are valid stat columns for this geom.
// Geoms that opt into the Aggregate stat (`supports_aggregate`) also accept
// `aggregate`, `count`, and any position aesthetic name as a stat source.
let valid_stat_columns = layer.geom.valid_stat_columns();
let supports_aggregate = layer.geom.supports_aggregate();
for stat_value in layer.remappings.aesthetics.values() {
if let Some(stat_col) = stat_value.column_name() {
if !valid_stat_columns.contains(&stat_col) {
if valid_stat_columns.is_empty() {
let is_aggregate_stat_col = supports_aggregate
&& (stat_col == "aggregate"
|| stat_col == "count"
|| crate::plot::aesthetic::is_position_aesthetic(stat_col));
if !valid_stat_columns.contains(&stat_col) && !is_aggregate_stat_col {
if valid_stat_columns.is_empty() && !supports_aggregate {
return Err(GgsqlError::ValidationError(format!(
"Layer {}: REMAPPING not supported for geom '{}' (no stat transform)",
idx + 1,
layer.geom
)));
} else {
let mut valid: Vec<String> =
valid_stat_columns.iter().map(|s| s.to_string()).collect();
if supports_aggregate {
valid.push("aggregate".to_string());
valid.push("count".to_string());
}
let valid_refs: Vec<&str> = valid.iter().map(|s| s.as_str()).collect();
return Err(GgsqlError::ValidationError(format!(
"Layer {}: REMAPPING references unknown stat column '{}'. Valid stat columns for geom '{}' are: {}",
idx + 1,
stat_col,
layer.geom,
crate::and_list(valid_stat_columns)
crate::and_list(&valid_refs)
)));
}
}
Expand Down
46 changes: 30 additions & 16 deletions src/plot/layer/geom/area.rs
Expand Up @@ -3,10 +3,11 @@
use crate::plot::layer::orientation::{ALIGNED, ORIENTATION_VALUES};
use crate::plot::types::DefaultAestheticValue;
use crate::plot::{DefaultParamValue, ParamDefinition};
use crate::{naming, Mappings};
use crate::Mappings;

use super::types::{ParamConstraint, POSITION_VALUES};
use super::{DefaultAesthetics, GeomTrait, GeomType, StatResult};
use super::stat_aggregate;
use super::types::{wrap_with_order_by, ParamConstraint, POSITION_VALUES};
use super::{has_aggregate_param, DefaultAesthetics, GeomTrait, GeomType, StatResult};

/// Area geom - filled area charts
#[derive(Debug, Clone, Copy)]
Expand Down Expand Up @@ -54,28 +55,41 @@ impl GeomTrait for Area {
PARAMS
}

fn supports_aggregate(&self) -> bool {
true
}

fn needs_stat_transform(&self, _aesthetics: &Mappings) -> bool {
true
}

fn apply_stat_transform(
&self,
query: &str,
_schema: &crate::plot::Schema,
_aesthetics: &Mappings,
_group_by: &[String],
_parameters: &std::collections::HashMap<String, crate::plot::ParameterValue>,
schema: &crate::plot::Schema,
aesthetics: &Mappings,
group_by: &[String],
parameters: &std::collections::HashMap<String, crate::plot::ParameterValue>,
_execute_query: &dyn Fn(&str) -> crate::Result<crate::DataFrame>,
_dialect: &dyn crate::reader::SqlDialect,
dialect: &dyn crate::reader::SqlDialect,
) -> crate::Result<StatResult> {
// Area geom needs ordering by pos1 (domain axis) for proper rendering
let order_col = naming::aesthetic_column("pos1");
Ok(StatResult::Transformed {
query: format!("{} ORDER BY {}", query, naming::quote_ident(&order_col)),
stat_columns: vec![],
dummy_columns: vec![],
consumed_aesthetics: vec![],
})
let result = if has_aggregate_param(parameters) {
stat_aggregate::apply(
query,
schema,
aesthetics,
group_by,
parameters,
dialect,
self.aggregate_slots(),
self.aggregate_range_pair(),
)?
} else {
StatResult::Identity
};
// Area needs ordering by pos1 (domain axis) for proper rendering, in both
// the Identity and Aggregate paths.
Ok(wrap_with_order_by(query, result, "pos1"))
}
}

Expand Down
8 changes: 8 additions & 0 deletions src/plot/layer/geom/arrow.rs
Expand Up @@ -39,6 +39,14 @@ impl GeomTrait for Arrow {
}];
PARAMS
}

fn supports_aggregate(&self) -> bool {
true
}

fn aggregate_slots(&self) -> &'static [u8] {
&[1, 2]
}
}

impl std::fmt::Display for Arrow {
Expand Down
25 changes: 21 additions & 4 deletions src/plot/layer/geom/bar.rs
Expand Up @@ -3,10 +3,11 @@
use std::collections::HashMap;
use std::collections::HashSet;

use super::stat_aggregate;
use super::types::{get_column_name, POSITION_VALUES};
use super::{
DefaultAesthetics, DefaultParamValue, GeomTrait, GeomType, ParamConstraint, ParamDefinition,
StatResult,
has_aggregate_param, DefaultAesthetics, DefaultParamValue, GeomTrait, GeomType,
ParamConstraint, ParamDefinition, StatResult,
};
use crate::naming;
use crate::plot::types::{DefaultAestheticValue, ParameterValue};
Expand Down Expand Up @@ -79,6 +80,10 @@ impl GeomTrait for Bar {
&["pos1", "pos2", "weight"]
}

fn supports_aggregate(&self) -> bool {
true
}

fn needs_stat_transform(&self, _aesthetics: &Mappings) -> bool {
true // Bar stat decides COUNT vs identity based on y mapping
}
Expand All @@ -89,10 +94,22 @@ impl GeomTrait for Bar {
schema: &Schema,
aesthetics: &Mappings,
group_by: &[String],
_parameters: &HashMap<String, ParameterValue>,
parameters: &HashMap<String, ParameterValue>,
_execute_query: &dyn Fn(&str) -> Result<DataFrame>,
_dialect: &dyn SqlDialect,
dialect: &dyn SqlDialect,
) -> Result<StatResult> {
if has_aggregate_param(parameters) {
return stat_aggregate::apply(
query,
schema,
aesthetics,
group_by,
parameters,
dialect,
self.aggregate_slots(),
self.aggregate_range_pair(),
);
}
stat_bar_count(query, schema, aesthetics, group_by)
}
}
Expand Down