
Commit 5ea35ec

Merge branch 'main' of https://github.com/NHSDigital/data-validation-engine into release_v09
2 parents 4b687dc + 150d3fa commit 5ea35ec

28 files changed

Lines changed: 960 additions & 551 deletions

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -1,3 +1,20 @@
+## v0.8.1 (2026-06-17)
+
+### Build
+- Update lxml from v4.6.4 to 6.1.1
+- Update pyarrow from 17.0.0 to 23.0.1
+
+## v0.8.0 (2026-06-10)
+
+### Feat
+
+- add additional fields check into csv readers (#109)
+
+### Fix
+
+- add greater error handling around polars and duckdb csv reader (#112)
+- adjust csv header check feedback message to be more detailed
+
 ## v0.7.6 (2026-04-30)
 
 ### Fix

README.md

Lines changed: 10 additions & 6 deletions
@@ -4,7 +4,8 @@
 </h1>
 
 ![License](https://img.shields.io/github/license/NHSDigital/data-validation-engine)
-![Version](https://img.shields.io/github/v/release/NHSDigital/data-validation-engine)
+![PyPi](https://img.shields.io/pypi/v/data-validation-engine)
+![Conda](https://anaconda.org/nhs/data-validation-engine/badges/version.svg)
 [![CI Unit Tests](https://github.com/NHSDigital/data-validation-engine/actions/workflows/ci_testing.yml/badge.svg)](https://github.com/NHSDigital/data-validation-engine/actions/workflows/ci_testing.yml)
 [![CI Formatting & Linting](https://github.com/NHSDigital/data-validation-engine/actions/workflows/ci_linting.yml/badge.svg)](https://github.com/NHSDigital/data-validation-engine/actions/workflows/ci_linting.yml)
 
@@ -60,13 +61,16 @@ Below is a list of features that we would like to implement or have been request
 | ------------------------------------------------------------------------------- | ----------------- | --------- |
 | Open source release | 0.1.0 | Yes |
 | Uplift to Python 3.11 | 0.2.0 | Yes |
-| Uplift Pyspark to 3.5 | TBA | No |
-| Allow DVE to run on Python 3.12+ | TBA | No |
-| Upgrade to Pydantic 2.0 | TBA | No |
+| Uplift Pyspark to 3.5 | 0.8.0 | Yes |
+| Allow DVE to run on Python 3.12+ | 0.8.0 | Yes |
+| Upgrade to Pydantic 2.0 | 0.9.0 | No |
 | Uplift Pyspark to 4.0+ | TBA | No |
-| Create a more user friendly interface for building and modifying dischema files | Not yet confirmed | No |
+| Polars upgrade to v1+ | TBA | No |
+| DuckDB upgrade to v1.5+ | TBA | No |
+| Python 3.13 & 3.14 upgrade | TBA | No |
+| Create a more user friendly interface for building and modifying dischema files | TBA | No |
 
-Beyond the Python and Pydantic upgrade, we cannot confirm the other features will be made available anytime soon. Therefore, if you have the interest and desire to make these features available, then please read the [Contributing](#Contributing) section and get involved.
+If you are interested in getting any of the unreleased features listed above available, then please read the [Contributing](#Contributing) section and then submit us a pull request.
 
 ## Contributing
 Please see guidance [here](https://github.com/NHSDigital/data-validation-engine/blob/main/CONTRIBUTE.md).

docs/user_guidance/data_contract.md

Lines changed: 3 additions & 3 deletions
@@ -6,11 +6,11 @@ tags:
 - Domain Types
 ---
 
-The Data Contract defines the structure (models) of your data and controls how it is typecast. We use [Pydantic](https://docs.pydantic.dev/1.10/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.
+The Data Contract defines the structure (models) of your data and controls how it is typecast. We use [Pydantic](https://pydantic.dev/docs/validation/1.10/overview/) to generate and validate the models. This page is meant to give you greater details on how you should write your Data Contract. If you want a summary of how the Data Contract works, please refer to the [Getting Started](./getting_started.md#rules-configuration-introduction) page.
 
 !!! Note
 
-We plan to migrate to Pydantic v2+ in a future release. This page currently reflects what is available through Pydantic v1.
+We plan to migrate to Pydantic v2+ in v0.9.0. This page currently reflects what is available through Pydantic v1.
 
 ## Models
 
@@ -206,7 +206,7 @@ If you want to read more about the readers, please see the [File Transformation]
 
 Within the `fields` section of the contract you must define what data type a given field should be. Depending on how strict/lenient you want your types to be, a number of types are available to use. The types available are:
 
-- [Built-in standard library](https://docs.python.org/3.11/library/stdtypes.html) types (such as `int`, `str`, `date`) available with your version of Python installed for the DVE.
+- [Built-in standard library](https://docs.python.org/3.12/library/stdtypes.html) types (such as `int`, `str`, `date`) available with your version of Python installed for the DVE.
 - [Pydantic v1 types](https://docs.pydantic.dev/1.10/usage/types/)
 - [Custom Types](./data_contract.md#custom-types)
 - [Domain types](./data_contract.md#domain-types)

docs/user_guidance/implementations/spark.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ def get_spark_session() -> SparkSession:
     os.environ["PYSPARK_SUBMIT_ARGS"] = " ".join(
         [
             "--packages",
-            "com.databricks:spark-xml_2.12:0.16.0,io.delta:delta-core_2.12:2.4.0",
+            "com.databricks:spark-xml_2.12:0.16.0,io.delta:delta-spark_2.12:3.2.0",
             "pyspark-shell",
         ]
     )
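
The spark.md change swaps `io.delta:delta-core_2.12:2.4.0` for `io.delta:delta-spark_2.12:3.2.0`, since Delta Lake publishes its Spark artifact as `delta-spark` from the 3.0 line onwards. Every `--packages` coordinate also has to carry the same Scala binary suffix (`_2.12` here) as the Spark build, so a small guard around the submit-args string can catch a mismatched coordinate before a session is started. The sketch below is illustrative only: `build_submit_args` is a hypothetical helper, not part of the DVE, and it assumes the same two coordinates used in the docs example.

```python
import os

# Coordinates copied from the updated spark.md example (Scala 2.12 builds).
PACKAGES = [
    "com.databricks:spark-xml_2.12:0.16.0",
    "io.delta:delta-spark_2.12:3.2.0",
]


def build_submit_args(packages: list[str]) -> str:
    """Hypothetical helper: build a PYSPARK_SUBMIT_ARGS value from Maven coordinates.

    Raises ValueError if no packages are given or if the artifacts
    target different Scala binary versions.
    """
    if not packages:
        raise ValueError("At least one package coordinate is required")
    # A coordinate is group:artifact:version; the Scala binary version is the
    # suffix after the last underscore in the artifact id (e.g. "2.12").
    scala_versions = {pkg.split(":")[1].rsplit("_", 1)[-1] for pkg in packages}
    if len(scala_versions) != 1:
        raise ValueError(f"Mixed Scala versions: {sorted(scala_versions)}")
    return " ".join(["--packages", ",".join(packages), "pyspark-shell"])


# Must be set before the first SparkSession is created in this process.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(PACKAGES)
```

Failing fast here is a design choice: a mismatched Scala suffix otherwise tends to surface later as a class-loading error when Spark resolves the packages.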

docs/user_guidance/install.md

Lines changed: 7 additions & 3 deletions
@@ -8,7 +8,7 @@ tags:
 !!! warning
 **DVE is currently an unstable package. Expect breaking changes between every minor patch**. We intend to follow semantic versioning of `major.minor.patch` more strictly after a 1.0 release. Until then, we recommend that you pin your install to the latest version available and keep an eye on [future releases](https://github.com/NHSDigital/data-validation-engine/releases).
 
-**Please note that we only support Python runtimes of 3.10 and 3.11.** In the future we will look to add support for Python versions greater than 3.11, but it's not an immediate priority.
+**Please note that we only support Python runtimes of 3.10, 3.11 & 3.12.** In the future we will look to add support for Python versions greater than 3.12, but it's not an immediate priority.
 
 If working on Python 3.7, the `0.1` release supports this (and only this) version of Python. However, we have not been updating that version with any bugfixes, performance improvements etc. There are also a number of vulnerable dependencies on version `0.1` release due to [Python 3.7 being depreciated](https://devguide.python.org/versions/) and a number of packages dropping support. **If you choose to install `0.1`, you accept the risks of doing so and additional support will not be provided.**
 
@@ -71,8 +71,11 @@ You can install the DVE package through python package managers such as [pip](ht
 poetry install
 ```
 
-!!! info
-We are working on getting the DVE available via Conda. We will update this page with the relevant instructions once this has been successfully setup.
+=== "conda"
+
+```sh
+conda install nhs::data-validation-engine
+```
 
 Python dependencies are listed in the [`pyproject.toml`](https://github.com/NHSDigital/data-validation-engine/blob/main/pyproject.toml). Many of the dependencies are locked to quite restrictive versions due to complexity of this package. Core packages such as Pydantic, Pyspark and DuckDB are unlikely to receive flexible version constraints as changes in those packages could cause the DVE to malfunction. For less important dependencies, we have tried to make the contraints more flexible. Therefore, we would advise you to install the DVE into a seperate environment rather than trying to integrate it into an existing Python environment.
 
@@ -83,6 +86,7 @@ Once you have installed the DVE you are almost ready to use it. To be able to ru
 
 | DVE Version | Python Version | DuckDB Version | Spark Version | Pydantic Version |
 | ------------ | -------------- | -------------- | ------------- | ---------------- |
+| >=0.8.0 | >=3.10,<3.13 | 1.1.3 | 3.5.2 | 1.10.19 |
 | >=0.7.2 | >=3.10,<3.12 | 1.1.* | 3.4.* | 1.10.16 |
 | >=0.6 | >=3.10,<3.12 | 1.1.* | 3.4.* | 1.10.15 |
 | >=0.2,<0.6 | >=3.10,<3.12 | 1.1.0 | 3.4.4 | 1.10.15 |
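
The new `>=0.8.0` row in the compatibility table widens Python support to `>=3.10,<3.13`. One way to use that table is as data for a pre-install check, sketched below. `COMPATIBILITY`, `parse_version` and `python_supported` are hypothetical names, not DVE APIs. Only the rows visible in this hunk are transcribed, so older releases such as the `0.1` line raise an error instead of returning a result.

```python
import sys

# Rows transcribed from the install.md compatibility table, newest first:
# (minimum DVE version, minimum Python, first unsupported Python).
COMPATIBILITY = [
    ((0, 8, 0), (3, 10), (3, 13)),
    ((0, 7, 2), (3, 10), (3, 12)),
    ((0, 6, 0), (3, 10), (3, 12)),
    ((0, 2, 0), (3, 10), (3, 12)),
]


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "0.6" or "0.8.1" into a 3-part tuple so comparisons line up."""
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (3 - len(parts))


def python_supported(dve_version: str, python_version=None) -> bool:
    """Hypothetical helper: is this Python supported by the given DVE release?

    Defaults to the running interpreter when no Python version is given.
    """
    if python_version is None:
        python_version = sys.version_info
    dve = parse_version(dve_version)
    python = tuple(python_version[:2])
    # The first row whose minimum DVE version is satisfied wins.
    for min_dve, min_python, max_python in COMPATIBILITY:
        if dve >= min_dve:
            return min_python <= python < max_python
    raise ValueError(f"DVE {dve_version} predates the transcribed table rows")
```

For example, `python_supported("0.8.1", (3, 12))` returns `True`, while `python_supported("0.7.6", (3, 12))` returns `False` because the `>=0.7.2` row stops before 3.12.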
