Merged
53 commits
4ca0f19
Bump minio from 7.2.16 to 7.2.18
dependabot[bot] Oct 6, 2025
cfedd6c
Merge pull request #98 from eScienceLab/dependabot/pip/minio-7.2.18
douglowe Oct 14, 2025
0b7e4c3
Bump python-dotenv from 1.1.1 to 1.2.1
dependabot[bot] Oct 27, 2025
e1ea2a2
Bump apiflask from 2.4.0 to 3.0.2
dependabot[bot] Nov 24, 2025
011279e
Merge pull request #100 from eScienceLab/dependabot/pip/python-dotenv…
douglowe Dec 1, 2025
d0df866
Merge branch 'develop' into dependabot/pip/apiflask-3.0.2
douglowe Dec 1, 2025
c667237
Merge pull request #104 from eScienceLab/dependabot/pip/apiflask-3.0.2
douglowe Dec 1, 2025
10546a4
Bump minio from 7.2.18 to 7.2.19
dependabot[bot] Dec 1, 2025
bd80960
Merge pull request #105 from eScienceLab/dependabot/pip/minio-7.2.19
douglowe Dec 1, 2025
c87a35e
Bump redis from 6.4.0 to 7.1.0
dependabot[bot] Dec 1, 2025
51c7d75
Merge pull request #106 from eScienceLab/dependabot/pip/redis-7.1.0
douglowe Dec 1, 2025
1198f8c
Bump werkzeug from 3.1.3 to 3.1.4
dependabot[bot] Dec 1, 2025
66f75b8
Bump urllib3 from 2.5.0 to 2.6.0
dependabot[bot] Dec 6, 2025
10d5d84
Bump marshmallow from 4.0.0 to 4.1.2
dependabot[bot] Dec 22, 2025
bdf04ca
Bump celery from 5.5.3 to 5.6.2
dependabot[bot] Jan 5, 2026
3dfe24f
Merge pull request #113 from eScienceLab/dependabot/pip/celery-5.6.2
douglowe Jan 5, 2026
9babed2
Merge pull request #112 from eScienceLab/dependabot/pip/marshmallow-4…
douglowe Jan 5, 2026
61e6483
Merge pull request #111 from eScienceLab/dependabot/pip/urllib3-2.6.0
douglowe Jan 5, 2026
8eaaf6b
Merge pull request #109 from eScienceLab/dependabot/pip/werkzeug-3.1.4
douglowe Jan 5, 2026
bbade12
Bump urllib3 from 2.6.0 to 2.6.3
dependabot[bot] Jan 8, 2026
cb2bf1c
Bump werkzeug from 3.1.4 to 3.1.5
dependabot[bot] Jan 9, 2026
c61931f
rocrate-validator v0.8.0
douglowe Jan 12, 2026
2b98a2d
Merge pull request #116 from eScienceLab/roc-validator-v0.8
douglowe Jan 12, 2026
a3ff594
Merge pull request #115 from eScienceLab/dependabot/pip/werkzeug-3.1.5
douglowe Jan 12, 2026
a88a76b
Merge pull request #114 from eScienceLab/dependabot/pip/urllib3-2.6.3
douglowe Jan 12, 2026
6df8f9a
docker project specified for local integration testing
douglowe Jan 12, 2026
8a7c2ea
white space cleanup
douglowe Jan 12, 2026
c216de7
profile path optional input for rocrate validation task
douglowe Jan 12, 2026
2eaafd4
clean config class, use for celery app, add profiles_path
douglowe Jan 19, 2026
5f3f1f4
provide route for passing profiles_path to rocrate validator call
douglowe Jan 19, 2026
973cd03
update tests for profile_paths variable
douglowe Jan 19, 2026
cf02db7
switch to extra_profiles_path option for validator additional profiles
douglowe Jan 19, 2026
0db76ed
docker compose profile loading example
douglowe Jan 20, 2026
1067ff5
integration test for providing extra profile for validation
douglowe Jan 20, 2026
1cd7608
full profile directory for crate validator, not extra profiles path
douglowe Feb 9, 2026
f1181e5
profiles path (in develop) set for flask not celery worker
douglowe Feb 9, 2026
4d8ee0a
remove extraneous environment variables from dev celery worker
douglowe Feb 9, 2026
00708df
remove extraneous environment variables from main celery worker
douglowe Feb 9, 2026
6ac555e
add test profiles
douglowe Feb 9, 2026
9844596
add profile_name to API description in readme
douglowe Feb 9, 2026
b28d8c9
API and docker updates in readme
douglowe Feb 9, 2026
4723529
readme cleanup
douglowe Feb 9, 2026
fe43ffd
Bump redis from 7.1.0 to 7.1.1
dependabot[bot] Feb 16, 2026
742968c
direct metadata validation from json
douglowe Feb 17, 2026
0f62ee2
tests updated and extended for json metadata validation
douglowe Feb 17, 2026
f84bf29
removed metadata only rocrate build function
douglowe Feb 17, 2026
4d0f544
pass profiles_path env variable to metadata testing function
douglowe Feb 25, 2026
e3d4d78
docstring and logging update
douglowe Feb 25, 2026
aef6f64
add profiles_path variable to metadata api and service tests
douglowe Feb 25, 2026
15f5b3c
Bump redis from 7.1.1 to 7.2.0
dependabot[bot] Feb 23, 2026
dab78e3
Bump flask from 3.1.2 to 3.1.3
dependabot[bot] Feb 23, 2026
78639c2
Bump werkzeug from 3.1.5 to 3.1.6
dependabot[bot] Feb 26, 2026
b0bd555
Bump roc-validator from 0.8 to 0.8.1
dependabot[bot] Feb 26, 2026
17 changes: 15 additions & 2 deletions README.md
@@ -21,6 +21,7 @@ This project presents a Flask-based API for validating RO-Crates.
|------------|-----------|-------------------------|-----------------------------------------------------------------------|
| root_path | optional | string | Root path which contains the RO-Crate |
| webhook_url | optional | string | Webhook to send validation result to |
| profile_name | optional | string | RO-Crate profile to validate against |
| minio_config | required | dictionary | MinIO Configuration Details |

`minio_config`
@@ -167,12 +168,24 @@ curl -X 'POST' \

2. Create the `.env` file for shared environment information. An example environment file (`example.env`) is included and can be copied for this purpose, but make sure to change any security settings (usernames and passwords).

3. Build and start the services using Docker Compose:
3. Optionally, provide a directory of RO-Crate profiles to replace the default profiles used for validation. This directory must contain all required profile files, because the default profile data will not be used. An example is given in the `docker-compose-develop.yml` file and described here:
1. Store the profiles in a convenient directory, e.g.: `./local/rocrate_validator_profiles`
2. Add a volume to the celery worker container for these, e.g.:
```
volumes:
- ./local/rocrate_validator_profiles:/app/profiles:ro
```
3. Provide the `PROFILES_PATH` environment variable to the flask container (not the celery worker container), matching the internal mount path, e.g.:
```
- PROFILES_PATH=/app/profiles
```

4. Build and start the services using Docker Compose:
```bash
docker compose up --build
```

4. Set up the MinIO bucket
5. Set up the MinIO bucket
1. Open the MinIO web interface at `http://localhost:9000`.
2. Log in with your MinIO credentials.
3. Create a new bucket named `ro-crates`.
11 changes: 8 additions & 3 deletions app/ro_crates/routes/post_routes.py
@@ -7,7 +7,7 @@
from apiflask import APIBlueprint, Schema
from apiflask.fields import String, Boolean
from marshmallow.fields import Nested
from flask import Response
from flask import Response, current_app

from app.services.validation_service import (
queue_ro_crate_validation_task,
@@ -81,7 +81,10 @@ def validate_ro_crate_via_id(json_data, crate_id) -> tuple[Response, int]:
else:
profile_name = None

return queue_ro_crate_validation_task(minio_config, crate_id, root_path, profile_name, webhook_url)
profiles_path = current_app.config["PROFILES_PATH"]

return queue_ro_crate_validation_task(minio_config, crate_id, root_path, profile_name,
webhook_url, profiles_path)


@post_routes_bp.post("/validate_metadata")
@@ -108,4 +111,6 @@ def validate_ro_crate_metadata(json_data) -> tuple[Response, int]:
else:
profile_name = None

return queue_ro_crate_metadata_validation_task(crate_json, profile_name)
profiles_path = current_app.config["PROFILES_PATH"]

return queue_ro_crate_metadata_validation_task(crate_json, profile_name, profiles_path=profiles_path)
12 changes: 8 additions & 4 deletions app/services/validation_service.py
@@ -25,7 +25,8 @@


def queue_ro_crate_validation_task(
minio_config, crate_id, root_path=None, profile_name=None, webhook_url=None
minio_config, crate_id, root_path=None, profile_name=None, webhook_url=None,
profiles_path=None
) -> tuple[Response, int]:
"""
Queues an RO-Crate for validation with Celery.
@@ -51,22 +52,24 @@ def queue_ro_crate_validation_task(
raise InvalidAPIUsage(f"No RO-Crate with prefix: {crate_id}", 400)

try:
process_validation_task_by_id.delay(minio_config, crate_id, root_path, profile_name, webhook_url)
process_validation_task_by_id.delay(minio_config, crate_id, root_path,
profile_name, webhook_url, profiles_path)
return jsonify({"message": "Validation in progress"}), 202

except Exception as e:
return jsonify({"error": str(e)}), 500


def queue_ro_crate_metadata_validation_task(
crate_json: str, profile_name=None, webhook_url=None
crate_json: str, profile_name=None, webhook_url=None, profiles_path=None
) -> tuple[Response, int]:
"""
Queues an RO-Crate for validation with Celery.

:param crate_json: A string containing the RO-Crate JSON metadata to validate.
:param profile_name: The profile to validate against.
:param webhook_url: The URL to POST the validation results to.
:param profiles_path: A path to the profile definition directory.
:return: A tuple containing a JSON response and an HTTP status code.
:raises: Exception: If an error occurs whilst queueing the task.
"""
@@ -88,7 +91,8 @@ def queue_ro_crate_metadata_validation_task(
result = process_validation_task_by_metadata.delay(
crate_json,
profile_name,
webhook_url
webhook_url,
profiles_path
)
if webhook_url:
return jsonify({"message": "Validation in progress"}), 202
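The metadata route passes `profiles_path` by keyword, while the calls in this file pass it positionally. Both work only as long as the argument order matches the callee's signature. The sketch below illustrates the difference with a stand-in function; `queue_metadata_task` is a hypothetical name that copies only the parameter order of `queue_ro_crate_metadata_validation_task` and echoes its arguments, it is not project code.

```python
from typing import Optional


def queue_metadata_task(
    crate_json: str,
    profile_name: Optional[str] = None,
    webhook_url: Optional[str] = None,
    profiles_path: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in: same parameter order as the service function, echoes its arguments."""
    return {
        "profile_name": profile_name,
        "webhook_url": webhook_url,
        "profiles_path": profiles_path,
    }


# Positional: the third argument binds to webhook_url, not profiles_path.
positional = queue_metadata_task("{}", "ro-crate-1.1", "/app/profiles")

# Keyword: the path reaches the intended parameter and webhook_url stays None.
by_keyword = queue_metadata_task("{}", "ro-crate-1.1", profiles_path="/app/profiles")
```

Passing optional trailing arguments by keyword keeps a call correct even if another optional parameter is later added ahead of them.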
69 changes: 48 additions & 21 deletions app/tasks/validation_tasks.py
@@ -7,6 +7,7 @@
import logging
import os
import shutil
import json
from typing import Optional

from rocrate_validator import services
@@ -22,14 +23,14 @@
find_validation_object_on_minio
)
from app.utils.webhook_utils import send_webhook_notification
from app.utils.file_utils import build_metadata_only_rocrate

logger = logging.getLogger(__name__)


@celery.task
def process_validation_task_by_id(
minio_config: dict, crate_id: str, root_path: str, profile_name: str | None, webhook_url: str | None
minio_config: dict, crate_id: str, root_path: str, profile_name: str | None,
webhook_url: str | None, profiles_path: str | None
) -> None:
"""
Background task to process the RO-Crate validation by ID.
@@ -56,7 +57,7 @@ def process_validation_task_by_id(
logging.info(f"Processing validation task for {file_path}")

# Perform validation:
validation_result = perform_ro_crate_validation(file_path, profile_name)
validation_result = perform_ro_crate_validation(file_path, profile_name, profiles_path=profiles_path)

if isinstance(validation_result, str):
logging.error(f"Validation failed: {validation_result}")
@@ -97,32 +98,27 @@ def process_validation_task_by_id(

@celery.task
def process_validation_task_by_metadata(
crate_json: str, profile_name: str | None, webhook_url: str | None
crate_json: str, profile_name: str | None, webhook_url: str | None, profiles_path: Optional[str] = None
) -> ValidationResult | str:
"""
Background task to process the RO-Crate validation for a given json metadata string.

:param crate_json: A string containing the RO-Crate JSON metadata to validate.
:param profile_name: The name of the validation profile to use. Defaults to None.
:param webhook_url: The webhook URL to send notifications to. Defaults to None.
:param profiles_path: The path to the profiles definition directory. Defaults to None.
:raises Exception: If an error occurs during the validation process.

:todo: Replace the Crate ID with a more comprehensive system, and replace profile name with URI.
"""

skip_checks_list = ['ro-crate-1.1_12.1']
file_path = None

try:
# Fetch the RO-Crate from MinIO using the provided ID:
file_path = build_metadata_only_rocrate(crate_json)

logging.info(f"Processing validation task for {file_path}")
logging.info("Processing validation task for provided metadata string")

# Perform validation:
validation_result = perform_ro_crate_validation(file_path,
validation_result = perform_metadata_validation(crate_json,
profile_name,
skip_checks_list
profiles_path=profiles_path
)

if isinstance(validation_result, str):
@@ -131,9 +127,9 @@ def process_validation_task_by_metadata(
raise Exception(f"Validation failed: {validation_result}")

if not validation_result.has_issues():
logging.info(f"RO Crate {file_path} is valid.")
logging.info("RO Crate metadata is valid.")
else:
logging.info(f"RO Crate {file_path} is invalid.")
logging.info("RO Crate metadata is invalid.")

if webhook_url:
send_webhook_notification(webhook_url, validation_result.to_json())
@@ -147,25 +143,22 @@ def process_validation_task_by_metadata(
send_webhook_notification(webhook_url, error_data)

finally:
# Clean up the temporary file if it was created:
if file_path and os.path.exists(file_path):
shutil.rmtree(file_path)

if isinstance(validation_result, str):
return validation_result
else:
return validation_result.to_json()


def perform_ro_crate_validation(
file_path: str, profile_name: str | None, skip_checks_list: Optional[list] = None
file_path: str, profile_name: str | None, skip_checks_list: Optional[list] = None, profiles_path: Optional[str] = None
) -> ValidationResult | str:
"""
Validates an RO-Crate using the provided file path and profile name.

:param file_path: The path to the RO-Crate file to validate
:param profile_name: The name of the validation profile to use. Defaults to None. If None, the CRS4 validator will
attempt to determine the profile.
:param profiles_path: The path to the profiles definition directory
:param skip_checks_list: A list of checks to skip, if needed
:return: The validation result.
:raises Exception: If an error occurs during the validation process.
@@ -183,7 +176,41 @@ def perform_ro_crate_validation(
settings = services.ValidationSettings(
rocrate_uri=full_file_path,
**({"profile_identifier": profile_name} if profile_name else {}),
**({"skip_checks": skip_checks_list} if skip_checks_list else {})
**({"skip_checks": skip_checks_list} if skip_checks_list else {}),
**({"profiles_path": profiles_path} if profiles_path else {})
)

return services.validate(settings)

except Exception as e:
logging.error(f"Unexpected error during validation: {e}")
return str(e)


def perform_metadata_validation(
crate_json: str, profile_name: str | None, skip_checks_list: Optional[list] = None, profiles_path: Optional[str] = None
) -> ValidationResult | str:
"""
Validates only RO-Crate metadata provided as a json string.

:param crate_json: The JSON string containing the metadata
:param profile_name: The name of the validation profile to use. Defaults to None. If None, the CRS4 validator will
attempt to determine the profile.
:param profiles_path: The path to the profiles definition directory
:param skip_checks_list: A list of checks to skip, if needed
:return: The validation result.
:raises Exception: If an error occurs during the validation process.
"""

try:
logging.info(f"Validating ro-crate metadata with profile {profile_name}")

settings = services.ValidationSettings(
**({"metadata_only": True}),
**({"metadata_dict": json.loads(crate_json)}),
**({"profile_identifier": profile_name} if profile_name else {}),
**({"skip_checks": skip_checks_list} if skip_checks_list else {}),
**({"profiles_path": profiles_path} if profiles_path else {})
)

return services.validate(settings)
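Both validation functions build their settings with the `**({...} if value else {})` idiom, so unset options are left out and the validator's own defaults apply. A minimal sketch of the same logic as a standalone helper follows; `build_metadata_settings_kwargs` is a hypothetical name, and the sketch stops short of calling the validator so that it runs without `rocrate_validator` installed.

```python
import json
from typing import Any, Optional


def build_metadata_settings_kwargs(
    crate_json: str,
    profile_name: Optional[str] = None,
    skip_checks_list: Optional[list] = None,
    profiles_path: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect keyword arguments, leaving out unset options."""
    kwargs: dict[str, Any] = {
        "metadata_only": True,
        # json.loads raises json.JSONDecodeError on malformed input.
        "metadata_dict": json.loads(crate_json),
    }
    optional = {
        "profile_identifier": profile_name,
        "skip_checks": skip_checks_list,
        "profiles_path": profiles_path,
    }
    # Same truthiness test as the diff: None, "" and [] are all omitted.
    kwargs.update({key: value for key, value in optional.items() if value})
    return kwargs


settings_kwargs = build_metadata_settings_kwargs(
    '{"@graph": []}', profiles_path="/app/profiles"
)
```

The resulting dictionary could then be unpacked into `services.ValidationSettings(**settings_kwargs)`, as the diff does inline. Note that malformed JSON raises inside the `try` block in the diff, so it is reported through the returned error string.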
31 changes: 16 additions & 15 deletions app/utils/config.py
@@ -10,34 +10,32 @@
from flask import Flask


def get_env(name: str, default=None, required=False):
value = os.environ.get(name, default)
if required and value is None:
raise RuntimeError(f"Missing required environment variable: {name}")
return value


class Config:
"""Base configuration class for the Flask application."""

SECRET_KEY = os.getenv("SECRET_KEY", "my_precious")

# Celery configuration:
CELERY_BROKER_URL = os.getenv("CELERY_BROKER_URL")
CELERY_RESULT_BACKEND = os.getenv("CELERY_RESULT_BACKEND")
CELERY_BROKER_URL = get_env("CELERY_BROKER_URL", required=False)
CELERY_RESULT_BACKEND = get_env("CELERY_RESULT_BACKEND", required=False)

# MinIO configuration:
MINIO_ENDPOINT = os.getenv("MINIO_ENDPOINT")
MINIO_ACCESS_KEY = os.getenv("MINIO_ACCESS_KEY")
MINIO_SECRET_KEY = os.getenv("MINIO_SECRET_KEY")
MINIO_BUCKET_NAME = os.getenv("MINIO_BUCKET_NAME", "bucket-name")
# rocrate validator configuration:
PROFILES_PATH = get_env("PROFILES_PATH", required=False)


class DevelopmentConfig(Config):
"""Development configuration class."""

DEBUG = True
ENV = "development"


class ProductionConfig(Config):
"""Production configuration class."""

DEBUG = False
ENV = "production"


class InvalidAPIUsage(Exception):
@@ -63,10 +61,13 @@ def make_celery(app: Flask = None) -> Celery:
:param app: The Flask application to use.
:return: The Celery instance.
"""
env = os.environ.get("FLASK_ENV", "development")
config_cls = ProductionConfig if env == "production" else DevelopmentConfig

celery = Celery(
app.import_name if app else __name__,
broker=os.getenv("CELERY_BROKER_URL"),
backend=os.getenv("CELERY_RESULT_BACKEND"),
broker=config_cls.CELERY_BROKER_URL,
backend=config_cls.CELERY_RESULT_BACKEND,
)

if app:
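The `get_env` helper added in this file only raises when `required=True` and no value or default is available. All three call sites in `Config` pass `required=False`, so a missing variable simply yields `None`. The sketch below reproduces the helper and exercises each case; the `EXAMPLE_*` variable names are illustrative assumptions and are not used by the project.

```python
import os


def get_env(name: str, default=None, required=False):
    """Same logic as the helper added in app/utils/config.py."""
    value = os.environ.get(name, default)
    if required and value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Illustrative variable names only; they are not part of the project.
os.environ["EXAMPLE_PROFILES_PATH"] = "/app/profiles"
os.environ.pop("EXAMPLE_UNSET", None)

present = get_env("EXAMPLE_PROFILES_PATH")
missing_optional = get_env("EXAMPLE_UNSET")
# A default satisfies the required check, because the value is no longer None.
with_default = get_env("EXAMPLE_UNSET", default="fallback", required=True)

try:
    get_env("EXAMPLE_UNSET", required=True)
    raised = False
except RuntimeError:
    raised = True
```

Because `Config` evaluates these calls in class attributes, the values are read once, when the module is imported. Environment variables therefore need to be set before `app.utils.config` is first imported.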
53 changes: 0 additions & 53 deletions app/utils/file_utils.py

This file was deleted.
