UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin by chandrasekharan-zipstack · Pull Request #1966 · Zipstack/unstract

chandrasekharan-zipstack · 2026-05-14T07:28:17Z

What

Builds on UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965 (foundation PR) by routing every AuditSerializer subclass through SanitizedSerializerMixin. ~18 write-path serializers now auto-reject HTML/JS-shaped input on every writable CharField / TextField.
Affected (transitively): WorkflowSerializer, CustomToolSerializer, APIDeploymentSerializer, APIKeySerializer, ConnectorInstanceSerializer, BaseAdapterSerializer, PlatformKeySerializer, PlatformApiKey{Create,Update}Serializer, PipelineSerializer, ToolInstanceSerializer, ToolStudioPromptSerializer, PromptStudioOutputSerializer, ProfileManagerSerializer, IndexManagerSerializer, PromptStudioRegistry{,Info}Serializer, PromptStudioDocumentManagerSerializer.
Adds Meta.html_safe_fields opt-outs on the two serializers that legitimately carry LLM markup:
- ToolStudioPromptSerializer → prompt, assert_prompt, assertion_failure_prompt, output.
- CustomToolSerializer → summarize_prompt, preamble, postamble, output.
Removes the three redundant validate_description methods (in api_v2.APIDeploymentSerializer, workflow_v2.WorkflowSerializer, prompt_studio_core_v2.CustomToolSerializer) — the mixin now covers them.

Why

The foundation PR (UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965) added SanitizedSerializerMixin and pre-mixed ModelSerializer / Serializer / HyperlinkedModelSerializer under utils.serializer but did not wire any production serializer through them. This PR is the smallest, highest-leverage wiring step: changing one parent class (AuditSerializer) closes the input-validation gap on the majority of write-path entities.
validate_<name> methods (which use validate_name_field) are kept because they also strip whitespace and reject empty values — behaviour the mixin doesn't duplicate. Only the validate_description methods (which just call validate_no_html_tags) are removed.
Free-form LLM prompt / output fields (prompt, output, summarize_prompt, etc.) legitimately contain <context> / <thinking> / arbitrary model output. They're explicitly opted out via Meta.html_safe_fields.

How

backend/backend/serializers.py: AuditSerializer now inherits utils.serializer.ModelSerializer instead of rest_framework.serializers.ModelSerializer. create / update semantics unchanged.
Per-serializer Meta.html_safe_fields declarations on the two LLM-bearing serializers (Prompt Studio).
Three validate_description removals + validate_no_html_tags import drop from the affected modules.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Functional: The 8 serializers that were already hand-wired (in the in-flight fix) continue to behave identically. The 14 manual validate_<field> methods still exist — the mixin's validator is appended to the field's validators list, so the same rejection is enforced either way. Removing only the redundant validate_description methods avoids double-validation noise but doesn't change any acceptance outcome.
New rejections: AuditSerializer subclasses that were not previously sanitised now auto-reject HTML in their CharField / TextField (e.g. ConnectorInstance.connector_name, PlatformKey.key_name, Pipeline.pipeline_name, AdapterInstance.adapter_name, etc.). Per the May 2026 US-prod scan (see KB Obsidian Vault/zipstuff/UN-3393-input-validation/05-prod-baseline.md), zero legitimate user content uses <…> in these columns — every match was pentest leftover. Migration risk: ~zero.
LLM fields explicitly opted out: prompt, assert_prompt, assertion_failure_prompt, output, summarize_prompt, preamble, postamble are listed in Meta.html_safe_fields so they continue to accept arbitrary text.
Direct DRF-base-class users: Serializers that inherit directly from serializers.ModelSerializer / serializers.Serializer / serializers.HyperlinkedModelSerializer (i.e. not via AuditSerializer) are not touched by this PR. They remain unchanged. A follow-up sweep will convert them; their absence here means write paths through them retain pre-PR behaviour.

Database Migrations

None.

Env Config

None.

Relevant Docs

KB: Obsidian Vault/zipstuff/UN-3393-input-validation/.
ADR 0003 (adr/0003-base-mixin-default-on-opt-out.md) — opt-out semantics.

Related Issues or PRs

Jira: UN-3393.
Stacked on: UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965 (foundation).
Follow-up: a separate PR that converts the remaining ~20 direct DRF-base-class users (tags/serializers.py, scheduler/serializer.py, tenant_account_v2/serializer.py, pipeline_v2/serializers/{internal,update,sharing,execute}.py, response-only and list serializers across the codebase, …).
Follow-up: cover backend/pluggable_apps/ (lives in the unstract-cloud repo; separate PR there).

Dependencies Versions

None.

Notes on Testing

Local tests in the OSS worktree (cloud-deps not installed, so manage.py check is unavailable here — CI runs the full suite):

cd backend && uv run pytest utils/tests/ -q

→ 85 passed (50 from foundation + 35 pre-existing utils tests).

Manual sanity:

cd backend && uv run python -c "from utils.serializer import ModelSerializer; from backend.serializers import AuditSerializer; assert issubclass(AuditSerializer, ModelSerializer); print('ok')"

Screenshots

n/a (no UI change)

Checklist

I have read and understood the Contribution Guidelines.

coderabbitai · 2026-05-14T07:28:23Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0334b9a5-0efb-4edb-9e99-5d477bf5dbab

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/UN-3393-input-validation-sweep

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps · 2026-05-14T07:33:01Z

Greptile Summary

This PR wires all AuditSerializer subclasses through SanitizedSerializerMixin by changing AuditSerializer's parent class from rest_framework.serializers.ModelSerializer to utils.serializer.ModelSerializer, which already bundles the mixin. Three redundant validate_description methods that were pure wrappers around validate_no_html_tags are removed, and two additional serializers (NotificationSerializer, TagSerializer) that don't extend AuditSerializer are also migrated directly.

backend/backend/serializers.py is the load-bearing change: swapping one import flips sanitization on for ~18 write-path serializers at once.
html_safe_fields opt-outs are declared on ToolStudioPromptSerializer, CustomToolSerializer, and PromptStudioOutputSerializer to allow legitimate LLM markup in prompt/output fields.
Three validate_description removals eliminate double-validation, since the mixin already covers those fields transitively.

Confidence Score: 5/5

Safe to merge — the core change is a single-line parent-class swap on AuditSerializer, all LLM-bearing fields are correctly opted out via html_safe_fields, and the removed validate_description methods were pure pass-throughs that are now covered transitively.

The change is structurally minimal: one import swap propagates sanitization to ~18 serializers without touching their logic. The html_safe_fields opt-outs on ToolStudioPromptSerializer, CustomToolSerializer, and PromptStudioOutputSerializer cover every identified LLM content field. The PromptStudioRegistry.name/description fields are editable=False in the model, so DRF marks them read_only and the mixin skips them automatically. No write-path regressions were found.

No files require special attention. The prompt_studio_core_v2/serializers.py file retains CustomToolListSerializer on raw DRF ModelSerializer, but this is explicitly deferred to a follow-up PR and does not affect the security boundary of the write path covered here.

Important Files Changed

Filename	Overview
backend/backend/serializers.py	Load-bearing change: swaps AuditSerializer's parent to utils.serializer.ModelSerializer, routing ~18 write-path serializers through SanitizedSerializerMixin in one line. create/update semantics are unchanged.
backend/prompt_studio/prompt_studio_output_manager_v2/serializers.py	Adds html_safe_fields = ('output', 'context') to opt out PromptStudioOutputSerializer from HTML rejection; these fields store raw LLM responses and document chunks. Addresses the concern raised in a previous review thread.
backend/prompt_studio/prompt_studio_v2/serializers.py	Adds html_safe_fields opt-out for the four LLM prompt/output fields on ToolStudioPromptSerializer; all other CharField fields (e.g. prompt_key) are correctly left under sanitization.
backend/prompt_studio/prompt_studio_core_v2/serializers.py	Adds html_safe_fields for LLM prompt/output fields on CustomToolSerializer and removes the now-redundant validate_description method; CustomToolListSerializer still uses raw DRF ModelSerializer (deferred to follow-up).
backend/notification_v2/serializers.py	Migrates NotificationSerializer to utils.serializer.ModelSerializer; webhook URL, name, and credential fields are all safe from HTML-rejection false positives since none contain angle-bracket content in normal use.
backend/tags/serializers.py	Migrates TagSerializer to utils.serializer.ModelSerializer; TagParamsSerializer is intentionally left on the raw DRF Serializer (its validate_tags already enforces an alphanumeric-only pattern).
backend/workflow_manager/workflow_v2/serializers.py	Removes redundant validate_description; other serializers in the file remain on raw DRF ModelSerializer per the stated follow-up scope.
backend/api_v2/serializers.py	Removes redundant validate_description from APIDeploymentSerializer; validate_display_name (which calls validate_name_field -> validate_no_html_tags) is preserved for its whitespace-stripping and empty-value-rejection behaviour.

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class SanitizedSerializerMixin {
        +__init__()
        +appends validate_no_html_tags to every writable CharField
        +skips html_safe_fields and read_only fields
    }
    class DRFModelSerializer["drf.ModelSerializer"] {
    }
    class UtilsModelSerializer["utils.serializer.ModelSerializer"] {
    }
    class AuditSerializer {
        +create()
        +update()
    }
    class WorkflowSerializer {
        +Meta.html_safe_fields: none
        +validate_workflow_name()
    }
    class CustomToolSerializer {
        +Meta.html_safe_fields: summarize_prompt, preamble, postamble, output
        +validate_tool_name()
    }
    class ToolStudioPromptSerializer {
        +Meta.html_safe_fields: prompt, assert_prompt, assertion_failure_prompt, output
    }
    class PromptStudioOutputSerializer {
        +Meta.html_safe_fields: output, context
    }
    class NotificationSerializer {
        +Meta: no html_safe_fields
    }
    class TagSerializer {
        +Meta: no html_safe_fields
    }

    SanitizedSerializerMixin <|-- UtilsModelSerializer
    DRFModelSerializer <|-- UtilsModelSerializer
    UtilsModelSerializer <|-- AuditSerializer
    AuditSerializer <|-- WorkflowSerializer
    AuditSerializer <|-- CustomToolSerializer
    AuditSerializer <|-- ToolStudioPromptSerializer
    AuditSerializer <|-- PromptStudioOutputSerializer
    UtilsModelSerializer <|-- NotificationSerializer
    UtilsModelSerializer <|-- TagSerializer

_{Reviews (3): Last reviewed commit: "UN-3393 [FIX] Opt PromptStudioOutputSeri..." | Re-trigger Greptile}

Builds on the foundation PR by routing every `AuditSerializer` subclass through `SanitizedSerializerMixin`. ~18 write-path serializers (`WorkflowSerializer`, `CustomToolSerializer`, `APIDeploymentSerializer`, `APIKeySerializer`, `ConnectorInstanceSerializer`, `BaseAdapterSerializer`, `PlatformKeySerializer`, `PlatformApiKey{Create,Update}Serializer`, `PipelineSerializer`, `ToolInstanceSerializer`, `ToolStudioPromptSerializer`, `PromptStudioOutputSerializer`, `ProfileManagerSerializer`, `IndexManagerSerializer`, `PromptStudioRegistry{,Info}Serializer`, `PromptStudioDocumentManagerSerializer`) now auto-reject HTML/JS-shaped input on every writable `CharField` / `TextField`. - `backend/backend/serializers.py`: `AuditSerializer` now inherits `utils.serializer.ModelSerializer` (the pre-mixed variant) instead of `rest_framework.serializers.ModelSerializer`. `create` / `update` semantics unchanged. - `prompt_studio/prompt_studio_v2.ToolStudioPromptSerializer`: declares `Meta.html_safe_fields = ("prompt", "assert_prompt", "assertion_failure_prompt", "output")`. LLM prompt text legitimately contains XML/HTML-like markup (e.g. `<context>`, `<thinking>`); LLM `output` may include anything the model produced. - `prompt_studio/prompt_studio_core_v2.CustomToolSerializer`: declares `Meta.html_safe_fields = ("summarize_prompt", "preamble", "postamble", "output")`. Tool-level LLM context fields and stored LLM output. - Removes redundant manual `validate_description(self, value)` methods in `api_v2.APIDeploymentSerializer`, `workflow_v2.WorkflowSerializer`, and `prompt_studio_core_v2.CustomToolSerializer`. The mixin now covers these via the `AuditSerializer` base. `validate_<name>` methods that use `validate_name_field` are retained because they also strip whitespace and reject empty values — behaviour the mixin doesn't duplicate. - Imports of `validate_no_html_tags` are dropped from the three files whose `validate_description` methods were removed. Files that inherit DRF base classes directly (without going through `AuditSerializer`) are tracked as a follow-up sweep. The AuditSerializer path already covers the highest-value write-path entities. Test coverage: `cd backend && uv run pytest utils/tests/ -q` → 85 passed. Full Django check requires cloud-deps; PR CI exercises the full suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends PR2 coverage to two user-write-path serializers that don't go through `AuditSerializer`: - `tags.TagSerializer` (user-create with `name` + `description`) now inherits `utils.serializer.ModelSerializer`. Sister `TagParamsSerializer` is a query-param parser with strict regex validation; left as-is. - `notification_v2.NotificationSerializer` now inherits `utils.serializer.ModelSerializer`. The model's `name`, `authorization_key`, `authorization_header`, and `url` (URLField is a CharField subclass) are auto-sanitized. `notification_type`, `authorization_type`, `platform` are ChoiceField in the serializer, not CharField; the mixin skips them. The existing manual `validate_name` keeps its uniqueness + strip-whitespace logic; the mixin's HTML check runs alongside as redundant defence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Greptile P1 review comment on PR #1966. `PromptStudioOutputManager` stores raw LLM responses (`output`: CharField) and document chunks (`context`: TextField) that routinely contain <thinking>, <context>, and other XML-like tags. The serializer is mounted on a ModelViewSet with no class-level HTTP-method restriction, so write paths through DRF admin / browsable API / any future write endpoint would incorrectly reject legitimate LLM output without this opt-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-15T11:47:27Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

greptile-apps Bot reviewed May 14, 2026

View reviewed changes

Comment thread backend/backend/serializers.py

chandrasekharan-zipstack force-pushed the feat/UN-3393-input-validation-sweep branch from 93b7ecd to 1e8e4ef Compare May 15, 2026 10:43

chandrasekharan-zipstack and others added 3 commits May 15, 2026 17:15

chandrasekharan-zipstack force-pushed the feat/UN-3393-input-validation-sweep branch from 1e8e4ef to 6ce8ea8 Compare May 15, 2026 11:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin#1966

UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin#1966
chandrasekharan-zipstack wants to merge 3 commits into
feat/UN-3393-input-validation-foundationfrom
feat/UN-3393-input-validation-sweep

chandrasekharan-zipstack commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Review skipped

Uh oh!

greptile-apps Bot commented May 14, 2026 •

edited

Loading

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Class Diagram

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

chandrasekharan-zipstack commented May 14, 2026

What

Why

How

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

greptile-apps Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Class Diagram

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 15, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 14, 2026 •

edited

Loading

greptile-apps Bot commented May 14, 2026 •

edited

Loading