UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin#1966
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
| Filename | Overview |
|---|---|
| backend/backend/serializers.py | Load-bearing change: swaps AuditSerializer's parent to utils.serializer.ModelSerializer, routing ~18 write-path serializers through SanitizedSerializerMixin in one line. create/update semantics are unchanged. |
| backend/prompt_studio/prompt_studio_output_manager_v2/serializers.py | Adds html_safe_fields = ('output', 'context') to opt out PromptStudioOutputSerializer from HTML rejection; these fields store raw LLM responses and document chunks. Addresses the concern raised in a previous review thread. |
| backend/prompt_studio/prompt_studio_v2/serializers.py | Adds html_safe_fields opt-out for the four LLM prompt/output fields on ToolStudioPromptSerializer; all other CharField fields (e.g. prompt_key) are correctly left under sanitization. |
| backend/prompt_studio/prompt_studio_core_v2/serializers.py | Adds html_safe_fields for LLM prompt/output fields on CustomToolSerializer and removes the now-redundant validate_description method; CustomToolListSerializer still uses raw DRF ModelSerializer (deferred to follow-up). |
| backend/notification_v2/serializers.py | Migrates NotificationSerializer to utils.serializer.ModelSerializer; webhook URL, name, and credential fields are all safe from HTML-rejection false positives since none contain angle-bracket content in normal use. |
| backend/tags/serializers.py | Migrates TagSerializer to utils.serializer.ModelSerializer; TagParamsSerializer is intentionally left on the raw DRF Serializer (its validate_tags already enforces an alphanumeric-only pattern). |
| backend/workflow_manager/workflow_v2/serializers.py | Removes redundant validate_description; other serializers in the file remain on raw DRF ModelSerializer per the stated follow-up scope. |
| backend/api_v2/serializers.py | Removes redundant validate_description from APIDeploymentSerializer; validate_display_name (which calls validate_name_field -> validate_no_html_tags) is preserved for its whitespace-stripping and empty-value-rejection behaviour. |
Class Diagram
%%{init: {'theme': 'neutral'}}%%
classDiagram
class SanitizedSerializerMixin {
+__init__()
+appends validate_no_html_tags to every writable CharField
+skips html_safe_fields and read_only fields
}
class DRFModelSerializer["drf.ModelSerializer"] {
}
class UtilsModelSerializer["utils.serializer.ModelSerializer"] {
}
class AuditSerializer {
+create()
+update()
}
class WorkflowSerializer {
+Meta.html_safe_fields: none
+validate_workflow_name()
}
class CustomToolSerializer {
+Meta.html_safe_fields: summarize_prompt, preamble, postamble, output
+validate_tool_name()
}
class ToolStudioPromptSerializer {
+Meta.html_safe_fields: prompt, assert_prompt, assertion_failure_prompt, output
}
class PromptStudioOutputSerializer {
+Meta.html_safe_fields: output, context
}
class NotificationSerializer {
+Meta: no html_safe_fields
}
class TagSerializer {
+Meta: no html_safe_fields
}
SanitizedSerializerMixin <|-- UtilsModelSerializer
DRFModelSerializer <|-- UtilsModelSerializer
UtilsModelSerializer <|-- AuditSerializer
AuditSerializer <|-- WorkflowSerializer
AuditSerializer <|-- CustomToolSerializer
AuditSerializer <|-- ToolStudioPromptSerializer
AuditSerializer <|-- PromptStudioOutputSerializer
UtilsModelSerializer <|-- NotificationSerializer
UtilsModelSerializer <|-- TagSerializer
Reviews (3): Last reviewed commit: "UN-3393 [FIX] Opt PromptStudioOutputSeri..." | Re-trigger Greptile
93b7ecd to
1e8e4ef
Compare
Builds on the foundation PR by routing every `AuditSerializer` subclass
through `SanitizedSerializerMixin`. ~18 write-path serializers
(`WorkflowSerializer`, `CustomToolSerializer`, `APIDeploymentSerializer`,
`APIKeySerializer`, `ConnectorInstanceSerializer`, `BaseAdapterSerializer`,
`PlatformKeySerializer`, `PlatformApiKey{Create,Update}Serializer`,
`PipelineSerializer`, `ToolInstanceSerializer`, `ToolStudioPromptSerializer`,
`PromptStudioOutputSerializer`, `ProfileManagerSerializer`,
`IndexManagerSerializer`, `PromptStudioRegistry{,Info}Serializer`,
`PromptStudioDocumentManagerSerializer`) now auto-reject HTML/JS-shaped
input on every writable `CharField` / `TextField`.
- `backend/backend/serializers.py`: `AuditSerializer` now inherits
`utils.serializer.ModelSerializer` (the pre-mixed variant) instead of
`rest_framework.serializers.ModelSerializer`. `create` / `update`
semantics unchanged.
- `prompt_studio/prompt_studio_v2.ToolStudioPromptSerializer`: declares
`Meta.html_safe_fields = ("prompt", "assert_prompt",
"assertion_failure_prompt", "output")`. LLM prompt text legitimately
contains XML/HTML-like markup (e.g. `<context>`, `<thinking>`); LLM
`output` may include anything the model produced.
- `prompt_studio/prompt_studio_core_v2.CustomToolSerializer`: declares
`Meta.html_safe_fields = ("summarize_prompt", "preamble", "postamble",
"output")`. Tool-level LLM context fields and stored LLM output.
- Removes redundant manual `validate_description(self, value)` methods
in `api_v2.APIDeploymentSerializer`,
`workflow_v2.WorkflowSerializer`, and
`prompt_studio_core_v2.CustomToolSerializer`. The mixin now covers
these via the `AuditSerializer` base. `validate_<name>` methods that
use `validate_name_field` are retained because they also strip
whitespace and reject empty values — behaviour the mixin doesn't
duplicate.
- Imports of `validate_no_html_tags` are dropped from the three files
whose `validate_description` methods were removed.
Files that inherit DRF base classes directly (without going through
`AuditSerializer`) are tracked as a follow-up sweep. The AuditSerializer
path already covers the highest-value write-path entities.
Test coverage: `cd backend && uv run pytest utils/tests/ -q` → 85 passed.
Full Django check requires cloud-deps; PR CI exercises the full suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends PR2 coverage to two user-write-path serializers that don't go through `AuditSerializer`: - `tags.TagSerializer` (user-create with `name` + `description`) now inherits `utils.serializer.ModelSerializer`. Sister `TagParamsSerializer` is a query-param parser with strict regex validation; left as-is. - `notification_v2.NotificationSerializer` now inherits `utils.serializer.ModelSerializer`. The model's `name`, `authorization_key`, `authorization_header`, and `url` (URLField is a CharField subclass) are auto-sanitized. `notification_type`, `authorization_type`, `platform` are ChoiceField in the serializer, not CharField; the mixin skips them. The existing manual `validate_name` keeps its uniqueness + strip-whitespace logic; the mixin's HTML check runs alongside as redundant defence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile P1 review comment on PR #1966. `PromptStudioOutputManager` stores raw LLM responses (`output`: CharField) and document chunks (`context`: TextField) that routinely contain <thinking>, <context>, and other XML-like tags. The serializer is mounted on a ModelViewSet with no class-level HTTP-method restriction, so write paths through DRF admin / browsable API / any future write endpoint would incorrectly reject legitimate LLM output without this opt-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1e8e4ef to
6ce8ea8
Compare
|



What
AuditSerializersubclass throughSanitizedSerializerMixin. ~18 write-path serializers now auto-reject HTML/JS-shaped input on every writableCharField/TextField.WorkflowSerializer,CustomToolSerializer,APIDeploymentSerializer,APIKeySerializer,ConnectorInstanceSerializer,BaseAdapterSerializer,PlatformKeySerializer,PlatformApiKey{Create,Update}Serializer,PipelineSerializer,ToolInstanceSerializer,ToolStudioPromptSerializer,PromptStudioOutputSerializer,ProfileManagerSerializer,IndexManagerSerializer,PromptStudioRegistry{,Info}Serializer,PromptStudioDocumentManagerSerializer.Meta.html_safe_fieldsopt-outs on the two serializers that legitimately carry LLM markup:ToolStudioPromptSerializer→prompt,assert_prompt,assertion_failure_prompt,output.CustomToolSerializer→summarize_prompt,preamble,postamble,output.validate_descriptionmethods (inapi_v2.APIDeploymentSerializer,workflow_v2.WorkflowSerializer,prompt_studio_core_v2.CustomToolSerializer) — the mixin now covers them.Why
SanitizedSerializerMixinand pre-mixedModelSerializer/Serializer/HyperlinkedModelSerializerunderutils.serializerbut did not wire any production serializer through them. This PR is the smallest, highest-leverage wiring step: changing one parent class (AuditSerializer) closes the input-validation gap on the majority of write-path entities.validate_<name>methods (which usevalidate_name_field) are kept because they also strip whitespace and reject empty values — behaviour the mixin doesn't duplicate. Only thevalidate_descriptionmethods (which just callvalidate_no_html_tags) are removed.prompt,output,summarize_prompt, etc.) legitimately contain<context>/<thinking>/ arbitrary model output. They're explicitly opted out viaMeta.html_safe_fields.How
backend/backend/serializers.py:AuditSerializernow inheritsutils.serializer.ModelSerializerinstead ofrest_framework.serializers.ModelSerializer.create/updatesemantics unchanged.Meta.html_safe_fieldsdeclarations on the two LLM-bearing serializers (Prompt Studio).validate_descriptionremovals +validate_no_html_tagsimport drop from the affected modules.Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
validate_<field>methods still exist — the mixin's validator is appended to the field'svalidatorslist, so the same rejection is enforced either way. Removing only the redundantvalidate_descriptionmethods avoids double-validation noise but doesn't change any acceptance outcome.AuditSerializersubclasses that were not previously sanitised now auto-reject HTML in theirCharField/TextField(e.g.ConnectorInstance.connector_name,PlatformKey.key_name,Pipeline.pipeline_name,AdapterInstance.adapter_name, etc.). Per the May 2026 US-prod scan (see KBObsidian Vault/zipstuff/UN-3393-input-validation/05-prod-baseline.md), zero legitimate user content uses<…>in these columns — every match was pentest leftover. Migration risk: ~zero.prompt,assert_prompt,assertion_failure_prompt,output,summarize_prompt,preamble,postambleare listed inMeta.html_safe_fieldsso they continue to accept arbitrary text.serializers.ModelSerializer/serializers.Serializer/serializers.HyperlinkedModelSerializer(i.e. not viaAuditSerializer) are not touched by this PR. They remain unchanged. A follow-up sweep will convert them; their absence here means write paths through them retain pre-PR behaviour.Database Migrations
Env Config
Relevant Docs
Obsidian Vault/zipstuff/UN-3393-input-validation/.adr/0003-base-mixin-default-on-opt-out.md) — opt-out semantics.Related Issues or PRs
tags/serializers.py,scheduler/serializer.py,tenant_account_v2/serializer.py,pipeline_v2/serializers/{internal,update,sharing,execute}.py, response-only and list serializers across the codebase, …).backend/pluggable_apps/(lives in theunstract-cloudrepo; separate PR there).Dependencies Versions
Notes on Testing
Local tests in the OSS worktree (cloud-deps not installed, so
manage.py checkis unavailable here — CI runs the full suite):→ 85 passed (50 from foundation + 35 pre-existing utils tests).
Manual sanity:
Screenshots
n/a (no UI change)
Checklist
I have read and understood the Contribution Guidelines.