[SECURITY] Add input validation and CSP headers by hari-kuriakose · Pull Request #1834 · Zipstack/unstract

hari-kuriakose · 2026-03-06T11:18:22Z

What

Add server-side input validation for user-facing text fields to reject HTML/script injection (CWE-20)
Add Content-Security-Policy (CSP) response headers on both the Django API backend and the frontend nginx server
Add ContentSecurityPolicyMiddleware to the Django middleware stack

Why

Without server-side validation, stored XSS payloads could persist in the database and be exploited in non-React contexts (emails, PDF exports, logs) or if future code introduces raw HTML rendering
Without CSP, browsers have no policy to restrict which resources can load, increasing XSS and data injection attack surface

How

Input validation (backend/utils/input_sanitizer.py):

Created centralized validate_name_field() and validate_no_html_tags() validators that reject HTML tags, javascript: protocols, and event handler attributes (onclick=, etc.)
Added validate_<field> methods to serializers for all user-facing name/identifier and description fields:
- APIDeploymentSerializer — display_name, description
- WorkflowSerializer — workflow_name, description
- BaseAdapterSerializer — adapter_name, description (via validate() override)
- ConnectorInstanceSerializer — connector_name
- NotificationSerializer — name (enhanced existing validator)
- CustomToolSerializer — tool_name, description
- OrganizationSignupSerializer — name, display_name
Prompt content fields (prompt, preamble, postamble, summarize_prompt) are intentionally not validated — they must accept arbitrary text including code snippets for LLM extraction workflows
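For reviewers, here is a minimal sketch of how the two validators could fit together. It assumes regex checks of the kind discussed in the review threads: tags, dangerous URI schemes, and a vetted list of DOM event names. The pattern constants, the shortened event list, the error messages, and the local `ValidationError` stand-in (used in place of DRF's so the snippet runs on its own) are all assumptions. This is not the actual contents of `backend/utils/input_sanitizer.py`.

```python
import re


class ValidationError(ValueError):
    """Stand-in for rest_framework.serializers.ValidationError (assumption)."""


# "<" immediately followed by a letter or "/letter": catches closed and
# unclosed tags ("<script>", "<img src=x") but not prose like "a < b".
HTML_TAG_PATTERN = re.compile(r"</?[a-zA-Z]")

# javascript:/vbscript: URIs, and data: URIs only when a MIME type follows
# directly, so prose such as "Accepts data: application/json" passes.
JS_PROTOCOL_PATTERN = re.compile(
    r"\b(?:javascript|vbscript)\s*:|\bdata:[a-z]+/[a-z0-9.+-]+", re.IGNORECASE
)

# Illustrative subset of DOM event names; a real list would be longer.
_DOM_EVENTS = (
    "abort|blur|change|click|dblclick|error|focus|input|keydown|keyup|load|"
    "mousedown|mouseover|mouseup|submit|animationend|transitionend"
)
# "\bon" keeps "connection=" from matching; the vetted list keeps "oncall=" out.
EVENT_HANDLER_PATTERN = re.compile(rf"\bon(?:{_DOM_EVENTS})\s*=", re.IGNORECASE)


def validate_no_html_tags(value: str, field_name: str = "This field") -> str:
    """Strip surrounding whitespace and reject markup-like input."""
    value = value.strip()
    if HTML_TAG_PATTERN.search(value):
        raise ValidationError(f"{field_name} must not contain HTML or script tags.")
    if JS_PROTOCOL_PATTERN.search(value):
        raise ValidationError(f"{field_name} must not contain dangerous URI protocols.")
    if EVENT_HANDLER_PATTERN.search(value):
        raise ValidationError(f"{field_name} must not contain event handler attributes.")
    return value


def validate_name_field(value: str, field_name: str = "Name") -> str:
    """Same checks as validate_no_html_tags, but names must also be non-empty."""
    value = validate_no_html_tags(value, field_name=field_name)
    if not value:
        raise ValidationError(f"{field_name} must not be empty.")
    return value
```

Serializers would then delegate field by field, e.g. `def validate_workflow_name(self, value): return validate_name_field(value, field_name="Workflow name")`.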

CSP headers:

Django middleware (backend/middleware/content_security_policy.py): Strict policy for the JSON API backend — all directives set to 'self', frame-ancestors 'none'. Added middleware.content_security_policy.ContentSecurityPolicyMiddleware to the PRODUCTION_MIDDLEWARE list.
Nginx (frontend/nginx.conf): Policy tailored to the React SPA's third-party dependencies — Monaco Editor (cdn.jsdelivr.net), PDF.js (unpkg.com), PostHog, Google Tag Manager, reCAPTCHA, Stripe, Product Fruits, and WebSocket connections for Socket.IO
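Here is a minimal sketch of the backend middleware. It assumes the strict policy described above, plus `object-src 'none'`. It is written as a new-style callable middleware rather than a `process_response` hook so it runs without Django. `DEFAULT_CSP` is an assumed policy string. Per the review discussion, the login page's inline script would additionally need a hash or nonce. Using `setdefault` means a CSP already set by a view wins, as suggested in review.

```python
from typing import Any, Callable

# Assumed policy: everything same-origin, no plugins, no framing.
DEFAULT_CSP = "; ".join(
    [
        "default-src 'self'",
        "script-src 'self'",
        "style-src 'self'",
        "object-src 'none'",
        "frame-ancestors 'none'",
    ]
)


class ContentSecurityPolicyMiddleware:
    """New-style Django middleware that adds a default CSP header."""

    def __init__(self, get_response: Callable[[Any], Any]) -> None:
        self.get_response = get_response

    def __call__(self, request: Any) -> Any:
        response = self.get_response(request)
        # setdefault only writes the header if no view or earlier middleware set one.
        response.setdefault("Content-Security-Policy", DEFAULT_CSP)
        return response
```

Django's `HttpResponse` provides `setdefault(header, value)`. Registering the class under its dotted path (`middleware.content_security_policy.ContentSecurityPolicyMiddleware`) in the middleware list enables it.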

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Input validation: Could reject names/descriptions that contain < and > characters (e.g. My Workflow <v2>). Normal names with spaces, hyphens, underscores, parentheses, periods, and other punctuation are unaffected. Prompt fields are not validated.
CSP headers: If any third-party script/resource was missed in the allowlist, it will be blocked by the browser. The frontend policy was built by auditing all dependencies — PostHog, GTM, reCAPTCHA, Stripe, Product Fruits, Monaco Editor, PDF.js, and Socket.IO WebSockets are all covered.

Database Migrations

None

Relevant Docs

Related Issues or PRs

Dependencies Versions

No new dependencies


Notes on Testing

22 unit tests added in backend/utils/tests/test_input_sanitizer.py covering clean input, HTML tags, script tags, JS protocols, event handlers, whitespace stripping, and empty string rejection
Run: cd backend && python -m pytest utils/tests/test_input_sanitizer.py -v
Manual verification: attempt to create a workflow/adapter/connector with name <script>alert(1)</script> via API — should return 400
Manual verification: check Content-Security-Policy header present in both API and frontend responses
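For the header check, a small helper can split a CSP value into directives. This makes it easy to assert on specific entries (for example `frame-ancestors`) instead of string-matching the whole header. `parse_csp` is hypothetical and not part of the PR.

```python
def parse_csp(header: str) -> dict[str, list[str]]:
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    policy: dict[str, list[str]] = {}
    for part in header.split(";"):
        tokens = part.split()
        if not tokens:
            continue  # tolerate empty segments such as a trailing ";"
        name = tokens[0].lower()
        # Per the CSP spec, only the first occurrence of a directive applies.
        policy.setdefault(name, tokens[1:])
    return policy
```

Against a running instance, the raw value can be read with `urllib.request.urlopen(url).getheader("Content-Security-Policy")` and passed to `parse_csp`. Here `url` is whichever API or frontend endpoint is being checked.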

Screenshots

N/A — backend/infrastructure changes only

Checklist

I have read and understood the Contribution Guidelines.

- Add server-side HTML/script injection validation for name and description fields across all user-facing serializers (CWE-20)
- Add Content-Security-Policy header via Django middleware (API) and nginx config (frontend) to mitigate XSS and data injection attacks
- Change SESSION_COOKIE_SECURE and CSRF_COOKIE_SECURE defaults to True so cookies are only sent over HTTPS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-03-06T11:18:38Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


Walkthrough

Adds regex-based input sanitizers and integrates them as field validators across multiple serializers; introduces a Content-Security-Policy middleware and registers it; and adds a CSP response header to the frontend nginx configuration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Input Sanitization Utilities & Tests: `backend/utils/input_sanitizer.py`, `backend/utils/tests/test_input_sanitizer.py` | Add `validate_no_html_tags` and `validate_name_field` with regex checks for HTML tags, `javascript:` URIs, and `on*` event attributes; add comprehensive unit tests. |
| Account & Notification Serializers: `backend/account_v2/serializer.py`, `backend/notification_v2/serializers.py` | Add `validate_name` / `validate_display_name` methods that call `validate_name_field` to sanitize organization and notification name fields. |
| Connector, Adapter, API, Prompt Studio, Workflow Serializers: `backend/connector_v2/serializers.py`, `backend/adapter_processor_v2/serializers.py`, `backend/api_v2/serializers.py`, `backend/prompt_studio/prompt_studio_core_v2/serializers.py`, `backend/workflow_manager/workflow_v2/serializers.py` | Introduce field-level `validate_*` methods delegating to `validate_name_field` and/or `validate_no_html_tags`. `BaseAdapterSerializer.validate` post-processes `adapter_name` and `description` and writes sanitized values back to validated data. |
| CSP Middleware & Settings: `backend/middleware/content_security_policy.py`, `backend/backend/settings/base.py` | Add `ContentSecurityPolicyMiddleware` that sets a restrictive `Content-Security-Policy` response header and register it in Django `MIDDLEWARE`. |
| Frontend Nginx CSP: `frontend/nginx.conf` | Add `add_header ... Content-Security-Policy` directive with detailed directives and third-party origin allowances. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client
  participant Nginx
  participant Django
  participant Serializer
  participant Sanitizer
  participant DB

  Client->>Nginx: HTTP request
  Nginx->>Django: forward request
  Django->>Serializer: deserialize & validate input
  Serializer->>Sanitizer: validate_name_field / validate_no_html_tags
  Sanitizer-->>Serializer: sanitized value or ValidationError
  alt validation passes
    Serializer->>DB: persist data
    DB-->>Django: saved
    Django->>Django: ContentSecurityPolicyMiddleware.process_response
    Django-->>Nginx: HTTP response (with CSP header)
    Nginx-->>Client: response
  else validation fails
    Serializer-->>Django: ValidationError
    Django->>Django: ContentSecurityPolicyMiddleware.process_response
    Django-->>Nginx: error response (with CSP header)
    Nginx-->>Client: error
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.90%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[SECURITY] Add input validation and CSP headers' directly and clearly summarizes the main changes: input validation (from multiple validator additions across serializers) and CSP headers (from new middleware and nginx configuration). |
| Description check | ✅ Passed | The PR description comprehensively covers all required template sections: What (input validation & CSP), Why (XSS/injection prevention), How (implementation details), impact assessment, database migrations (none), documentation references, testing notes, and a completed checklist. |


for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

backend/utils/tests/test_input_sanitizer.py (1)

7-69: Add a regression test for benign connection=ok input.

This will protect against false positives in event-handler detection.

Test addition

```diff
 class TestValidateNoHtmlTags:
@@
     def test_rejects_event_handler_case_insensitive(self):
         with pytest.raises(
             ValidationError, match="must not contain event handler attributes"
         ):
             validate_no_html_tags("ONLOAD=alert(1)")
+
+    def test_allows_benign_assignment_text(self):
+        assert validate_no_html_tags("connection=ok") == "connection=ok"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/utils/tests/test_input_sanitizer.py` around lines 7 - 69, The tests
are missing a regression case for benign strings like "connection=ok" that could
be mis-detected as an event-handler; add a new test method in
TestValidateNoHtmlTags (e.g., test_allows_connection_equals_ok) that asserts
validate_no_html_tags("connection=ok") returns "connection=ok" (no
ValidationError), ensuring the event-handler detection in validate_no_html_tags
doesn't yield false positives for this pattern.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/adapter_processor_v2/serializers.py`:
- Around line 32-43: The validate method in the serializer overrides parent
validation and omits calling super(), so parent logic in AuditSerializer is
bypassed; update the validate(self, data) implementation to first call
`validated = super().validate(data)` (or merge with the returned value) and then
apply the adapter-specific checks (validate_name_field and validate_no_html_tags)
on that validated dict, finally returning the validated dict so the
AuditSerializer validations are preserved.

In `@backend/api_v2/serializers.py`:
- Around line 69-70: The validate_description method can raise TypeError if
value is None because validate_no_html_tags expects a str; update
validate_description (in the same serializer) to guard against None the same way
BaseAdapterSerializer.validate does — e.g., if value is None return value or
coerce to "" before calling validate_no_html_tags — then call
validate_no_html_tags(value, field_name="Description") so validate_no_html_tags
never receives None.

In `@backend/middleware/content_security_policy.py`:
- Around line 17-27: Change the unconditional header assignment in the
middleware so it doesn't clobber any existing route-specific CSP: instead of
directly setting response["Content-Security-Policy"], use
response.setdefault("Content-Security-Policy", <policy string>) so the default
policy (the same multi-line string currently assigned) is applied only when the
header is not already present; update the code in content_security_policy.py
where the header is set to use response.setdefault to preserve any prior CSP set
by views or earlier middleware.

In `@backend/utils/input_sanitizer.py`:
- Around line 10-20: EVENT_HANDLER_PATTERN is too broad and matches substrings
inside benign words (e.g., "connection=..."); update the regex used by
validate_no_html_tags to only match true HTML attribute contexts by changing
EVENT_HANDLER_PATTERN to require a word boundary or a preceding '<' or
whitespace before the attribute name (for example use a pattern like a
lookbehind for whitespace or '<' followed by "on" + letters and "=" with
re.IGNORECASE) so it no longer flags tokens like "connection=" while still
catching real event-handler attributes referenced in validate_no_html_tags.
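To make the suggestion concrete, here is a sketch of a lookbehind-based pattern. It is a hypothetical illustration of the review comment, not the PR's code. An `on<letters>=` token only counts when it starts the string or follows whitespace, `<`, or `/`.

```python
import re

# "on<letters>=" must begin the string or follow whitespace, "<" or "/",
# so the "on" inside "connection=" is no longer treated as an attribute.
EVENT_HANDLER_PATTERN = re.compile(r"(?:^|(?<=[\s</]))on[a-z]+\s*=", re.IGNORECASE)


def has_event_handler(value: str) -> bool:
    """Return True if value contains something shaped like an on* attribute."""
    return EVENT_HANDLER_PATTERN.search(value) is not None
```

Note that `oncall = primary` still matches at the start of the string. This is the false positive raised in the follow-up review. Matching a vetted list of DOM event names, the approach the later commit adopts, avoids it.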


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2d27dfb1-b259-4452-adef-342aba0d383a

📥 Commits

Reviewing files that changed from the base of the PR and between 076d7c1 and 35f6eaf.

📒 Files selected for processing (13)

backend/account_v2/serializer.py
backend/adapter_processor_v2/serializers.py
backend/api_v2/serializers.py
backend/backend/settings/base.py
backend/connector_v2/serializers.py
backend/middleware/content_security_policy.py
backend/notification_v2/serializers.py
backend/prompt_studio/prompt_studio_core_v2/serializers.py
backend/utils/input_sanitizer.py
backend/utils/tests/__init__.py
backend/utils/tests/test_input_sanitizer.py
backend/workflow_manager/workflow_v2/serializers.py
frontend/nginx.conf


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>


coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

backend/utils/input_sanitizer.py (1)
10-20: ⚠️ Potential issue | 🟠 Major

Narrow EVENT_HANDLER_PATTERN further.

This still rejects benign `on... =` text such as `oncall = primary` or `onboarding = enabled`, so valid names and descriptions can fail in every serializer that uses this helper. Please constrain the match to actual HTML attribute contexts or to a vetted set of real DOM event names instead of any `on\w+=` token.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/input_sanitizer.py` around lines 10 - 20, EVENT_HANDLER_PATTERN
is too broad and rejects benign tokens like "oncall" or "onboarding"; narrow it
by matching only real DOM event names or actual HTML attribute contexts. Update
EVENT_HANDLER_PATTERN (used by validate_no_html_tags) to either (a) match a
vetted list of event names (e.g., click, change, load, submit, mouseover,
keydown, input, focus, blur, etc.) as anchors like on(click|change|...) followed
by optional whitespace and '=', or (b) require HTML attribute context by
ensuring the token appears as an attribute (e.g., preceded by '<tag ' or
whitespace within a tag) before the '='; replace the existing pattern with one
of these stricter patterns and add a unit test demonstrating that "oncall =
primary" and "onboarding = enabled" pass while real event attributes like
"onclick=" are still rejected.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/middleware/content_security_policy.py`:
- Around line 6-22: The CSP middleware's process_response currently sets
"script-src 'self'" and "style-src 'self'", which blocks the inline <script>
used by the login page; update the header value set in process_response to
permit inline scripts/styles (e.g., add 'unsafe-inline' to both script-src and
style-src) or implement a nonce-based CSP and inject matching nonces into the
login template rendered by authentication_service.render; also update the
middleware docstring to remove the incorrect "JSON API backend" claim. Use the
process_response function and the "Content-Security-Policy" header string as the
change points, or alternatively refactor backend/account_v2/templates/login.html
to use external JS and then keep the stricter CSP.
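For the nonce option, here is a minimal sketch using Python's `secrets` module. `build_nonce_csp` is a hypothetical helper. Wiring it into the middleware and passing the nonce to the login template (as `<script nonce="...">`) is left out; both would need to happen for every response.

```python
import secrets


def build_nonce_csp() -> tuple[str, str]:
    """Return a fresh nonce and a CSP value that authorizes it."""
    # A new, unpredictable nonce per response; token_urlsafe output uses only
    # base64url characters, which the CSP nonce-source grammar allows.
    nonce = secrets.token_urlsafe(16)
    policy = (
        "default-src 'self'; "
        f"script-src 'self' 'nonce-{nonce}'; "
        f"style-src 'self' 'nonce-{nonce}'; "
        "object-src 'none'; frame-ancestors 'none'"
    )
    return nonce, policy
```

A nonce only authorizes `<script>` and `<style>` elements that carry it. Inline `style="..."` attributes are not covered, so they would still need `'unsafe-inline'` in `style-src` or would have to move into a stylesheet.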


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 20bde572-2bc1-45f4-a2ac-5ffc1c2e7e5f

📥 Commits

Reviewing files that changed from the base of the PR and between 35f6eaf and 9a62502.

📒 Files selected for processing (4)

backend/adapter_processor_v2/serializers.py
backend/api_v2/serializers.py
backend/middleware/content_security_policy.py
backend/utils/input_sanitizer.py

🚧 Files skipped from review as they are similar to previous changes (2)

backend/api_v2/serializers.py
backend/adapter_processor_v2/serializers.py


greptile-apps · 2026-03-18T07:01:13Z

Greptile Summary

This PR adds two complementary layers of defence against XSS and injection attacks: server-side input sanitization on all user-facing name/description fields across 7 serializers, and Content-Security-Policy headers on both the Django API backend (via new middleware) and the React SPA nginx server.

Key changes:

backend/utils/input_sanitizer.py — New centralized module with three pattern checks (HTML tags, dangerous URI protocols, DOM event handlers); used consistently across all affected serializers
backend/middleware/content_security_policy.py — Strict backend CSP using a SHA-256 hash for the login.html inline script instead of unsafe-inline; applied globally via base.py middleware registration
frontend/nginx.conf — Frontend CSP with explicit third-party allowlists (Monaco, PDF.js, PostHog, Stripe, GTM, reCAPTCHA, Product Fruits); unsafe-inline retained in script-src for container-injected runtime-config.js
22 unit tests covering clean input, injections, false-positive scenarios, and edge cases

Issues found:

BaseAdapterSerializer.validate() raises validation errors at the object level, so failures on adapter_name and description appear under non_field_errors in the API response rather than being tied to the specific failing field — inconsistent with all other serializers in this PR
_DOM_EVENTS in input_sanitizer.py is missing several modern events (animationend, transitionend, message, hashchange, etc.) that are commonly exploited in XSS bypass payloads; the primary HTML_TAG_PATTERN still catches the surrounding tag, but the defence-in-depth coverage is incomplete
The hash-regeneration command in ContentSecurityPolicyMiddleware is truncated (...), leaving future maintainers unable to update the hash when login.html changes

Confidence Score: 3/5

Safe to merge after fixing the object-level validation error placement in BaseAdapterSerializer, which breaks the API contract for adapter creation/update error responses.
The overall security approach is sound — centralized sanitization, hash-based CSP, and consistent field-level validators across 6 of 7 serializers. However, BaseAdapterSerializer uses object-level validate() which surfaces field validation errors under non_field_errors instead of per-field, breaking the consistent API error contract. The missing modern DOM events in _DOM_EVENTS and the truncated hash-regeneration comment are lower-severity but reduce long-term maintainability and defence-in-depth coverage. No regressions in prompt/preamble fields (correctly excluded). Test coverage is thorough for the sanitizer module itself.
Pay close attention to backend/adapter_processor_v2/serializers.py (object-level validate producing non_field_errors), backend/utils/input_sanitizer.py (incomplete _DOM_EVENTS list), and backend/middleware/content_security_policy.py (truncated hash-regeneration comment).

Important Files Changed

| Filename | Overview |
|---|---|
| backend/utils/input_sanitizer.py | New centralized input sanitization module with HTML tag, JS protocol, and event handler pattern matching. Patterns are well-designed with good false-positive mitigations, but the `_DOM_EVENTS` list is missing several modern events (animationend, transitionend, message, etc.) that are commonly used in XSS bypass payloads. Primary HTML tag protection remains intact. |
| backend/adapter_processor_v2/serializers.py | Input validation added via object-level `validate()` instead of field-level `validate_<field>()` methods. This causes validation errors for `adapter_name` and `description` to surface as `non_field_errors` in the API response rather than being tied to the specific field, inconsistent with all other serializers in this PR. |
| backend/middleware/content_security_policy.py | New CSP middleware with restrictive policy for the Django API backend. Uses SHA-256 hash instead of `unsafe-inline` for script-src, and `setdefault()` to avoid overwriting view-level CSP. The hash-regeneration comment is truncated (`...`), which would prevent future maintainers from updating it safely. |
| frontend/nginx.conf | CSP header added for the React SPA with explicit allowlists for Monaco Editor, PDF.js, PostHog, GTM, reCAPTCHA, Stripe, and Product Fruits. `unsafe-inline` remains in `script-src` (documented as needed for container-injected `runtime-config.js`) and `wss:` is a scheme-only WebSocket wildcard. Previous issues around broken multi-string syntax and overly broad `img-src` have been addressed. |
| backend/utils/tests/test_input_sanitizer.py | Comprehensive 22-case test suite covering clean input, HTML tags (open and unclosed), JS/vbscript/data URI protocols, event handlers, whitespace stripping, and false-positive allowances. Missing a test case for MIME-type-containing prose (e.g., "Accepts data: application/json") that the current pattern would falsely reject. |
| backend/workflow_manager/workflow_v2/serializers.py | Field-level validators added for `workflow_name` and `description`, with correct `None` guard on the nullable `description` field. Pattern is consistent with other serializers in this PR. |
| backend/prompt_studio/prompt_studio_core_v2/serializers.py | Field-level validators added for `tool_name` and `description` with correct `None` guard. Prompt content fields (`prompt`, `preamble`, `postamble`, `summarize_prompt`) are intentionally excluded from validation. |
| backend/backend/settings/base.py | CSP middleware added to the base `MIDDLEWARE` list, applying it universally across all environments. This is intentionally more restrictive than production-only and is consistent with the approach of catching CSP regressions early. |
| backend/notification_v2/serializers.py | Input sanitization correctly integrated into the existing `validate_name` method, with the sanitized (stripped) value used for subsequent uniqueness checks against the database. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[API Request with user input] --> B[DRF Serializer Field Validation]
    B --> C{Field-level\nvalidate_field?}
    C -- "Most serializers\n(WorkflowSerializer,\nAPIDeploymentSerializer, etc.)" --> D["validate_name_field()\nor validate_no_html_tags()"]
    C -- "BaseAdapterSerializer\n(object-level validate)" --> E[validate method\ndata.get adapter_name/description]
    E --> D
    D --> F{HTML_TAG_PATTERN\nmatch?}
    F -- Yes --> G[ValidationError: HTML/script tags]
    F -- No --> H{JS_PROTOCOL_PATTERN\nmatch?}
    H -- "javascript:|vbscript:\ndata:mime/type" --> I[ValidationError: dangerous URI]
    H -- No --> J{EVENT_HANDLER_PATTERN\nmatch?}
    J -- "onXXX= known\nDOM event" --> K[ValidationError: event handler]
    J -- No --> L[Validated value returned\nto DRF]
    L --> M[Django View / DB save]

    subgraph CSP ["CSP Headers (after response)"]
        N[ContentSecurityPolicyMiddleware\nprocess_response] --> O{Header already set?}
        O -- No --> P["Add CSP: default-src 'self'\nscript-src SHA256 hash\nobject-src 'none'..."]
        O -- Yes --> Q[Skip setdefault]
        R[nginx add_header] --> S[Frontend SPA CSP\nunsafe-inline in script-src\nThird-party allowlists]
    end
```

Prompt To Fix All With AI

This is a comment left during a code review.
Path: backend/adapter_processor_v2/serializers.py
Line: 31-44

Comment:
**Object-level `validate()` produces `non_field_errors` instead of field-specific errors**

`validate_name_field` and `validate_no_html_tags` each raise `ValidationError` with a plain string. When a string `ValidationError` is raised inside the object-level `validate()` method, DRF surfaces it under `non_field_errors` rather than tying it to `adapter_name` or `description`. Every other serializer added in this PR uses field-level `validate_<field>` methods that produce field-specific errors. A caller submitting a malicious `adapter_name` would receive:

```json
{"non_field_errors": ["Adapter name must not contain HTML or script tags."]}
```

instead of the expected:

```json
{"adapter_name": ["Adapter name must not contain HTML or script tags."]}
```

The fix is to use field-level validators consistently, or explicitly raise with a dict keyed on the field name:

```python
def validate_adapter_name(self, value: str) -> str:
    return validate_name_field(value, field_name="Adapter name")

def validate_description(self, value: str | None) -> str | None:
    if value is None:
        return value
    return validate_no_html_tags(value, field_name="Description")
```

Note that field-level `validate_<field>` methods run before `validate()` and can coexist with cross-field logic in a subclass's `validate()`. If the checks must stay in the object-level `validate()`, raise with a dict keyed on the field name so DRF attaches the error to that field: `raise ValidationError({"adapter_name": detail})`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: backend/utils/input_sanitizer.py
Line: 17-26

Comment:
**`_DOM_EVENTS` list is missing several modern and commonly-exploited event names**

The curated list provides good coverage of traditional DOM events, but several modern events that are exploitable in XSS payloads — notably ones that fire **without user interaction** — are absent:

- `animationend`, `animationstart`, `animationiteration` — triggered automatically when a CSS animation runs (no click needed)
- `transitionend` — same, triggered by CSS transitions
- `gotpointercapture`, `lostpointercapture` — missing from the pointer family
- `message`, `messageerror` — used in cross-origin attacks via `postMessage`
- `hashchange`, `popstate`, `storage` — navigation/state events

`onanimationend` in particular is a common WAF-bypass payload (e.g., `<div style="animation: x 1s" onanimationend=alert(1)>`). The primary `HTML_TAG_PATTERN` check still catches the wrapping tag, so this is a defence-in-depth gap rather than a standalone bypass. Still, since the pattern is explicitly tested and documented as a defence-in-depth control, completing the event list maintains the stated coverage guarantee.

Consider appending the missing events to `_DOM_EVENTS`:
```python
_DOM_EVENTS = (
    ...
    "animationend|animationstart|animationiteration|"
    "transitionend|transitioncancel|transitionrun|transitionstart|"
    "gotpointercapture|lostpointercapture|"
    "message|messageerror|"
    "hashchange|popstate|storage|"
    "visibilitychange|fullscreenchange|"
    "beforeinput|formdata|"
    "online|offline"
)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: backend/middleware/content_security_policy.py
Line: 17-19

Comment:
**Truncated hash-regeneration command prevents future maintainability**

The comment `python -c "import hashlib,base64; ..."` is incomplete — the `...` body has been omitted. When `login.html`'s inline script is changed in the future, any developer following this comment will have no way to regenerate the hash and will likely fall back to `'unsafe-inline'`, undoing the security benefit of the hash-based approach.

The full runnable command should be included here:

```python
# python3 -c "
# import hashlib, base64
# script = b'''<exact text between <script> and </script>, tags excluded>'''
# print('sha256-' + base64.b64encode(hashlib.sha256(script).digest()).decode())
# "
```

Alternatively, because the hash is whitespace-sensitive, computing it from the actual file is more robust than pasting the script by hand:
```bash
# openssl dgst -sha256 -binary <(sed -n '/<script>/,/<\/script>/p' backend/templates/login.html | tail -n +2 | head -n -1) | base64
```

How can I resolve this? If you propose a fix, please make it concise.
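As a runnable alternative to the shell pipeline, here is a sketch that hashes the exact text between each inline `<script>` and `</script>` pair. CSP hashes cover exactly those bytes, including leading and trailing whitespace. `csp_hashes` is a hypothetical helper, and the regex assumes simple, well-formed markup such as a single login template.

```python
import base64
import hashlib
import re

# Inline <script> elements without a src attribute; group 1 is the exact
# text between the tags, which is what the browser hashes.
_INLINE_SCRIPT = re.compile(
    r"<script(?![^>]*\bsrc=)[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE
)


def csp_hashes(html: str) -> list[str]:
    """Return a "'sha256-...'" source expression for each inline script."""
    hashes = []
    for body in _INLINE_SCRIPT.findall(html):
        digest = hashlib.sha256(body.encode("utf-8")).digest()
        hashes.append("'sha256-" + base64.b64encode(digest).decode("ascii") + "'")
    return hashes
```

If the script body contains template variables, run this on the rendered page rather than the template source, since the browser hashes what it actually receives.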

Reviews (7): Last reviewed commit: "Merge branch 'main' into security/input-..."


- Narrow EVENT_HANDLER_PATTERN to vetted DOM event names to avoid false positives on benign words like "connection=", "onboarding=", "oncall="
- Add None guards to validate_description in WorkflowSerializer and CustomToolSerializer to prevent TypeError on nullable fields
- Add 'unsafe-inline' to script-src and style-src in backend CSP middleware for login.html inline scripts/styles
- Add object-src 'none' to backend CSP per OWASP best practice
- Remove plaintext ws: from nginx connect-src, keep only wss:
- Add 6 regression tests for benign input and non-whitespace delimiters
- Update CSP middleware docstring to accurately describe behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


coderabbitai

🧹 Nitpick comments (2)

backend/api_v2/serializers.py (1)

66-72: Type hint should reflect nullable input; None guard looks good.

The None guard in validate_description correctly addresses the previous review concern about potential TypeError. However, similar to the other serializer, the type annotation value: str doesn't accurately reflect that None is accepted and returned.

✏️ Suggested type hint fix

```diff
     def validate_display_name(self, value: str) -> str:
         return validate_name_field(value, field_name="Display name")
 
-    def validate_description(self, value: str) -> str:
+    def validate_description(self, value: str | None) -> str | None:
         if value is None:
             return value
         return validate_no_html_tags(value, field_name="Description")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/api_v2/serializers.py` around lines 66 - 72, The validate_description
function accepts and returns None but is currently annotated as value: str;
update its type hints to reflect nullability by using Optional[str] for the
parameter and return type (e.g., def validate_description(self, value:
Optional[str]) -> Optional[str]), and ensure typing.Optional is imported; keep
the existing None guard and the call to validate_no_html_tags(value,
field_name="Description") unchanged.

backend/workflow_manager/workflow_v2/serializers.py (1)

50-56: Type hint should reflect nullable input.

The validate_description method correctly guards against None, but the type annotation value: str doesn't reflect that the method accepts and returns None. This causes a mismatch between the declared type and actual behavior.

✏️ Suggested type hint fix

     def validate_workflow_name(self, value: str) -> str:
         return validate_name_field(value, field_name="Workflow name")

-    def validate_description(self, value: str) -> str:
+    def validate_description(self, value: str | None) -> str | None:
         if value is None:
             return value
         return validate_no_html_tags(value, field_name="Description")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/workflow_v2/serializers.py` around lines 50 - 56,
The validate_description method accepts and returns None but its signature uses
value: str; update the type hints to reflect nullable input/output by changing
the parameter and return type to Optional[str] in validate_description and
importing Optional from typing if not already present (leave
validate_workflow_name as-is); ensure the function signature reads
validate_description(self, value: Optional[str]) -> Optional[str] and retains
the existing None guard and call to validate_no_html_tags.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eff2e393-910c-46aa-825f-11ec61f567cc

📥 Commits

Reviewing files that changed from the base of the PR and between d595efd and cb8bb89.

📒 Files selected for processing (5)

backend/adapter_processor_v2/serializers.py
backend/api_v2/serializers.py
backend/backend/settings/base.py
backend/prompt_studio/prompt_studio_core_v2/serializers.py
backend/workflow_manager/workflow_v2/serializers.py

🚧 Files skipped from review as they are similar to previous changes (2)

backend/adapter_processor_v2/serializers.py
backend/prompt_studio/prompt_studio_core_v2/serializers.py

- Fix broken nginx add_header syntax: combine CSP into a single quoted string (nginx does not concatenate adjacent quoted strings)
- Tighten img-src from broad https: wildcard to explicit PostHog origins
- Replace 'unsafe-inline' with SHA-256 hash for login.html inline script in backend CSP middleware, maintaining strong XSS protection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
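A Django-style middleware that emits the backend policy as one header value might look like the sketch below. The directives loosely follow the commit messages ('self' sources, frame-ancestors 'none', object-src 'none', a hash source for the login.html script); the placeholder hash, the `build_csp` helper and the exact `style-src` sources are assumptions. It avoids importing Django so it runs standalone; a Django `HttpResponse` supports the same `in` check and item assignment used here.

```python
# Hypothetical placeholder; the real value is regenerated from login.html.
_SCRIPT_HASH = "sha256-REPLACE_ME"

# Illustrative directive set, not the PR's exact policy.
CSP_DIRECTIVES: dict[str, list[str]] = {
    "default-src": ["'self'"],
    "script-src": ["'self'", f"'{_SCRIPT_HASH}'"],  # hash sources are quoted
    # Assumed: inline styles in login.html are still allowed per the earlier commit.
    "style-src": ["'self'", "'unsafe-inline'"],
    "object-src": ["'none'"],
    "frame-ancestors": ["'none'"],
}


def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives into one header value: 'name src1 src2; name2 src3'."""
    return "; ".join(f"{name} {' '.join(srcs)}" for name, srcs in directives.items())


class ContentSecurityPolicyMiddleware:
    """Callable middleware: wraps get_response and adds the CSP header."""

    def __init__(self, get_response):
        self.get_response = get_response
        # Build the header once at startup instead of on every request.
        self.header_value = build_csp(CSP_DIRECTIVES)

    def __call__(self, request):
        response = self.get_response(request)
        # Leave an explicit per-view policy untouched.
        if "Content-Security-Policy" not in response:
            response["Content-Security-Policy"] = self.header_value
        return response
```

Emitting a single pre-joined string sidesteps the nginx issue fixed above, where adjacent quoted strings are not concatenated.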

coderabbitai

🧹 Nitpick comments (1)

backend/middleware/content_security_policy.py (1)

17-20: Complete the hash regeneration command in the comment.

The regeneration command is truncated with ..., making it harder to regenerate the hash when the login script changes.

📝 Suggested fix

     # SHA-256 hash of the inline script in login.html (form submit spinner).
     # If that script changes, regenerate with:
-    #   python -c "import hashlib,base64; ..."
+    #   echo -n '<script content>' | openssl dgst -sha256 -binary | openssl base64
+    # Or: python -c "import hashlib,base64,sys; print('sha256-'+base64.b64encode(hashlib.sha256(sys.stdin.read().encode()).digest()).decode())"
     _SCRIPT_HASH = "sha256-GES82NvXpRYmVFDKv6vRHx2c7xuv8mgUzUaP7heKeFY="
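The regeneration one-liner can also be kept as a small helper. This is a sketch, assuming the input is the exact text between `<script>` and `</script>` in login.html; `csp_hash_source` is a hypothetical name, not part of the PR.

```python
import base64
import hashlib


def csp_hash_source(script_text: str) -> str:
    """Return the CSP hash source for an inline <script> body.

    The input must be the exact text between <script> and </script>,
    including leading/trailing whitespace, because the browser hashes
    those characters verbatim (UTF-8 encoded).
    """
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")
```

The returned value goes into the policy wrapped in single quotes, for example `script-src 'self' 'sha256-<digest>'`.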

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/middleware/content_security_policy.py` around lines 17 - 20, The
comment for _SCRIPT_HASH is missing the full regeneration command; update the
comment to show a complete one-liner that reads the exact inline script bytes
and prints the SHA-256 BASE64-prefixed value, e.g. use a Python command like:
python -c "import hashlib,base64,sys;print('sha256-'+base64.b64encode(hashlib.sha256(sys.stdin.buffer.read()).digest()).decode())" < script-file
and then replace the value of _SCRIPT_HASH with the printed
string; ensure you run the command against the exact login inline script (no
extra whitespace) so the new hash matches the content.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 597c9c59-7665-499e-9975-20684d7162d5

📥 Commits

Reviewing files that changed from the base of the PR and between cb8bb89 and 8b648d2.

📒 Files selected for processing (2)

backend/middleware/content_security_policy.py
frontend/nginx.conf

backend/utils/input_sanitizer.py

backend/backend/settings/base.py

- Extend HTML_TAG_PATTERN to catch unclosed tags like "<script" that could be completed by adjacent content in non-React contexts
- Extend JS_PROTOCOL_PATTERN to also block data: and vbscript: URIs which can execute scripts when rendered into href/src attributes
- Add tests for unclosed tags, data: URI, vbscript:, and "a < 3" benign case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
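One way to catch unclosed tags without flagging comparisons like "a < 3" is to key on what follows `<` rather than requiring a closing `>`. The sketch below is an assumption about the approach, not the PR's HTML_TAG_PATTERN; `contains_html_tag` is an illustrative name. The trade-off is that compact comparisons such as `a<b` are also rejected.

```python
import re

# Assumed rule, mirroring how the HTML tokenizer opens markup: "<" immediately
# followed by a letter, "!" (comment/doctype) or "/"+letter (end tag). No ">" is
# required, so an unclosed "<script" is caught, while "a < 3" and "x <= y" are not.
HTML_TAG_PATTERN = re.compile(r"<(?:[a-zA-Z!]|/[a-zA-Z])")


def contains_html_tag(value: str) -> bool:
    """Return True if value contains something a browser could parse as markup."""
    return HTML_TAG_PATTERN.search(value) is not None
```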

backend/utils/input_sanitizer.py

- Refine JS_PROTOCOL_PATTERN to only match data: when followed by a MIME type (word/word), avoiding false positives on text like "Input data: JSON format"
- Add tests for benign "data:" in prose and data: URI with MIME type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
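The refined rule (always flag javascript: and vbscript:, flag data: only when a type/subtype follows) can be sketched as a single alternation. This is a hypothetical reconstruction from the commit message, not the PR's regex; `has_script_protocol` is an illustrative name.

```python
import re

# Sketch only: javascript: and vbscript: are always flagged; data: is flagged
# only when a type/subtype MIME token follows, so prose like "Input data: JSON
# format" passes. Schemes split by embedded tabs or newlines, which the WHATWG
# URL parser strips before parsing, are not handled here.
JS_PROTOCOL_PATTERN = re.compile(
    r"\b(?:javascript|vbscript):"
    r"|\bdata:\s*[a-z]+/[a-z0-9.+-]+",
    re.IGNORECASE,
)


def has_script_protocol(value: str) -> bool:
    """Return True if value contains a script-capable URL scheme."""
    return JS_PROTOCOL_PATTERN.search(value) is not None
```

The leading `\b` keeps words such as "metadata:" from being read as a data: URI.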

backend/middleware/content_security_policy.py

frontend/nginx.conf

backend/utils/input_sanitizer.py

backend/middleware/content_security_policy.py

github-actions · 2026-03-24T16:55:03Z

Frontend Lint Report (Biome)

✅ All checks passed! No linting or formatting issues found.

github-actions · 2026-03-24T16:56:03Z

Test Results

Summary

✅ Runner Tests: 11 passed, 0 failed (11 total)
✅ SDK1 Tests: 98 passed, 0 failed (98 total)

Runner Tests - Full Report

filepath	function	passed	SUBTOTAL
runner/src/unstract/runner/clients/test_docker.py	test_logs	1	1
runner/src/unstract/runner/clients/test_docker.py	test_cleanup	1	1
runner/src/unstract/runner/clients/test_docker.py	test_cleanup_skip	1	1
runner/src/unstract/runner/clients/test_docker.py	test_client_init	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image_exists	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_container_run_config	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_container_run_config_without_mount	1	1
runner/src/unstract/runner/clients/test_docker.py	test_run_container	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image_for_sidecar	1	1
runner/src/unstract/runner/clients/test_docker.py	test_sidecar_container	1	1
TOTAL		11	11

SDK1 Tests - Full Report

sonarqubecloud · 2026-03-24T16:56:08Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

backend/utils/input_sanitizer.py

backend/middleware/content_security_policy.py

[pre-commit.ci] auto fixes from pre-commit.com hooks

35f6eaf

for more information, see https://pre-commit.ci

hari-kuriakose requested review from jaseemjaskp, muhammad-ali-e, ritwik-g and vishnuszipstack March 6, 2026 11:20

coderabbitai bot reviewed Mar 6, 2026

backend/adapter_processor_v2/serializers.py (resolved)

backend/api_v2/serializers.py (resolved)

backend/middleware/content_security_policy.py Outdated (resolved)

backend/utils/input_sanitizer.py Outdated (resolved)

hari-kuriakose and others added 4 commits March 6, 2026 20:38

Apply suggestion from @coderabbitai[bot]

d9c5770

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>

Apply suggestion from @coderabbitai[bot]

1a46788

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

423c2e9

for more information, see https://pre-commit.ci

Apply suggestions from code review

9a62502

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>

hari-kuriakose requested review from Deepak-Kesavan and jaags-dev March 6, 2026 15:14

hari-kuriakose self-assigned this Mar 6, 2026

coderabbitai bot reviewed Mar 6, 2026

backend/middleware/content_security_policy.py Outdated (resolved)

hari-kuriakose changed the title ~~[SECURITY] Add input validation, CSP headers, and secure cookie defaults~~ [SECURITY] Add input validation and CSP headers Mar 6, 2026

ritwik-g reviewed Mar 18, 2026

frontend/nginx.conf Outdated (resolved)

greptile-apps bot reviewed Mar 18, 2026

muhammad-ali-e approved these changes Mar 18, 2026

vishnuszipstack and others added 2 commits March 24, 2026 11:11

Merge branch 'main' into security/input-validation-csp-cookie-hardening

cb8bb89

greptile-apps bot reviewed Mar 24, 2026

frontend/nginx.conf Outdated (resolved)

frontend/nginx.conf Outdated (resolved)

backend/middleware/content_security_policy.py (resolved)

coderabbitai bot reviewed Mar 24, 2026

vishnuszipstack requested a review from ritwik-g March 24, 2026 06:17

coderabbitai bot reviewed Mar 24, 2026

greptile-apps bot reviewed Mar 24, 2026

backend/utils/input_sanitizer.py Outdated (resolved)

backend/utils/input_sanitizer.py Outdated (resolved)

backend/backend/settings/base.py (resolved)

greptile-apps bot reviewed Mar 24, 2026

backend/utils/input_sanitizer.py Outdated (resolved)

greptile-apps bot reviewed Mar 24, 2026

backend/middleware/content_security_policy.py (resolved)

frontend/nginx.conf (resolved)

vishnuszipstack approved these changes Mar 24, 2026

Merge branch 'main' into security/input-validation-csp-cookie-hardening

1b88282

greptile-apps bot reviewed Mar 24, 2026

backend/utils/input_sanitizer.py (resolved)

backend/middleware/content_security_policy.py (resolved)

Merge branch 'main' into security/input-validation-csp-cookie-hardening

4107b53

greptile-apps bot reviewed Mar 24, 2026

backend/utils/input_sanitizer.py (resolved)

backend/middleware/content_security_policy.py (resolved)

kirtimanmishrazipstack merged commit bf15e8d into main Mar 24, 2026
9 checks passed

kirtimanmishrazipstack deleted the security/input-validation-csp-cookie-hardening branch March 24, 2026 17:03

vishnuszipstack mentioned this pull request Mar 25, 2026

[FIX] Frontend CSP: add unsafe-eval for RJSF and blob: for PDF viewer #1875

Merged

Conversation

hari-kuriakose commented Mar 6, 2026 • edited

What

Why

How

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins, do not merge the PR without this section filled in.)

Database Migrations

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

coderabbitai bot commented Mar 6, 2026 • edited

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

coderabbitai bot left a comment

coderabbitai bot left a comment

greptile-apps bot commented Mar 18, 2026 • edited

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

coderabbitai bot left a comment

coderabbitai bot left a comment

github-actions bot commented Mar 24, 2026

Frontend Lint Report (Biome)

github-actions bot commented Mar 24, 2026

Test Results

sonarqubecloud bot commented Mar 24, 2026

Quality Gate passed
