Expose include_orig_elements param in partition API #576

Open
tylorbayer wants to merge 2 commits into
Unstructured-IO:main from
SchoolAI:feat/include-orig-elements-param

@tylorbayer tylorbayer commented Jun 22, 2026

Summary

  • Adds include_orig_elements as a new form parameter (default True) to control whether original elements are included in chunk metadata
  • When True, elements used to form each chunk are attached to that chunk's .metadata.orig_elements as a gzipped+base64 blob
  • When False, these blobs are omitted from the response, significantly reducing payload size for large documents (especially those with large tables where the blob is duplicated into every chunk)
  • Parameter is wired through GeneralFormParams, pipeline_api, and all relevant chunking call sites
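To illustrate the payload effect described above, here is a minimal, self-contained sketch. It is not the code from this PR: `encode_orig_elements`, `decode_orig_elements`, and `build_chunks` are hypothetical helpers, the chunk dictionaries are made-up stand-ins for API output, and the exact compression call is an assumption (the summary only says "gzipped+base64").

```python
import base64
import gzip
import json


def encode_orig_elements(elements: list[dict]) -> str:
    """Serialize elements to JSON, gzip them, then base64-encode (assumed format)."""
    raw = json.dumps(elements).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decode_orig_elements(blob: str) -> list[dict]:
    """Reverse of encode_orig_elements: base64-decode, gunzip, parse JSON."""
    return json.loads(gzip.decompress(base64.b64decode(blob)).decode("utf-8"))


def build_chunks(
    texts: list[str],
    orig_elements: list[dict],
    include_orig_elements: bool = True,
) -> list[dict]:
    """Hypothetical stand-in for chunking: one chunk dict per text."""
    chunks = []
    for text in texts:
        metadata = {"filename": "example.pdf"}
        if include_orig_elements:
            # The same source element is attached to every chunk derived from it.
            metadata["orig_elements"] = encode_orig_elements(orig_elements)
        chunks.append({"type": "CompositeElement", "text": text, "metadata": metadata})
    return chunks


# One large table split across ten chunks (illustrative data only).
table = [{"type": "Table", "text": " ".join(f"cell-{i}" for i in range(2000))}]
texts = [f"table part {i}" for i in range(10)]

with_blobs = build_chunks(texts, table, include_orig_elements=True)
without_blobs = build_chunks(texts, table, include_orig_elements=False)

size_with = len(json.dumps(with_blobs))
size_without = len(json.dumps(without_blobs))
print(f"with orig_elements: {size_with} bytes, without: {size_without} bytes")
```

Because the blob for the table is repeated in each of the ten chunks, the serialized size grows with the number of chunks, which is the duplication the third bullet refers to.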

Changelog

Added a 0.1.8 entry under ### Features in CHANGELOG.md.

Test plan

  • Submit a partition request with include_orig_elements=false and verify response chunks do not contain orig_elements in metadata
  • Submit the same request with include_orig_elements=true (or omit the parameter) and confirm orig_elements is present in chunk metadata as expected
  • CI lint and test jobs pass
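The first two checks in the test plan could be scripted roughly as follows. This is only a sketch: `chunks_with_orig_elements` is a hypothetical helper, and the two sample lists are hand-written stand-ins for parsed JSON responses, not real API output.

```python
def chunks_with_orig_elements(elements: list[dict]) -> list[dict]:
    """Return the elements whose metadata contains an orig_elements key."""
    return [
        el for el in elements
        if "orig_elements" in (el.get("metadata") or {})
    ]


# Stand-in for a response to a request sent with include_orig_elements=false.
response_false = [
    {"type": "CompositeElement", "text": "part 1", "metadata": {"filename": "a.pdf"}},
    {"type": "CompositeElement", "text": "part 2", "metadata": {"filename": "a.pdf"}},
]

# Stand-in for a response with include_orig_elements=true (or omitted).
response_true = [
    {
        "type": "CompositeElement",
        "text": "part 1",
        "metadata": {"filename": "a.pdf", "orig_elements": "<base64 blob>"},
    },
    {
        "type": "CompositeElement",
        "text": "part 2",
        "metadata": {"filename": "a.pdf", "orig_elements": "<base64 blob>"},
    },
]

leaked = chunks_with_orig_elements(response_false)
present = chunks_with_orig_elements(response_true)

print(f"false: {len(leaked)} chunks carry orig_elements")
print(f"true: {len(present)} of {len(response_true)} chunks carry orig_elements")
```

In a real run, the two lists would be replaced by the parsed JSON bodies of the two partition requests, with chunking enabled in both.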

Made with Cursor

Add include_orig_elements as a form parameter (default True) so callers
can omit orig_elements blobs from chunk metadata and reduce response size
for large documents with tables.

Co-authored-by: Cursor <cursoragent@cursor.com>
@tylorbayer

Author

Closing — resubmitting from branch on upstream repo.

@tylorbayer tylorbayer closed this Jun 22, 2026
@tylorbayer tylorbayer reopened this Jun 22, 2026

@awalker4 awalker4 left a comment

Collaborator

awesome!

Co-authored-by: Cursor <cursoragent@cursor.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 4 files

Shadow auto-approve: would auto-approve. Adds include_orig_elements parameter to partition API, defaulting to True, to optionally omit orig_elements blobs from chunk metadata (reducing payload size). Includes version bump and CHANGELOG update. Backward-compatible, low-risk additive change.

tylorbayer commented Jun 24, 2026

Author

@awalker4 what is the process for getting this merged in?
