Skip to content

Storage - Content Validation Decoder Implementation#47016

Open
gunjansingh-msft wants to merge 30 commits intofeature/storage/content-validationfrom
feature/storage/pipelinepolicy
Open

Storage - Content Validation Decoder Implementation#47016
gunjansingh-msft wants to merge 30 commits intofeature/storage/content-validationfrom
feature/storage/pipelinepolicy

Conversation

@gunjansingh-msft
Copy link
Copy Markdown
Member

@gunjansingh-msft gunjansingh-msft commented Oct 15, 2025

No description provided.

@github-actions github-actions Bot added the Storage Storage Service (Queues, Blobs, Files) label Oct 15, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Oct 15, 2025

API Change Check

APIView identified API level changes in this PR and created the following API reviews

com.azure:azure-storage-blob
com.azure:azure-storage-common

Copy link
Copy Markdown
Member

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good! It's great to see us getting a working end to implementation of downloads. I just left some higher-level comments as we keeping hashing out the download implementation.

@gunjansingh-msft gunjansingh-msft requested a review from a team as a code owner December 3, 2025 15:54
Copy link
Copy Markdown
Member

@ibrandes ibrandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will be 100% honest - i was really struggling to walk through the logic in here, there's so much to look at x) my comments should highlight some of the more confusing areas. don't worry about polishing anything too much, but adding more subfunctions, centralizing util logic, and reducing the nesting should make this a lot easier to follow.

overall, i think you're going down the right path with how you utilize and set the decoder state. also, definitely look into downloadToFileImpl as mentioned in one of my comments - ProgressListener and ProgressReporter might be able to help us reduce some of the boilerplate code you've had to write.

@github-actions
Copy link
Copy Markdown
Contributor

Hi @gunjansingh-msft. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions github-actions Bot added no-recent-activity There has been no recent activity on this issue. and removed no-recent-activity There has been no recent activity on this issue. labels Mar 13, 2026
Copy link
Copy Markdown
Member

@ibrandes ibrandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking great so far!

Copy link
Copy Markdown
Member

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good. I like the overall structure of where the change is going. I have not dived too deep into the PR yet but just wanted to post some of the questions/comments that I had so far as I work through understanding the changes.

@gunjansingh-msft
Copy link
Copy Markdown
Member Author

looking great so far!

All the latest comments are addressed

…pipelinepolicy

Resolve conflicts in BuilderHelper (both decoder and encoder content validation policies) and StorageCrc64Calculator Javadoc.

Align download path with ContentValidationAlgorithm rename; fix downloadStreamWithResponseInternal variable mix-up.
Delegate CRC64/AUTO detection to ContentValidationModeResolver.isCrc64OrAuto from DownloadValidationUtils.
Update decoder tests for setContentValidationAlgorithm; remove MD5 download test (enum no longer includes MD5).
Fix incorrect @link in ContentValidationModeResolver Javadoc.

Made-with: Cursor
Copy link
Copy Markdown
Member

@ibrandes ibrandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good so far! mostly just had a couple comments on util cleanup and code consolidation - your decoder lifecycle handling looks good to me

Copy link
Copy Markdown
Member

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good. Just left some more comments. I also agreed with all of the feedback that @ibrandes left too.

StructuredMessageDecoder decoder = new StructuredMessageDecoder(expectedLength);

Flux<ByteBuffer> decodedStream = decodeStream(httpResponse.getBody(), decoder);
return new DecodedResponse(httpResponse, decodedStream);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, we are decoding the structured body into a payload body, but we are still wrapping the response with the original Content-Length header. we should add an override to DecodedResponse for the getHeaders method where we remove the Content-Length header:

@Override
public HttpHeaders getHeaders() {
    HttpHeaders headers = new HttpHeaders(originalResponse.getHeaders());
    headers.remove(HttpHeaderName.CONTENT_LENGTH);
    return headers;
}

another option is to use the adjusted Content-Length value, but I'm not sure if we know that until everything is consumed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibrandes and @gunjansingh-msft I'm not sure if I'm following why we'd remove the Content-Length header all together? My main concern is that using the Content-Length header is a common way to derive the size of blob that is being downloaded and downstream logic might rely on it being present? Might be interesting to check with what .NET does here as well to make sure we are consistent. I do generally agree that it makes sense to override the Content-Length with the value of the x-ms-structured-content-length in order to not break existing flows that leverage content length as the blob size.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve aligned this with the .NET behavior in the latest revision.

In the .NET SDK, the response body is wrapped via StructuredMessageDecodingStream.WrapStream, but the response headers (including Content-Length) are not modified and continue to reflect the wire payload size.

I’ve removed the Content-Length override here as well, so the Java implementation now matches the .NET and by passing headers through unchanged.

Callers can still get the decoded length via x-ms-structured-content-length, BlobDownloadHeaders.getBlobSize(), or Content-Range if needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding this as an ADO item to further investigate.

Copy link
Copy Markdown
Member

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Just leaving some more questions that I had as I'm working through fully understanding the implementation. Mainly focused on the policy and the decoded response class. Plan to dig more into the structured message decoder later.

StructuredMessageDecoder decoder = new StructuredMessageDecoder(expectedLength);

Flux<ByteBuffer> decodedStream = decodeStream(httpResponse.getBody(), decoder);
return new DecodedResponse(httpResponse, decodedStream);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibrandes and @gunjansingh-msft I'm not sure if I'm following why we'd remove the Content-Length header all together? My main concern is that using the Content-Length header is a common way to derive the size of blob that is being downloaded and downstream logic might rely on it being present? Might be interesting to check with what .NET does here as well to make sure we are consistent. I do generally agree that it makes sense to override the Content-Length with the value of the x-ms-structured-content-length in order to not break existing flows that leverage content length as the blob size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Storage Storage Service (Queues, Blobs, Files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants