pool: add firefly onStart marker for MoverProtocol-based transfers#8055
Open
ShawnMcKee wants to merge 1 commit into dCache:master from
kofemann (Member) requested changes, Mar 30, 2026:
I think the suggested change is wrong. It might preserve the pool IP, but (a) it uses the wrong port and (b) it will work incorrectly on multi-homed hosts.
I would suggest updating RemoteHttpTransferService to pass the TransferLifeCycle to the mover when the createMoverProtocol method creates it. RemoteHttpDataTransferProtocol#doGet should then call onStart as soon as the local endpoint is known.
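The port and multi-homing problems can be seen in a small standalone sketch using only java.net. It compares an endpoint derived without a connection (an address paired with port 0, as the commit message below describes for the NetworkUtils.getLocalAddress() approach) with the local endpoint of a socket that is actually connected. The class and method names are hypothetical, and a loopback ServerSocket stands in for the remote HTTP server.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; not part of dCache.
class LocalEndpointDemo {

    // Endpoint derived without a connection: an address, but no port (0).
    static InetSocketAddress derivedEndpoint(InetAddress guessedLocal) {
        return new InetSocketAddress(guessedLocal, 0);
    }

    // Endpoint of a connected socket: the address and ephemeral port the kernel bound.
    static InetSocketAddress connectedEndpoint(Socket socket) {
        return (InetSocketAddress) socket.getLocalSocketAddress();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A loopback listener stands in for the remote HTTP server.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            System.out.println("derived: " + derivedEndpoint(loopback));
            System.out.println("actual:  " + connectedEndpoint(client));
        }
    }
}
```

On a multi-homed host the connected socket's address also reflects the interface the routing table actually chose, which an address picked before connecting cannot guarantee.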
Move the firefly flow-start marker emission from AbstractMoverProtocolTransferService.MoverTask into RemoteHttpDataTransferProtocol, where the actual HTTP connection's local socket address (correct IP and port) is available.

Previously, the start marker was emitted in MoverTask.run() before the HTTP connection was established, using NetworkUtils.getLocalAddress() to derive the local endpoint. This produced the wrong port (0) and could select the wrong interface on multi-homed hosts.

Now, RemoteHttpTransferService passes the TransferLifeCycle to RemoteHttpDataTransferProtocol at construction time and sets the Subject via the overridden createMover(). The protocol calls onStart() in doGet() and sendFile() immediately after capturing the local endpoint from HttpInetConnection, which provides the real bound address and port.

Signed-off-by: Shawn McKee <smckee@umich.edu>
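A minimal sketch of the pattern this commit describes, assuming simplified types: FlowMarkers and HttpProtocolSketch are hypothetical stand-ins for TransferLifeCycle and RemoteHttpDataTransferProtocol, the onStart signature shown is not dCache's, and a plain Socket replaces HttpInetConnection. The marker sink is injected at construction, the Subject is set afterwards (as the overridden createMover() would), and onStart fires only after connect(), when the bound local address and port are known.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import javax.security.auth.Subject;

// Hypothetical stand-in for TransferLifeCycle; this is not dCache's real signature.
interface FlowMarkers {
    void onStart(InetSocketAddress remote, InetSocketAddress local, Subject subject);
}

// Hypothetical stand-in for RemoteHttpDataTransferProtocol.
class HttpProtocolSketch {
    private final FlowMarkers markers; // injected when the service constructs the protocol
    private Subject subject;           // set afterwards, as the overridden createMover() would

    HttpProtocolSketch(FlowMarkers markers) {
        this.markers = markers;
    }

    void setSubject(Subject subject) {
        this.subject = subject;
    }

    // Connects first, then emits the start marker with the real local endpoint.
    Socket doGet(InetSocketAddress remote) throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(remote);
            if (markers != null) {
                // Only after connect() are the bound local address and ephemeral port known.
                InetSocketAddress local = (InetSocketAddress) socket.getLocalSocketAddress();
                markers.onStart(remote, local, subject);
            }
            return socket;
        } catch (IOException | RuntimeException e) {
            socket.close(); // do not leak the socket if connecting or the marker fails
            throw e;
        }
    }
}

class HttpProtocolSketchDemo {
    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        List<String> emitted = new ArrayList<>();
        HttpProtocolSketch protocol =
                new HttpProtocolSketch((remote, local, subj) -> emitted.add(local + " -> " + remote));
        protocol.setSubject(new Subject());
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket socket = protocol.doGet(new InetSocketAddress(loopback, server.getLocalPort()))) {
            System.out.println(emitted);
        }
    }
}
```

Emitting from inside the protocol ties the marker to the connection that actually carries the data, rather than to an endpoint guessed before that connection exists.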
Branch updated from 21f3ab9 to 873c491.
Problem
AbstractMoverProtocolTransferService, the base class for MoverProtocol-based transfer services (used by RemoteHttpTransferService for FTS third-party copy movers), never calls transferLifeCycle.onStart(). This means firefly flow-start UDP markers are never emitted for TPC movers, even though:
- DefaultPostTransferService correctly sends onEnd() markers (using the deriveLocalEndpoint() fallback added in 11.2.3)
- NettyTransferService (direct client connections via xrootd/HTTP) correctly calls onStart() in its executeMover() method

Impact
ESnet Stardust dashboards show incomplete flow information for TPC transfers. Specifically:
- onEnd markers are present with correct experiment/activity IDs from transfer tags (e.g., activityId=11, 15, 16 for ATLAS categories)
- No onStart markers are emitted from pool nodes for TPC transfers
- onStart markers (from NettyTransferService) are present but use the FQAN fallback (activityId=1), since worker nodes don't send SciTag headers

Evidence from ATLAS AGLT2 pool logs:
Fix
Add a startTransferLifeCycle() call in MoverTask.run() (before I/O begins) that:
- checks that _transferLifeCycle is configured and the protocol is IP-based
- derives the local endpoint via NetworkUtils.getLocalAddress(), the same approach used by DefaultPostTransferService.deriveLocalEndpoint() for onEnd
- calls transferLifeCycle.onStart() with the remote endpoint, derived local endpoint, protocol info, and subject
- handles SocketException gracefully with debug logging

This mirrors the pattern already established in NettyTransferService.executeMover() for direct connections, extending it to all MoverProtocol-based transfers.

Testing
- … (mvn -pl modules/dcache -am compile -DskipTests)
- … NettyTransferService and DefaultPostTransferService

Signed-off-by: Shawn McKee <smckee@umich.edu>
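For comparison, the startTransferLifeCycle() step described under Fix can be sketched as follows. StartMarker and LocalAddressResolver are hypothetical stand-ins for TransferLifeCycle and NetworkUtils.getLocalAddress(), java.util.logging replaces the project's own logger, and the IP-based check is reduced to rejecting a missing or unresolved remote address. Because no connection exists yet, the derived local endpoint carries port 0, which is the limitation raised in the review.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for TransferLifeCycle.onStart(); not dCache's signature.
interface StartMarker {
    void onStart(InetSocketAddress remote, InetSocketAddress local);
}

// Hypothetical stand-in for NetworkUtils.getLocalAddress().
interface LocalAddressResolver {
    InetAddress localAddressFor(InetAddress remote) throws SocketException;
}

class MoverStartSketch {
    private static final Logger LOG = Logger.getLogger(MoverStartSketch.class.getName());

    // Emits a start marker before any I/O; returns true if one was sent.
    static boolean startTransferLifeCycle(StartMarker marker, LocalAddressResolver resolver,
                                          InetSocketAddress remote) {
        // Guard: a lifecycle is configured and the remote endpoint is a resolved IP address.
        if (marker == null || remote == null || remote.isUnresolved()) {
            return false;
        }
        try {
            InetAddress local = resolver.localAddressFor(remote.getAddress());
            // No connection exists yet, so the local port is unknown and left as 0.
            marker.onStart(remote, new InetSocketAddress(local, 0));
            return true;
        } catch (SocketException e) {
            // FINE is java.util.logging's closest match to debug level.
            LOG.log(Level.FINE, "Could not derive local endpoint; no start marker sent", e);
            return false;
        }
    }
}
```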