
Proof of concept: Remove the requirement to check out the gutenberg repository and running the full build script #11019

Draft
desrosj wants to merge 64 commits into WordPress:trunk from desrosj:try/remove-gutenberg-git-checkout-and-build

Conversation

@desrosj
Member

@desrosj desrosj commented Feb 23, 2026

This is a work in progress.

This PR attempts to remove the requirement to check out the WordPress/gutenberg Git repository locally and run the gutenberg repository build script as a part of the `wordpress-develop` one.

Instead, a copy of the built plugin is downloaded from the GitHub Container Registry for the respective pinned hash value (see WordPress/gutenberg#75844 for that part of this approach). This is now the default behavior.

The ability to check out and build the gutenberg plugin locally has been maintained. This is helpful in case a contributor wants to test changes to that repository (perhaps the wp-build package) locally before pushing/creating a PR. This mode can be activated by setting GUTENBERG_LOCAL_REPO=true locally in the .env file.
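For example, opting back into the local checkout-and-build flow would look like this (the variable name is taken from the description above; the file is the standard `.env` in the repository root):

```shell
# .env — opt back into cloning and building the gutenberg repository locally.
# When unset or false, the pre-built plugin is downloaded instead.
GUTENBERG_LOCAL_REPO=true
```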

Other changes in this PR:

  • Renamed usages of ref to sha to avoid confusion. A ref is typically a human-readable branch path, while a SHA is the full-length, unique hash for a commit.
  • Gutenberg-related grunt tasks have been renamed to follow gutenberg:command instead of gutenberg-command pattern.
  • gutenberg has been removed from all file names within tools/gutenberg since it's redundant.
  • A single job has been added to the PHPUnit and build process testing workflows to confirm that both ways of building work as expected.

Trac ticket: Core-64393.

Use of AI Tools

I used Claude Code pretty heavily throughout the work on this pull request.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@desrosj desrosj self-assigned this Feb 23, 2026
With this new approach, a pre-built zip will be downloaded instead.
This is no longer necessary because a Git repository is no longer used.
This switches to downloading a prebuilt zip file uploaded to the GitHub Container Registry for a given commit instead.

This eliminates the need to run any Gutenberg build scripts within `wordpress-develop`.
A REF is typically a human-readable path to a branch, whereas a SHA is a hash representing an individual commit. Since the latter is what's being used here, this aims to avoid confusion.
@desrosj desrosj changed the title Proof of concept: Stop checking out gutenberg repository and running two build scripts Proof of concept: Remove the requirement to check out the gutenberg repository and running the full build script Feb 24, 2026
@desrosj
Member Author

desrosj commented Feb 24, 2026

There is one remaining issue to figure out: how to map the icons package correctly from the built plugin.

I've removed the chunk of code responsible for mapping those files to their respective locations within wordpress-develop and the build process succeeds.

@desrosj
Member Author

desrosj commented Feb 24, 2026

It seems that the icons are not currently included in the Gutenberg plugin, only in wordpress-develop.

@mcsf I see you introduced this in 5b7a3ad. Could you share a bit more context as to why these are not yet included in the Gutenberg plugin? I don't see them in the latest plugin version, but I may be missing them.

Member

@dmsnell dmsnell left a comment


this is a step up @desrosj and should avoid a lot of the delay when checking out the repo. 🙇‍♂️

does the zip contain the proper built versions of the artifacts or does it contain source copies? that is, are there any build steps that are occurring on the Gutenberg side that occur before that ZIP is available?

it would definitely be helpful to know in the docs, at least in download-gutenberg.js because I think it’s really not clear if we are downloading the Gutenberg source or some output. if output, I would love to know on the wordpress-develop side what our expectations are of that artifact and why .layers[0].digest is special.

Perhaps there is a command we could include which comparably would generate the right files were we given the Gutenberg source.


is it right that this still leaves the icon disparity in place?

fs.readFileSync( packageJsonPath, 'utf8' )
);
sha = packageJson.gutenberg?.sha;
ghcrRepo = packageJson.gutenberg?.ghcrRepo;
Member


are we short on bytes? the gh part is obvious to me as a reader, but I cannot figure out what cr means, and since this only appears to be documented inside this file, I’m not sure it would be clear from the package.json file, especially since it looks like a standard GitHub repo name.

perhaps something like githubRepo would suffice? it doesn't seem like the package.json value has anything specific to do with the GitHub Container Registry

Member


I am new to this as well, so I found that GHCR stands for the GitHub Container Registry

Member Author


I've added a note within the inline docblock to explain this at the first occurrence of ghcrRepo. Does that address your concerns here @dmsnell?

I also changed the name of the package to gutenberg-wp-develop-build, making the full reference WordPress/gutenberg/gutenberg-wp-develop-build. It's more specific to the purpose, and hopefully won't lead to any confusion.

Within the GHCR, packages are published to a specific repository. While they are listed on a separate screen, you need to use the Org/Repository/package path to reference them.
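To make the `.layers[0].digest` expectation discussed above concrete, here is a hypothetical sketch of a helper that validates an OCI image manifest (the shape returned by a registry's `/v2/<name>/manifests/<ref>` endpoint) before pulling the blob. The function name and the single-layer assumption are illustrative, not the PR's actual code:

```javascript
// Hypothetical helper: given a parsed OCI image manifest, return the digest
// of the layer holding the built plugin zip. The artifact is assumed to be
// published with exactly one layer, which is why .layers[0].digest is used.
function getArtifactDigest( manifest ) {
	if ( ! Array.isArray( manifest.layers ) || manifest.layers.length !== 1 ) {
		throw new Error(
			`Expected exactly one layer in the artifact manifest, found ${ manifest.layers?.length ?? 0 }.`
		);
	}
	return manifest.layers[ 0 ].digest;
}
```

Failing loudly on an unexpected layer count would also document the publishing contract at the point where it is relied upon.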

const installedHash = fs.readFileSync( hashFilePath, 'utf8' ).trim();
if ( installedHash !== sha ) {
throw new Error(
`SHA mismatch: expected ${ sha } but found ${ installedHash }. Run \`npm run grunt gutenberg-download -- --force\` to download the correct version.`
Member


at first I was going to ask about making this check before downloading, but I see it exists after downloading.

this is probably better because it informs someone that their files are out of date without eagerly downloading a new copy and replacing the old ones.

Member Author


I think I also want to create a file in the root of the repository to store a hash of the entire gutenberg directory immediately after unzipping. Then the user can also be made aware of instances where the files in the gutenberg directory were modified, which may be unexpected.

Member


this can backfire if we want to hash the entire directory. that can be an I/O-intensive task and either requires polling or reliable file watching.

with the solution already here: notify when expired, it gives agency to the user to download or ignore

const packageJsonPath = path.join( rootDir, 'package.json' );

/**
* Execute a command, streaming stdio directly so progress is visible.
Member


Is this really streaming? It seems the stdout variable is populated with streaming, but the promise only resolves with the captured stdout when the child process is closed?

Member Author


Updated, how does that look?

@mcsf
Contributor

mcsf commented Feb 24, 2026

It seems that the icons are not currently included in the Gutenberg plugin, only in wordpress-develop.

@mcsf I see you introduced this in 5b7a3ad. Could you share a bit more context as to why these are not yet included in the Gutenberg plugin? I don't see them in the latest plugin version, but I may be missing them.

Oof, I'm so glad you pinged me! Fix at WordPress/gutenberg#75866

desrosj and others added 4 commits February 24, 2026 09:11
The Gutenberg repository has been updated to include the icon assets in the plugin build. See github.com/WordPress/gutenberg/pull/75866.
- Add missing hard stop at the end of inline comment.
- Make use of `fetch()` instead of `exec( 'curl' )`.

Co-authored-by: Weston Ruter <westonruter@gmail.com>
@github-actions

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

throw new Error( `Failed to download blob: ${ response.status } ${ response.statusText }` );
}
const buffer = await response.arrayBuffer();
fs.writeFileSync( zipPath, Buffer.from( buffer ) );
Member


at this point it’s probably possible to remove all of the _Sync versions and stream the output directly into the file.

not essential, but it does feel like we’re possibly going out of our way to do more work to avoid a simpler default.

await fs.writeFile( zipPath, response.body );

Member Author


I'm not deeply knowledgeable here, but Claude told me that this wouldn't work as you've written "since fs.promises.writeFile doesn't accept a ReadableStream — stream.pipeline is the correct API for this." Does this seem right now?

} else {
await exec( 'git', [ 'checkout', 'FETCH_HEAD' ], {
cwd: gutenbergDir,
} );
Member


this can fail if there are uncommitted changes. what’s the primary goal here? is it to update the code with recent changes?

should it preserve local changes that can be preserved?

could we simply abort and mention if there are uncommitted changes? let them figure it out rather than mess with their config if they don’t expect this to happen.

const dirHashFile = path.join( rootDir, '.gutenberg-dir-hash' );
if ( fs.existsSync( dirHashFile ) ) {
fs.rmSync( dirHashFile );
}
Member


this check introduces a race condition that doesn’t need to exist.

fs.rmSync( dirHashFile, { force: true } );

like most FS calls, performing if checks before doing them (rather than catching errors afterward) opens up a range of bugs and exploits.

https://nodejs.org/api/fs.html#fsrmsyncpath-options

* Directories inside gutenberg/ to exclude when hashing.
* These are build artefacts or dependency caches, not source files.
*/
const DIR_HASH_EXCLUDE = new Set( [ 'build', 'node_modules', '.git' ] );
Member


what about build-module, build-style, and build-types, etc…

would it make more sense to read from the .gitignore and apply those rules?

Member Author


My intention here was to notify the user of any potentially unexpected discrepancy in the gutenberg directory vs. when it was created fresh from the pinned hash.

Essentially, any time there's a change to any source or built file, it could be an indication that there will be unexpected results when building. Looking at this again now with a fresh brain:

  • build should likely not be excluded (editing these files directly is incorrect and should be flagged)
  • phpunit, tests, etc. should also be excluded.
  • docs, platform-docs, etc.

Maintaining this list will be a pain, though.

Member


ultimately we only truly care about the files which get copied into Core, right? would we benefit from reusing that single file mapping from the copy step?

checkRepoHash();

// Warn if the source has changed since the last build.
checkDirHash();
Member


if this is the existing style, then whatever is fine, but these functions without return values are surprising and opaque. the need for the explanatory comments comes from the fact that their behavior is hidden inside them, rather than from being explicit in how we're handling the responses.

note: checkRepoHash() appears to just do nothing if the directory is entirely missing, but I would have expected from the comment here that some warning would have been raised.
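As a sketch of the more explicit shape being suggested, the check could return a result and let the call site make the handling visible (the names and result shape here are illustrative, not the PR's actual code):

```javascript
// Instead of logging or silently returning inside the check, return a
// status object so the caller decides how to react.
function compareDirHash( storedHash, currentHash ) {
	if ( storedHash === undefined ) {
		return { ok: false, reason: 'missing' }; // no hash recorded yet
	}
	if ( storedHash !== currentHash ) {
		return { ok: false, reason: 'modified' }; // files changed since download
	}
	return { ok: true };
}

// Call site: the warning behavior is explicit rather than hidden in a helper.
function warnIfChanged( storedHash, currentHash ) {
	const result = compareDirHash( storedHash, currentHash );
	if ( ! result.ok ) {
		console.warn( `Gutenberg directory check failed: ${ result.reason }.` );
	}
	return result.ok;
}
```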

}
function collectFiles( dir, excludeDirs = new Set() ) {
const files = [];
for ( const entry of fs.readdirSync( dir, { withFileTypes: true } ).sort( ( a, b ) => a.name.localeCompare( b.name ) ) ) {
Member


what is the point of sorting the paths?

Member Author

@desrosj desrosj Mar 3, 2026


This was a Claude suggestion.

The sorting ensures the hash is deterministic — the same directory contents always produce the same digest regardless of the order the filesystem returns entries.

I have not validated this yet, but it does make sense to me.

* This stores the hash of the Gutenberg directory in a `.gutenberg-hash` file
* to track when changes have been made to those files locally.
* Symlinks are skipped because they can point to directories and would throw
* EISDIR when read as regular files.
Member


if we didn’t want this warning here we could resolve symlinks with fs.readlinkSync() and then treat them like normal paths.

there are obviously valid reasons not to follow symbolic links, but confusion about whether we are reading the link itself vs. the thing it points to is not one I’m used to seeing.

Member

@dmsnell dmsnell left a comment


@desrosj please don’t let my comments discourage you; this is really good stuff.

did you by any chance try to measure the time it takes to run the dir hashing? I would think that could have some unexpected lag to it.

@desrosj
Member Author

desrosj commented Mar 3, 2026

did you by any chance try to measure the time it takes to run the dir hashing? I would think that could have some unexpected lag to it.

My initial rough measurements were showing that the hashing takes around 150-200ms with the downloaded pre-built plugin files, and ~2.5 seconds with the full repository checkout.

response.body,
zlib.createGunzip(),
Writable.toWeb( tar.stdin ),
);
Member


ah darn, sorry if I leaned too hard into this and we still have to call out to the shell.

}
} );
child.on( 'error', reject );
} );
Member


there is also spawnSync() if you don’t want all the setup and error code.

const { spawnSync } = require( 'node:child_process' );

function runScript( scriptPath, args = [] ) {
	const response = spawnSync( process.execPath, [ scriptPath, ...args ], {
		cwd: rootDir,
		stdio: 'inherit',
	} );

	if ( 0 === response.status ) {
		return;
	}

	// response.error is only set when spawning itself failed; a non-zero
	// exit status needs its own error.
	throw response.error ?? new Error( `Script exited with status ${ response.status }` );
}
