Skip to content

feat(caching): ITenantCacheService structural guardrail for tenant cache isolation#1243

Open
cesarcastrocuba wants to merge 6 commits into
fullstackhero:developfrom
cesarcastrocuba:feat/caching-tenant-isolation
Open

feat(caching): ITenantCacheService structural guardrail for tenant cache isolation#1243
cesarcastrocuba wants to merge 6 commits into
fullstackhero:developfrom
cesarcastrocuba:feat/caching-tenant-isolation

Conversation

@cesarcastrocuba
Copy link
Copy Markdown
Contributor

Summary

Introduces ITenantCacheService, a structural guardrail that automatically scopes all HybridCache operations to the active tenant, eliminating cross-tenant data leak risks.

Problem

HybridCache is injected directly into business modules. Developers must manually prefix cache keys with CacheKeys.Tenant(tenantId). If forgotten, Tenant A data leaks to Tenant B.

Solution

New TenantHybridCache (registered as Scoped) wraps HybridCache and transparently prefixes all keys (t:{tenantId}:{key}) and tags. Business modules now inject ITenantCacheService instead.

Changes

BuildingBlocks/Caching

  • New ITenantCacheService interface
  • New TenantHybridCache implementation (internal, sealed, auto-scopes keys/tags)
  • Extensions.cs: registers ITenantCacheService as Scoped in AddHeroCaching()

Modules

  • UserPermissionService (Identity): migrated from HybridCache to ITenantCacheService
  • RolePermissionSyncer (Identity): same
  • TenantThemeService (Multitenancy): tenant entries use ITenantCacheService; cross-tenant DefaultTheme keeps direct HybridCache

Tests

  • 4 new unit tests: cross-tenant isolation, set/remove scoping, null context guard, empty tenant guard
  • 1 new registration test verifying Scoped lifetime

Verification

All 38 Caching.Tests pass. 0 warnings, 0 errors.

César Castro added 6 commits May 17, 2026 18:32
In local development (e.g. dotnet run or Aspire), the working directory
is the project folder where the linked appsettings.json doesn't exist natively.
This causes an OptionsValidationException for JwtOptions on startup, crashing
Aspire with exit code -532462766. Loading from AppContext.BaseDirectory
ensures the copied linked files are found.
…on in Aspire

When running DbMigrator via Aspire (dotnet run), the working directory is the project directory, so appsettings.json is not found. This causes JwtOptions to fail ValidateOnStart. We now load appsettings.json from AppContext.BaseDirectory, and re-apply environment variables so Aspire connection strings are not overwritten.
… tenant isolation

Adds TenantHybridCache, a scoped decorator of HybridCache that automatically prefixes every cache key and tag with the active tenant ID (t:{tenantId}:). This prevents cross-tenant cache collisions without requiring callers to manage prefixes manually.

Changes:
- BuildingBlocks/Caching: ITenantCacheService interface, TenantHybridCache implementation, registered as Scoped in Extensions.AddHeroCaching()
- Identity: UserPermissionService and RolePermissionSyncer migrated from HybridCache to ITenantCacheService
- Multitenancy: TenantThemeService migrated; DefaultTheme (cross-tenant by design) keeps direct HybridCache
- Tests: 4 new unit tests verifying tenant isolation, null context guards, and Scoped DI registration

Architecture rule: business module code must inject ITenantCacheService, not HybridCache directly.
…ce guardrail

4 architecture rules enforced via NetArchTest:
1. ModuleClasses_Should_Not_Depend_On_HybridCache_Directly: scans all module assemblies via ModuleAssemblyDiscovery and fails if any type (excluding the intentional TenantThemeService exception) has a direct dependency on Microsoft.Extensions.Caching.Hybrid.
2. ITenantCacheService_Should_Be_Public: verifies the interface is public and accessible from modules.
3. TenantHybridCache_Implementation_Should_Be_Internal: verifies the implementation cannot be newed up directly from outside the Caching assembly.
4. CachingArchitectureTests_Should_HaveAtLeastOneModuleToScan: guard to prevent vacuous pass with empty assembly list.
…I build failure

Fixes:
- CachingArchitectureTests.cs was missing 'using Xunit;' (CS0246 FactAttribute not found in CI)
- Refines architecture rule to check for the HybridCache class specifically (not the whole namespace) so HybridCacheEntryOptions stays allowed as a pure options type

New:
- IGlobalCacheService: public interface for intentionally cross-tenant cache entries
- GlobalHybridCache: internal sealed implementation (Singleton), thin delegate to HybridCache
- Registered as Singleton in AddHeroCaching()
- TenantThemeService: migrated _globalCache from HybridCache to IGlobalCacheService

Architecture test now has 5 clean rules with zero exceptions:
1. No HybridCache class injection in any module
2. ITenantCacheService is a public interface
3. IGlobalCacheService is a public interface
4. Both implementations (TenantHybridCache, GlobalHybridCache) are internal
5. Guard: at least one module assembly scanned
…CacheService

- Migrated IdempotencyEndpointFilter from direct HybridCache usage to ITenantCacheService
- Updated CacheKeys to expose logical IdempotencyEntry and full IdempotencyEntryFull (for distributed cache read probe)
- Added using directive to CachingArchitectureTests.cs to fix XML doc cref reference compilation issue
- Updated Caching.Tests (CacheKeysTests and HeroCachingRegistrationTests) to match new signatures and test registration
Copy link
Copy Markdown
Member

@iammukeshm iammukeshm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the structural work here — ITenantCacheService + IGlobalCacheService is exactly the pattern I'd want, and it mirrors the "tenant-by-default + opt-out marker" model we just landed at the EF Core layer in e08f3ffa (IGlobalEntity / ApplyTenantIsolationByDefault). The architecture test that bans direct HybridCache injection from modules is a great guardrail.

Three blockers before this can merge, plus a couple of nits.

🚩 Blocker 1 — Tag-based invalidation is broken (format mismatch)

In TenantHybridCache.cs, the SET path and REMOVE-BY-TAG path use different prefix formats for the same tag:

// SET path (ScopeTags) — tags an entry with: [\"tenant:t1\", \"themes\"]
private static IEnumerable<string> ScopeTags(string tenantId, IEnumerable<string>? callerTags)
{
    yield return CacheKeys.Tags.Tenant(tenantId);       // → \"tenant:t1\"
    foreach (var tag in callerTags) yield return tag;    // → \"themes\" (unprefixed!)
}

// REMOVE path — looks up tag \"t:t1:themes\"
public ValueTask RemoveByTagAsync(string tag, CancellationToken ct = default)
{
    var tenantId = GetTenantId();
    return _cache.RemoveByTagAsync($\"t:{tenantId}:{tag}\", ct);
}

CacheKeys.Tags.Tenant(t) returns \"tenant:{t}\", but the remove-side lookup is \"t:{t}:{tag}\". These don't match, and caller tags are stored unprefixed but read as prefixed.

Concrete failure: in this PR's TenantThemeService.InvalidateCacheAsync:

await _cache.RemoveByTagAsync(CacheKeys.Tags.Themes, ct).ConfigureAwait(false);

…silently fails to evict anything because the stored tag is \"themes\" but the lookup is \"t:t1:themes\". Theme invalidation by tag is a no-op after this PR.

Fix: make the two paths agree on the format. Simplest: on SET, prefix every caller tag with t:{tenantId}: AND keep the standalone tenant:{tenantId} tag for whole-tenant purges. Then RemoveByTagAsync(\"themes\") correctly looks up t:{tenantId}:themes. Existing test coverage misses this entirely — please add a round-trip test (set with tag → remove by that tag → reread misses).

🚩 Blocker 2 — Merge conflict on CacheKeys.cs

GitHub shows mergeStateStatus: DIRTY / mergeable: CONFLICTING. Your branch predates two recently-merged commits:

  • e08f3ffa feat: multitenancy hardening, impersonation, admin app rebuild, demo seeding refactor — adds ImpersonationGrantStatus(jti) to CacheKeys, plus other touches to the same file
  • 861381e1 fix(migrator): resolve JwtOptions validation failure in Aspire orchestrator (#1242) — separate file but landed on develop after your branch point

Please rebase on origin/develop. After the rebase, also reconsider whether to keep the breaking change to CacheKeys.IdempotencyEntry(...). Splitting it into IdempotencyEntry(key) (logical) + IdempotencyEntryFull(tenantId, key) (full) is a breaking change to a public static API surface — fine if intentional, but worth flagging.

🚩 Blocker 3 — Duplicated migrator fix

src/Host/FSH.Starter.DbMigrator/Program.cs in your diff is the exact same Aspire fix that just landed as #1242. After the rebase that hunk should disappear entirely.

Other comments (not blockers, but worth addressing in this PR)

Tenant resolution inconsistency in IdempotencyEndpointFilter

The PR reads tenantId from the JWT claim directly:

var tenantId = httpContext.User.FindFirst(\"tenant\")?.Value ?? \"global\";
var fullKey = CacheKeys.IdempotencyEntryFull(tenantId, idempotencyKey);

…then writes via tenantCache.SetAsync(logicalKey, ...) which resolves tenant from IMultiTenantContextAccessor. These two sources of tenant id are not always the same — see the root-operator header override I added in MultitenancyModule.ConfigureMiddleware (develop's MultitenancyModule.cs), where a caller's claim is root but the resolved tenant is the override target. Probe-read would look up under root, write would land under the override target → idempotency replay would never match.

Likely a corner case in practice (root operators don't typically replay), but the two paths should agree. Read tenant from IMultiTenantContextAccessor in both, or invert and use the claim in both.

InvalidatePermissionCacheAsync wiring is unaddressed

The v1 readiness punchlist calls out: "Wire InvalidatePermissionCacheAsync to AssignRoles, UpdateRolePermissions, group add/remove". This PR moves the cache infrastructure but doesn't actually wire the invalidation calls into those handlers. Is that a follow-up PR, or did you assume it's out of scope?

Nice touches I want to call out

  • The carve-out for HybridCacheEntryOptions vs banning HybridCache — correctly checks by full class name not namespace. Good.
  • InternalsVisibleTo(\"Caching.Tests\") for TenantHybridCache.Create factory rather than making the class public. Right call.
  • The arch test guarding against vacuous pass on empty assembly list (CachingArchitectureTests_Should_HaveAtLeastOneModuleToScan). Nice paranoia.

Summary

Concept ✅, blocked on a real tag-format bug + rebase + de-duplication. Once the three above are fixed and the round-trip tag test is added, I'd be happy to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants