Skip to content

snappy: optimize UnalignedCopy64 and IncrementalCopy for RISC-V#3360

Open
Felix-Gong wants to merge 2 commits into
apache:masterfrom
Felix-Gong:riscv-snappy-opt
Open

snappy: optimize UnalignedCopy64 and IncrementalCopy for RISC-V#3360
Felix-Gong wants to merge 2 commits into
apache:masterfrom
Felix-Gong:riscv-snappy-opt

Conversation

@Felix-Gong

Copy link
Copy Markdown
Contributor

Summary

  • Optimize UnalignedCopy64() with direct ld/sd inline assembly for 8-byte copy
  • Optimize IncrementalCopy() with 8-byte bulk copies when source/dest don't overlap

Performance Improvement (direct function benchmark)

Scenario Baseline Optimized Improvement
Decompress compressible-256K 728 MB/s 2205 MB/s +203%
Decompress zeros-256K 543 MB/s 1462 MB/s +169%

Changes

  • src/butil/third_party/snappy/snappy-stubs-internal.h: +11 lines
  • src/butil/third_party/snappy/snappy.cc: +24 lines

Test Plan

  • brpc_snappy_compress_unittest passed (7/7)
  • Build verified on RISC-V (SOPHGO SG2044)

Use RISC-V inline assembly (ld/sd) for 8-byte copy operations instead
of generic macro-based implementation.

Changes:
- UnalignedCopy64: direct ld/sd pair for 8-byte copy
- IncrementalCopy: 8-byte bulk copies when source/dest don't overlap

Performance improvement (direct function benchmark):
- Decompress compressible-256K: 728 MB/s -> 2205 MB/s (+203%)
- Decompress zeros-256K: 543 MB/s -> 1462 MB/s (+169%)

Tests: brpc_snappy_compress_unittest passed (7/7)

Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
@wwbmmm

wwbmmm commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

LGTM

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve Snappy decompression performance on RV64 by accelerating 8-byte copy paths in UnalignedCopy64() and IncrementalCopy() using inline RISC-V ld/sd.

Changes:

  • Add an RV64-specific 8-byte bulk-copy loop to IncrementalCopy() when overlap is considered safe.
  • Add an RV64-specific ld/sd implementation for UnalignedCopy64().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/butil/third_party/snappy/snappy.cc Adds RV64 inline-asm 8-byte copying to IncrementalCopy() for non-overlapping (or “safe overlap”) cases.
src/butil/third_party/snappy/snappy-stubs-internal.h Adds RV64 inline-asm ld/sd fast path to UnalignedCopy64().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/butil/third_party/snappy/snappy.cc Outdated
Comment on lines +101 to +116
// RISC-V optimized: use 8-byte copies when possible
if (len >= 8 && (op - src >= 8 || src - op >= 8)) {
// Non-overlapping or safe overlap: copy 8 bytes at a time
do {
uint64_t tmp;
__asm__ volatile(
"ld %0, %1\n\t"
"sd %0, %2\n\t"
: "=&r"(tmp)
: "m"(*(const uint64_t*)src), "m"(*(uint64_t*)op)
: "memory");
src += 8;
op += 8;
len -= 8;
} while (len >= 8);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added runtime alignment check with ((uintptr_t)src | (uintptr_t)op) & 7) == 0. Only uses ld/sd when both pointers are 8-byte aligned, otherwise falls back to byte-by-byte copy.

Comment on lines +167 to +176
#if defined(__riscv) && __riscv_xlen == 64
// RISC-V optimized: single ld/sd pair for 8-byte copy
uint64_t tmp;
__asm__ volatile(
"ld %0, %1\n\t"
"sd %0, %2\n\t"
: "=&r"(tmp)
: "m"(*(const uint64_t*)src), "m"(*(uint64_t*)dst)
: "memory");
#else

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added alignment check (((uintptr_t)src | (uintptr_t)dst) & 7) == 0 before using ld/sd. Falls back to memcpy for unaligned addresses.

Address Copilot review: ld/sd require 8-byte alignment. Add runtime
alignment check and fall back to memcpy/byte-copy for unaligned
addresses to avoid traps on implementations that don't support
misaligned access.

Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants