⚡️ Optimize solady.cbrt #1527

Merged
atarpara merged 5 commits into main from ocbrt on May 19, 2026


@atarpara
Collaborator

@atarpara atarpara commented May 16, 2026

Description

Smaller bytecode and less gas.

@duncancmt can you please formally verify this implementation?
This gives >94 bits of precision in 5 iterations.

Python script : https://gist.github.com/atarpara/6b9c18596fba67ebf17b93e33c7901db

Checklist

Ensure you completed all of the steps below before submitting your pull request:

  • Ran forge fmt?
  • Ran forge test?

Pull requests with an incomplete checklist will be thrown out.

duncancmt added a commit to 0xProject/0x-settler that referenced this pull request May 16, 2026
duncancmt added a commit to duncancmt/tamago that referenced this pull request May 16, 2026
@duncancmt
Contributor

@codex proved it, but you need to modify the constants in order for it to work. The present constants are unsound.

@atarpara
Collaborator Author

> @codex proved it, but you need to modify the constants in order for it to work. The present constants are unsound.

Thanks a lot.

@github-actions

github-actions Bot commented May 17, 2026

Gas Snapshot Comparison Report

Generated at commit : 3d4000b, Compared to commit : 5dc5fc8

| Contract Name | Test Name | Main Gas | PR Gas | Diff |
|---|---|---|---|---|
| FixedPointMathLibWithCLZTest | testCbrt() | 6472 | 6222 | -250 |
| | testCbrtWad() | 8377 | 8231 | -146 |
| | testCbrtWadConverged() | 2483 | 2449 | -34 |
| | testCbrtWadDebug() | 7153 | 7119 | -34 |
| | testCbrtWadMonotonicallyIncreasing() | 3561 | 3486 | -75 |
| | test__codesize() | 63037 | 62814 | -223 |

@atarpara
Collaborator Author

@duncancmt can you verify the new implementation again? Thanks a lot.

Contributor

@duncancmt duncancmt left a comment

The formal verification checks out.

Comment thread src/utils/clz/FixedPointMathLib.sol Outdated
// shifted estimate is 0.
let b := sub(255, clz(x))
z := or(shr(7, shl(div(b, 3), byte(add(mod(b, 3), 29), 0x90b5e5))), 1)
// final error. This gives >97 bits of precision after only 5
Contributor

The comment is subtly wrong. The worst-case bound from the error analysis is 91 bits, not 97. This doesn't matter for implementation correctness, though.

Suggested change
- // final error. This gives >97 bits of precision after only 5
+ // final error. This gives >91 bits of precision after only 5
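
For readers who want to experiment with the seed expression quoted in this thread, here is a minimal Python model of it. This is a sketch, not the library's code: the function name `seed` is hypothetical, it assumes `clz(x)` counts the leading zero bits of a 256-bit word (so `sub(255, clz(x))` is `floor(log2(x))`), and it only covers `0 < x < 2**256`.

```python
def seed(x: int) -> int:
    """Hypothetical Python model of the quoted Yul seed; assumes 0 < x < 2**256."""
    assert 0 < x < 2**256
    # b := sub(255, clz(x)), i.e. floor(log2(x)) under the stated clz assumption.
    b = x.bit_length() - 1
    # byte(29 + b % 3, 0x90b5e5): bytes 29, 30, 31 of the 32-byte word
    # are 0x90, 0xb5, 0xe5 (byte index 0 is the most significant byte).
    c = (0x90B5E5 >> (8 * (2 - b % 3))) & 0xFF
    # or(shr(7, shl(div(b, 3), c)), 1)
    return ((c << (b // 3)) >> 7) | 1


if __name__ == "__main__":
    for x in (1, 8, 2**30, 10**18):
        z = seed(x)
        # Third column: ratio of the seed to the floating-point cube root.
        print(x, z, z / x ** (1 / 3))
```

The final `| 1` mirrors the `or(..., 1)` in the Yul, which the surrounding source comment ties to the case where the shifted estimate is 0.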

Collaborator Author

@duncancmt are you sure? By my analysis it is 97-bit accuracy.

Best formula:  c_s = (89 + 26*s) / 128,   applied with s = mod(n+2, 3)

case n mod 3 = 0:
  n = 3k + 0
  z0           = 2^floor((3k+0+2)/3) = 2^(k+0) = 1 * 2^k
  x^(1/3)      in [1.0000, 1.2599) * 2^k
  ratio r      in (0.7937, 1.0000]
  error eps    in (-0.2063, +0.0000]
  worst |eps|  = 0.2063    (2.28 bits)

  with constant multiplier:
    s = (n+2) mod 3 = (3k+2) mod 3 = 2
    c[s]         = (89 + 26*2) / 128 = 141/128 = 1.101562
    new ratio r  in (0.8743, 1.1016]
    new eps      in (-0.1257, +0.1016]
    worst |eps|  = 0.1257   (2.99 bits)

case n mod 3 = 1:
  n = 3k + 1
  z0           = 2^floor((3k+1+2)/3) = 2^(k+1) = 2 * 2^k
  x^(1/3)      in [1.2599, 1.5874) * 2^k
  ratio r      in (1.2599, 1.5874]
  error eps    in (+0.2599, +0.5874]
  worst |eps|  = 0.5874    (0.77 bits)

  with constant multiplier:
    s = (n+2) mod 3 = (3k+3) mod 3 = 0
    c[s]         = (89 + 26*0) / 128 = 89/128 = 0.695312
    new ratio r  in (0.8760, 1.1037]
    new eps      in (-0.1240, +0.1037]
    worst |eps|  = 0.1240   (3.01 bits)

case n mod 3 = 2:
  n = 3k + 2
  z0           = 2^floor((3k+2+2)/3) = 2^(k+1) = 2 * 2^k
  x^(1/3)      in [1.5874, 2.0000) * 2^k
  ratio r      in (1.0000, 1.2599]
  error eps    in (+0.0000, +0.2599]
  worst |eps|  = 0.2599    (1.94 bits)

  with constant multiplier:
    s = (n+2) mod 3 = (3k+4) mod 3 = 1
    c[s]         = (89 + 26*1) / 128 = 115/128 = 0.898438
    new ratio r  in (0.8984, 1.1320]
    new eps      in (-0.1016, +0.1320]
    worst |eps|  = 0.1320   (2.92 bits)

Optimal integer multipliers (by s = mod(b, 3)):
  c[0] = 89/128 = 0.695312
  c[1] = 115/128 = 0.898438
  c[2] = 141/128 = 1.101562

Worst |eps_0| (with best c) = 0.1320  (2.92 bits)

Newton trace from worst eps_0 = +0.1320:
  step 0: eps =  +1.3196e-01, bits =    2.92
  step 1: eps =  +1.4786e-02, bits =    6.08
  step 2: eps =  +2.1439e-04, bits =   12.19
  step 3: eps =  +4.5948e-08, bits =   24.38
  step 4: eps =  +2.1112e-15, bits =   48.75
  step 5: eps =  +4.4573e-30, bits =   97.50  <-- target
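
The per-residue error ranges in the output above can be re-derived with a few lines of floating-point Python. This is only a sketch: the ratio intervals are copied from the output itself, the multiplier formula is the `c_s = (89 + 26*s) / 128` stated there, and the names `c`, `CASES`, and `eps_range` are illustrative rather than taken from the linked gist.

```python
CBRT2 = 2.0 ** (1.0 / 3.0)  # 2^(1/3), about 1.2599


def c(s: int) -> float:
    """Multiplier c_s = (89 + 26*s) / 128 from the output above."""
    return (89 + 26 * s) / 128


# For n mod 3 = m: (s, low end, high end) of the ratio z0 / x^(1/3)
# before the multiplier is applied, as listed in the output above.
CASES = {
    0: (2, 1.0 / CBRT2, 1.0),
    1: (0, CBRT2, CBRT2**2),
    2: (1, 1.0, CBRT2),
}


def eps_range(m: int) -> tuple[float, float]:
    """Relative-error interval of the seed after applying c_s."""
    s, lo, hi = CASES[m]
    return c(s) * lo - 1.0, c(s) * hi - 1.0


worst = max(abs(e) for m in CASES for e in eps_range(m))

if __name__ == "__main__":
    for m in CASES:
        print(m, eps_range(m))
    print("worst |eps_0| =", worst)
```

Note that this only bounds the magnitude of the starting error; it says nothing about which sign converges more slowly.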

Contributor

@duncancmt duncancmt May 18, 2026

The sign of the relative error matters because there is an un-squared term in the relative error recurrence relation:

from decimal import Decimal

# One Newton step applied to the relative error e.
def f(e):
    return e**2 * (3 + 2 * e) / (3 * (1 + e) ** 2)
# Five Newton steps.
def f5(e):
    return f(f(f(f(f(e)))))
def log_2(x, prec=Decimal('0.0001')):
    return (x.ln() / Decimal(2).ln()).quantize(prec)
# Endpoints of the starting-error intervals, with their signs.
es = [Decimal('-0.12396'), Decimal('+0.10374'), Decimal('-0.10156'), Decimal('+0.13196'),
      Decimal('-0.12569'), Decimal('+0.10156')]
max([log_2(f5(abs(e))) for e in es])
# Decimal('-97.5018')
max([log_2(f5(e)) for e in es])
# Decimal('-91.8558')
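
For reference, the recurrence `f` can be derived as follows. This is a sketch in exact real arithmetic, assuming the standard Newton step for the cube root; the integer divisions in the actual implementation are not modeled, and the symbols $z$, $z'$, $e$, $e'$ are introduced here for illustration.

```latex
\begin{aligned}
z' &= \tfrac{1}{3}\left(2z + \frac{x}{z^{2}}\right), \qquad z = x^{1/3}(1+e) \\
1 + e' &= \frac{z'}{x^{1/3}} = \frac{2(1+e)^{3} + 1}{3(1+e)^{2}} \\
e' &= \frac{2(1+e)^{3} + 1 - 3(1+e)^{2}}{3(1+e)^{2}} = \frac{e^{2}(3+2e)}{3(1+e)^{2}}
\end{aligned}
```

The factor $(3+2e)/(1+e)^{2}$ is larger for negative $e$ than for positive $e$ of the same magnitude, which is why taking `abs(e)` first understates the worst case.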

@atarpara
Collaborator Author

I found a new constant that replaces MUL with SHL and should use 2 less gas,
but in testing it somehow increases gas.

z := shr(7, shl(div(b, 3), add(74, shl(mod(b, 3), 18))))
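
To see which multipliers the SHL-based expression above encodes, the inner term can be evaluated for each residue. This is a small sketch: the helper `shl` only mimics Yul's operand order (`shl(shift, value)`), and how `b` is defined in this revision is not shown in the thread, so only the constants themselves are computed.

```python
def shl(shift: int, value: int) -> int:
    """Mimics Yul's operand order: shl(shift, value) == value << shift."""
    return value << shift


# add(74, shl(mod(b, 3), 18)) for each possible value of mod(b, 3).
consts = [74 + shl(s, 18) for s in range(3)]

if __name__ == "__main__":
    for s, k in enumerate(consts):
        # The outer shr(7, ...) divides by 128, so k / 128 is the multiplier.
        print(s, k, k / 128)
```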

image

@atarpara atarpara requested a review from duncancmt May 18, 2026 07:58
@duncancmt
Contributor

> replace MUL with SHL and use 2 gas less but in testing somehow it is increase gas

The culprit is probably the stack scheduling, because now mod(b, 3) appears as the left operand of shl instead of the right operand of mul. That probably adds an extra SWAP opcode somewhere. You'd have to disassemble to find out for sure. It may make a difference depending on whether you compile with the legacy pipeline or the IR pipeline.

Contributor

@duncancmt duncancmt left a comment

Looks good. Formal verification checks out.

@atarpara
Collaborator Author

> Looks good. Formal verification checks out.

thanks a lot.

@atarpara atarpara merged commit eac28ea into main May 19, 2026
13 checks passed
@atarpara atarpara deleted the ocbrt branch May 19, 2026 12:04