
text.c: Improve the performance of text rendering #101

Open
TheGag96 wants to merge 2 commits into devkitPro:master from TheGag96:faster-text

Conversation

TheGag96 commented Jan 10, 2026

...by doing the following:

  • Create caches around fontGetCharWidthInfo and fontCalcGlyphPos for ASCII characters, because those are pretty slow. (Non-English languages do not get this benefit - perhaps an LRU cache for non-ASCII glyphs could be made to help here.)
  • Instead of doing one draw call per glyph, try to batch them as much as possible.
  • Because the system font is so fragmented (5 glyphs per texture), this requires collecting glyphs and sorting them before drawing batches to avoid costly texture swaps. (citro2d has the function C2D_TextOptimize for this exact reason.)
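A minimal sketch of the collect, sort, and batch idea from the list above, in the spirit of citro2d's C2D_TextOptimize. The names QueuedGlyph, GlyphBatch and buildBatches are hypothetical and this is not the PR's code; the actual texture binding and vertex submission are left out. It only shows how sorting by sheet turns scattered glyphs into a few runs, each needing one texture bind and one draw call.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queued glyph: the sheet it samples from plus its position.
   Real glyph data would also carry UVs, size and color. */
typedef struct {
    int sheet;
    float x, y;
} QueuedGlyph;

/* One run of glyphs sharing a texture: one bind plus one draw call. */
typedef struct {
    int sheet;     /* texture to bind for this batch */
    size_t first;  /* index of the batch's first glyph in the sorted queue */
    size_t count;  /* number of glyphs drawn by this batch */
} GlyphBatch;

static int compareBySheet(const void* a, const void* b) {
    int sa = ((const QueuedGlyph*)a)->sheet;
    int sb = ((const QueuedGlyph*)b)->sheet;
    return (sa > sb) - (sa < sb);
}

/* Sorts the queue by sheet, then groups consecutive glyphs on the same
   sheet. `out` must have room for n batches (worst case: every glyph on a
   different sheet). Returns the number of batches, i.e. texture binds. */
static size_t buildBatches(QueuedGlyph* glyphs, size_t n, GlyphBatch* out) {
    size_t nb = 0;
    qsort(glyphs, n, sizeof *glyphs, compareBySheet);
    for (size_t i = 0; i < n; i++) {
        if (nb > 0 && out[nb - 1].sheet == glyphs[i].sheet) {
            out[nb - 1].count++;              /* same texture: extend batch */
        } else {
            out[nb].sheet = glyphs[i].sheet;  /* texture change: new batch */
            out[nb].first = i;
            out[nb].count = 1;
            nb++;
        }
    }
    return nb;
}
```

For five glyphs on sheets 3, 1, 3, 2, 1 this yields three batches instead of five draw calls. qsort is not stable, so glyphs on the same sheet may be reordered; that only matters if their quads overlap.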

This lets me maintain 60 FPS on my o3DS with the 3D turned on in most but not all circumstances. Enough glyphs on screen can still cause dropped frames.

UPDATE: One more commit from another breakthrough:

As it turns out, the system font texture sheets are all 128x32 pixels and adjacent in memory! We can reinterpret the memory starting at sheet 0 as a much bigger texture (32 sheets stacked into one 128x1024 texture) that encompasses all of the ASCII glyphs, and make our cache use that instead of the individual sheets. This massively improves performance by reducing texture swaps within a piece of text, down to zero if it's all English. We don't need any extra linear allocation to do this!
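The remapping described above is plain arithmetic. This is a minimal sketch, not the PR's implementation: BigSheetPos and toBigSheet are hypothetical names, sheetData/sheetSize mirror the TGLP_s fields used by fontGetGlyphSheetTex, and it assumes (as in OpenGL) that texcoord t = 0 is the start of texture memory, so sheet k of a big sheet occupies t in [k/32, (k+1)/32). The horizontal coordinate is unchanged because every sheet is 128 pixels wide.

```c
#include <stddef.h>
#include <stdint.h>

#define SHEETS_PER_BIG_SHEET 32  /* 1024 rows / 32-pixel-tall sheets */

/* Hypothetical result of moving a glyph from its own sheet into a big sheet. */
typedef struct {
    int bigSheet;            /* which 128x1024 texture to bind */
    const uint8_t* texData;  /* start of that big sheet in memory */
    float t;                 /* remapped vertical texcoord */
} BigSheetPos;

/* tLocal is the glyph's vertical texcoord within its own 128x32 sheet.
   ASSUMPTION: t grows with memory address, so sheets stack in memory order. */
static BigSheetPos toBigSheet(const uint8_t* sheetData, size_t sheetSize,
                              int sheetIndex, float tLocal) {
    BigSheetPos p;
    int sub = sheetIndex % SHEETS_PER_BIG_SHEET;  /* position inside big sheet */
    p.bigSheet = sheetIndex / SHEETS_PER_BIG_SHEET;
    /* Big sheet b starts where its first sub-sheet (index b*32) starts. */
    p.texData = sheetData + (size_t)p.bigSheet * SHEETS_PER_BIG_SHEET * sheetSize;
    p.t = (sub + tLocal) / SHEETS_PER_BIG_SHEET;
    return p;
}
```

For example, a texcoord of 0.5 on sheet 33 lands in big sheet 1 at t = (1 + 0.5) / 32 = 0.046875.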

The coalescing is applied to all glyph sheets except the last glyphInfo.nSheets % 32, which are too few to fill a complete 32-sheet texture. This adds some operations per glyph in textGetGlyphPosFromCodePoint, but that cost is probably offset by the savings from switching textures less often. And it won't matter for English text, whose results are cached.
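A minimal sketch of how the cutoff and the ASCII cache could fit together. Everything here is hypothetical: GlyphTex, classifySheet and glyphTexForCodePoint are illustrative names, and slowSheetForCodePoint is a stand-in for the expensive fontCalcGlyphPos-style lookup, pretending there are 5 glyphs per sheet. The cache is keyed on code point only, so it assumes a single font whose nSheets never changes.

```c
#include <stdbool.h>

#define NUM_ASCII_CHARS 128
#define SHEETS_PER_BIG_SHEET 32

/* Hypothetical per-glyph result: which texture to bind. */
typedef struct {
    bool coalesced;  /* true: `tex` is a big-sheet index; false: original sheet */
    int tex;
} GlyphTex;

static int slowLookups = 0;  /* counts slow-path calls to show the cache working */

/* Hypothetical stand-in for the slow lookup; pretends 5 glyphs per sheet. */
static int slowSheetForCodePoint(int cp) {
    slowLookups++;
    return cp / 5;
}

/* Only the first nSheets - nSheets % 32 sheets belong to big sheets;
   the leftover sheets at the end keep using their own textures. */
static GlyphTex classifySheet(int sheet, int nSheets) {
    int coalescedSheets = nSheets - nSheets % SHEETS_PER_BIG_SHEET;
    GlyphTex g;
    g.coalesced = sheet < coalescedSheets;
    g.tex = g.coalesced ? sheet / SHEETS_PER_BIG_SHEET : sheet;
    return g;
}

static GlyphTex asciiCache[NUM_ASCII_CHARS];
static bool asciiCacheValid[NUM_ASCII_CHARS];

/* ASCII code points pay the slow path once; others pay it every time. */
static GlyphTex glyphTexForCodePoint(int cp, int nSheets) {
    if (cp >= 0 && cp < NUM_ASCII_CHARS) {
        if (!asciiCacheValid[cp]) {
            asciiCache[cp] = classifySheet(slowSheetForCodePoint(cp), nSheets);
            asciiCacheValid[cp] = true;
        }
        return asciiCache[cp];
    }
    return classifySheet(slowSheetForCodePoint(cp), nSheets);
}
```

With a hypothetical 70-sheet font, sheets 0 to 63 map into two big sheets and sheets 64 to 69 fall back to individual textures.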

Comment on lines +59 to +63
```c
// As it turns out, the system font texture sheets are all 128x32 pixels and adjacent in memory! We can reinterpret
// the memory starting at sheet 0 and describe a much bigger texture that encompasses all of the ASCII glyphs and
// make our cache use that instead of the individual sheets. This will massively improve performance by reducing
// texture swaps within a piece of text, down to 0 if it's all English. We don't need any extra linear allocating to
// do this!
```
Member

fincs Feb 28, 2026

There are more system fonts (KR,CN,TW) besides the normal one (JP/EU/NA). Instead of blindly making an assumption, would it be possible to explicitly detect if the texture sheets are in fact adjacent in memory, and fall back to the normal way if they aren't?
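The explicit check suggested here could be a simple scan over the sheet addresses. A minimal sketch; sheetsAreContiguous is a hypothetical helper, not part of libctru or the PR. It compares addresses as integers so that a non-adjacent layout never requires forming an out-of-bounds pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if each sheet starts exactly sheetSize bytes after the
   previous one, i.e. all sheets form one contiguous block that could be
   described as taller textures. */
static bool sheetsAreContiguous(const uint8_t* const* sheets, size_t nSheets,
                                size_t sheetSize) {
    for (size_t i = 1; i < nSheets; i++) {
        uintptr_t prev = (uintptr_t)sheets[i - 1];
        uintptr_t cur = (uintptr_t)sheets[i];
        if (cur - prev != sheetSize)  /* gap, overlap, or reversed order */
            return false;
    }
    return true;
}
```

On the console, the array could presumably be filled from fontGetGlyphSheetTex(font, i) (cast to const uint8_t*) for every sheet, with sheetSize taken from the TGLP_s sheetSize field; if the check fails, rendering would fall back to binding individual sheets.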

Author

So, I just wrote some code to loop over each sheet and check this, but I'm now realizing that fontGetGlyphSheetTex just indexes into a flat array, assuming a single sheet size shared by all sheets to begin with:

```c
static inline void* fontGetGlyphSheetTex(CFNT_s* font, int sheetIndex)
{
	if (!font)
		font = fontGetSystemFont();
	TGLP_s* tglp = fontGetGlyphInfo(font);
	return &tglp->sheetData[sheetIndex*tglp->sheetSize];
}
```

It seems then checking beforehand would be needless. What do you think?

Member

Good catch. In that case, the format does guarantee that all texsheets are adjacent in memory. (Can you tell I haven't looked at this code in many, many years? :p)

Member

fincs commented Feb 28, 2026

Great work. Combined with some planned upcoming changes, I think this will finally fix hbmenu rendering performance. However, I am not entirely convinced by the second commit: I think it should be revised to explicitly check whether the optimization can be performed, and fall back on the normal method otherwise.

Author

TheGag96 commented Mar 1, 2026

Understandable, I can do that. So far, I have not seen this hack fail on the couple of systems I've tested, nor on Azahar with the system font from a NAND dump.

Member

fincs commented Mar 1, 2026

Can you rebase your branch against the latest master? It contains a fix for a libctru API change, without which CI fails on this PR.

```c
#include "text.h"

#define NUM_ASCII_CHARS 128
#define SHEETS_PER_BIG_SHEET 32
```
Member

Instead of hardcoding this, I think it would be a better idea to explicitly calculate it as 1024/texSheetHeight
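A minimal sketch of the suggested calculation. sheetsPerBigSheet and MAX_TEX_DIM are hypothetical names. The sketch relies on 1024 being the largest texture dimension the 3DS GPU accepts, and the sheet height would presumably come from the font's TGLP_s sheetHeight field. Returning 0 for an unusable height gives the caller a natural signal to fall back to per-sheet drawing.

```c
/* Largest texture dimension accepted by the 3DS GPU. */
#define MAX_TEX_DIM 1024

/* Number of sheets of the given height that fit in one texture of at most
   MAX_TEX_DIM rows. GPU texture heights are powers of two, so for valid
   heights the division is exact. Returns 0 for an unusable height. */
static int sheetsPerBigSheet(int texSheetHeight) {
    if (texSheetHeight <= 0 || texSheetHeight > MAX_TEX_DIM)
        return 0;
    return MAX_TEX_DIM / texSheetHeight;
}
```

For the 32-pixel-tall system font sheets this gives 32, matching the current hardcoded value.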
