sqlite: optimize column name creation and text value encoding#61954
sqlite: optimize column name creation and text value encoding#61954thisalihassan wants to merge 2 commits intonodejs:mainfrom
Conversation
|
Review requested:
|
6adb625 to
ff6a788
Compare
Use simdutf to detect ASCII text values and create them via NewFromOneByte for compact one-byte representation. Internalize column name strings with kInternalized so V8 shares hidden classes across row objects. Cache column names on StatementSync for iterate(), invalidated via SQLITE_STMTSTATUS_REPREPARE on schema changes. Refs: nodejs/performance#181
ff6a788 to
4fcb1ed
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #61954 +/- ##
==========================================
+ Coverage 88.84% 89.77% +0.93%
==========================================
Files 674 674
Lines 204957 205640 +683
Branches 39309 39422 +113
==========================================
+ Hits 182087 184608 +2521
+ Misses 15088 13264 -1824
+ Partials 7782 7768 -14
🚀 New features to boost your workflow:
|
src/node_sqlite.cc
Outdated
| const char* data, | ||
| size_t length) { |
There was a problem hiding this comment.
Can you just take a std::string_view input as an argument rather than this C API like function?
src/node_sqlite.cc
Outdated
| NewStringType::kNormal, | ||
| len); | ||
| } | ||
| return String::NewFromUtf8(isolate, data, NewStringType::kNormal, len); |
There was a problem hiding this comment.
Why don't you use kInternalized here and in line 66?
There was a problem hiding this comment.
Utf8StringMaybeOneByte is used for text cell values (row data), not column names, text values are typically unique user data ("Ali Hassan", "ali.n@example.com", etc.) so the lookup would almost always miss
Column names are already interned separately in ColumnNameToName
| if (!iter->stmt_->ColumnNameToName(i).ToLocal(&key)) return; | ||
| row_keys.emplace_back(key); | ||
| } | ||
| if (!iter->stmt_->GetCachedColumnNames(&row_keys)) return; |
| .As<Name>(); | ||
| } | ||
|
|
||
| bool StatementSync::GetCachedColumnNames(LocalVector<Name>* keys) { |
There was a problem hiding this comment.
Can you add a comment to this function?
ad603d4 to
fd4215d
Compare
|
Hi @addaleax @geeksilva97 can I please get review on this PR when you get time? Thanks |
| Isolate* isolate = env()->isolate(); | ||
|
|
||
| int reprepare_count = | ||
| sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, 0); |
There was a problem hiding this comment.
Nit
| sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, 0); | |
| sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, false); |
| bool StatementSync::GetCachedColumnNames(LocalVector<Name>* keys) { | ||
| Isolate* isolate = env()->isolate(); | ||
|
|
||
| int reprepare_count = |
There was a problem hiding this comment.
| int reprepare_count = | |
| const int reprepare_count = |
| sqlite3_stmt_status(statement_, SQLITE_STMTSTATUS_REPREPARE, 0); | ||
| if (reprepare_count != cached_column_names_reprepare_count_) { | ||
| cached_column_names_.clear(); | ||
| int num_cols = sqlite3_column_count(statement_); |
There was a problem hiding this comment.
| int num_cols = sqlite3_column_count(statement_); | |
| const int num_cols = sqlite3_column_count(statement_); |
| const char* v = \ | ||
| reinterpret_cast<const char*>(sqlite3_##from##_text(__VA_ARGS__)); \ | ||
| (result) = String::NewFromUtf8((isolate), v).As<Value>(); \ | ||
| int v_len = sqlite3_##from##_bytes(__VA_ARGS__); \ |
There was a problem hiding this comment.
| int v_len = sqlite3_##from##_bytes(__VA_ARGS__); \ | |
| const int v_len = sqlite3_##from##_bytes(__VA_ARGS__); \ |
|
|
||
| inline MaybeLocal<String> Utf8StringMaybeOneByte(Isolate* isolate, | ||
| std::string_view input) { | ||
| int len = static_cast<int>(input.size()); |
There was a problem hiding this comment.
| int len = static_cast<int>(input.size()); | |
| const int len = static_cast<int>(input.size()); |
| void StatementSync::Finalize() { | ||
| sqlite3_finalize(statement_); | ||
| statement_ = nullptr; | ||
| cached_column_names_.clear(); |
There was a problem hiding this comment.
Consider putting this in a method.
| .As<Name>(); | ||
| } | ||
|
|
||
| // Returns cached internalized column name strings for this statement, |
There was a problem hiding this comment.
This comment is incorrect, it returns a boolean.
| bool allow_unknown_named_params_; | ||
| std::optional<std::map<std::string, std::string>> bare_named_params_; | ||
| std::vector<v8::Global<v8::Name>> cached_column_names_; | ||
| int cached_column_names_reprepare_count_ = -1; |
There was a problem hiding this comment.
Consider using std::optional.
There was a problem hiding this comment.
Thanks for the suggestion @louwers but I benchmarked on the iterate path (30 runs) saw a consistent -1% to -2% regression after making these changes:
sqlite/sqlite-prepare-select-all.js statement='SELECT text_column FROM foo LIMIT 1' method='iterate' tableSeedSize=100000 n=100000 *** -2.06 %
sqlite/sqlite-prepare-select-all.js statement='SELECT text_column FROM foo LIMIT 100' method='iterate' tableSeedSize=100000 n=100000 *** -1.42 %
The reason might be std::optional comparison generates more instructions per row than just int
There was a problem hiding this comment.
Makes sense. Not worth it then.
There was a problem hiding this comment.
yeah it's confirmed, but very interesting

Skip the full
UTF-8decode path for text values that are pure ASCII by validating withsimdutfand creatingOneByteV8 strings, halving memory.Internalizecolumn name strings so V8 shares hidden classes across row objects. Cache column names in theiterate()hot loop, invalidating on schema changes viaSQLITE_STMTSTATUS_REPREPARE.Benchmark: 30-run

Refs: nodejs/performance#181