node/tools
Joyee Cheung 72df124e38 build: encode non-ASCII Latin1 characters as one byte in JS2C
Previously we had two encodings for JS files:

1. If a file contains only ASCII characters, encode it as a one-byte
  string (interpreted as uint8_t array during loading).
2. If a file contains any characters with code point above 127,
  encode it as a two-byte string (interpreted as uint16_t array
  during loading).

This was done because V8 only supports Latin-1 and UTF-16 as the
underlying representations for strings. To store the JS code as
external strings, which saves encoding cost and memory overhead,
we need to follow the representations supported by V8.
Notice that there was a gap: code points in the upper Latin-1 range
(128-255) were encoded as two-byte, which was an undocumented TODO
for a long time. That was fine previously because every file that
contained code points above 127 also contained code points above 255.
Now we have undici, whose code points all fall in the range 0-255
(apart from one replaceable code point >255). So this patch adds
handling for the 128-255 range to avoid the size overhead of encoding
such files as two-byte. This could reduce the size of the binary by
~500KB and helps future files with this kind of code point.
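
The selection rule described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not the JS2C
implementation; the function names `choose_encoding` and `encoded_size`
are hypothetical and do not appear in the patch.

```python
def choose_encoding(source: str) -> str:
    """Pick the narrowest representation that can hold every code point.

    Hypothetical helper: 'one-byte' covers the whole Latin-1 range (0-255),
    not just ASCII (0-127); anything above 255 forces 'two-byte'.
    """
    max_code_point = max(map(ord, source), default=0)
    return "one-byte" if max_code_point <= 0xFF else "two-byte"


def encoded_size(source: str) -> int:
    """Return the number of bytes the source occupies in the chosen form."""
    if choose_encoding(source) == "one-byte":
        # One byte per character (uint8_t array).
        return len(source.encode("latin-1"))
    # Two bytes per UTF-16 code unit (uint16_t array).
    return len(source.encode("utf-16-le"))
```

Under this rule a file such as "caf\u00e9" stays one-byte, while a single
code point above 255 doubles the storage for the whole file, which is
why one stray character matters.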

Drive-by: replace `’` with `'` in undici.js to make it a Latin-1
only string. This workaround can be dropped once undici replaces
this character in the comment upstream.
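
To find characters like the one replaced here, a small scan for code
points above 255 is enough. The following is a hedged sketch, not a
tool from the Node.js tree; `find_non_latin1` is a hypothetical name.

```python
def find_non_latin1(source: str) -> list[tuple[int, int, str]]:
    """List (line, column, code point) for every character above U+00FF.

    Hypothetical helper: lines and columns are 1-based, and columns count
    Python characters (code points), not bytes.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFF:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits
```

An empty result means the file can be stored as a one-byte string.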

PR-URL: https://github.com/nodejs/node/pull/51605
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Ethan Arrowood <ethan@arrowood.dev>
2024-02-17 17:09:24 +00:00