State of char type
ReScript has the char primitive type, which is rarely used. (I was one of those who used char to handle ASCII keycodes)
https://rescript-lang.org/docs/manual/latest/primitive-types#char
> Note: Char doesn't support Unicode or UTF-8 and is therefore not recommended.
The char type doesn't support Unicode; it only supports a single UTF-16 codepoint. A char literal such as '👋' compiles to its numeric codepoint value.
Its value is the same as the result of '👋'.codePointAt(0) in JavaScript, which means that in its value representation, char is equivalent to int (a 16-bit subset).
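To make this concrete, here is what the equivalent JavaScript calls return (a quick illustration using the same emoji as above):

```javascript
// '👋' (U+1F44B) lies outside the BMP, so in UTF-16 it takes
// two code units (a surrogate pair), but it is one codepoint.
console.log("👋".codePointAt(0)); // 128075 (0x1F44B)
console.log("👋".length);         // 2 (UTF-16 code units)
```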
So why don't we just use int instead of char?

char literals are automatically compiled to codepoints. This is much more efficient than a string representation when dealing with Unicode data tables.

char supports range patterns (e.g. 'a' .. 'z') in pattern matching. This is very useful when writing parsers.
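For example, a parser-style character classifier can be written with range patterns (a sketch; `isIdentStart` is an illustrative name, not an existing API):

```rescript
// Classify an identifier-start character using char range patterns.
let isIdentStart = c =>
  switch c {
  | 'a' .. 'z' | 'A' .. 'Z' | '_' => true
  | _ => false
  }
```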
However, a char literal is not really useful for representing a Unicode character, because it doesn't cover every Unicode sequence: it only keeps the first codepoint and discards the rest of the character segment.
To avoid problems, we should limit the value range of char literals to the BMP (Basic Multilingual Plane, U+0000–U+FFFF).
Suggestion
I suggest some changes that would keep the useful parts of char while removing the confusion:
- Get rid of the char type, or make it an alias of int
- Keep char literal syntax, but with internal representation as regular integers
- Limit the char literal range to the BMP at the syntax level
- Support range patterns for regular integers
- Remove the Char module
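If range patterns were extended to regular integers as suggested, the same parser-style matching would work on plain codepoint values (hypothetical syntax; this does not compile today):

```rescript
// Hypothetical: range patterns over regular ints.
// 97 and 122 are the codepoints of 'a' and 'z'.
let isLowerAscii = code =>
  switch code {
  | 97 .. 122 => true
  | _ => false
  }
```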