This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Thomas Lord scripsit:
> My plan (and stalled code) works that way. If a
> string contains only codepoints in 0..255, store it as bytes.
> 0..ffff, use 16-bits, otherwise, use 32.
This is a plausible design. If you are willing to pay more time to save
some more space, you could have multiple flavors of single-byte strings
based on SCSU dynamic windows. Keep a single overhead byte T with each
single-byte string that indicates the meaning of the byte range 80-FF:
Value of T   Unicode offset   Comment
01..67       x*80             half-blocks from U+0080 to U+3380
68..A7       x*80+AC00        half-blocks from U+E000 to U+FF80
F9           00C0             Latin-1 letters + half of Latin Extended-A
FA           0250             IPA Extensions
FB           0370             Greek
FC           0530             Armenian
FD           3040             Hiragana
FE           30A0             Katakana
FF           FF60             Halfwidth Katakana
So your byte strings (range U+0000..U+00FF) would have a T byte of 01
(offset 0080).
Of course there is no requirement to implement this entire scheme;
you can cherry-pick particular T values that make sense.
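
To make the table concrete, here is a minimal sketch in Python. It is not
part of the original proposal. The names window_offset, decode, and encode
are hypothetical, and the sketch assumes that bytes 00-7F always stand for
U+0000..U+007F, so only 80-FF depend on T. The default candidate list is
just one possible cherry-picked subset of T values.

```python
from typing import Iterable, Optional, Tuple

# Fixed-offset T values from the table (F9..FF).
FIXED_OFFSETS = {
    0xF9: 0x00C0,  # Latin-1 letters + half of Latin Extended-A
    0xFA: 0x0250,  # IPA Extensions
    0xFB: 0x0370,  # Greek
    0xFC: 0x0530,  # Armenian
    0xFD: 0x3040,  # Hiragana
    0xFE: 0x30A0,  # Katakana
    0xFF: 0xFF60,  # Halfwidth Katakana
}

def window_offset(t: int) -> int:
    """Return the code point that byte 0x80 maps to for overhead byte T."""
    if 0x01 <= t <= 0x67:
        return t * 0x80            # half-blocks U+0080..U+3380
    if 0x68 <= t <= 0xA7:
        return t * 0x80 + 0xAC00   # half-blocks U+E000..U+FF80
    if t in FIXED_OFFSETS:
        return FIXED_OFFSETS[t]
    raise ValueError("T value %#04x is not assigned in the table" % t)

def decode(t: int, data: bytes) -> str:
    """Expand a single-byte string using the window selected by T."""
    off = window_offset(t)
    # Assumption: bytes below 0x80 are plain ASCII code points.
    return "".join(chr(b) if b < 0x80 else chr(off + b - 0x80) for b in data)

def encode(s: str,
           candidates: Iterable[int] = (0x01, 0xF9, 0xFA, 0xFB, 0xFC,
                                        0xFD, 0xFE, 0xFF),
           ) -> Optional[Tuple[int, bytes]]:
    """Try each candidate T; return (T, bytes) for the first that fits.

    Returns None when no candidate window covers the string, in which
    case the caller would fall back to 16-bit or 32-bit storage.
    """
    for t in candidates:
        off = window_offset(t)
        out = bytearray()
        for ch in s:
            cp = ord(ch)
            if cp < 0x80:
                out.append(cp)
            elif off <= cp < off + 0x80:
                out.append(cp - off + 0x80)
            else:
                break              # this window cannot hold the string
        else:
            return t, bytes(out)
    return None
```

With these definitions, encode("αβγ") picks the Greek window (T = FB) and
stores three bytes, while a string of Han characters returns None and would
need one of the wider representations.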
--
As you read this, I don't want you to feel    John Cowan
sorry for me, because, I believe everyone     jcowan@xxxxxxxxxxxxxxxxx
will die someday.                             http://www.reutershealth.com
    --From a Nigerian-type scam spam          http://www.ccil.org/~cowan