9 Character and String Types

The :SB-UNICODE feature implies support for all 1114112 potential characters in the character space defined by the Unicode Consortium, with the identity mapping between Lisp char-code and Unicode code point. SBCL releases before version 0.8.17, and those built without the :SB-UNICODE feature, support only 256 characters, with the identity mapping between char-code and Latin-1 code point (equivalently, the first 256 Unicode code points).
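As a quick check of which configuration is running, the following sketch inspects *features* and the standard constant char-code-limit. The function name unicode-build-p is a hypothetical helper introduced here for illustration; it is not part of SBCL.

```lisp
;; UNICODE-BUILD-P is a hypothetical helper, not an SBCL function.
(defun unicode-build-p ()
  "Return T if :SB-UNICODE is present on *FEATURES*, NIL otherwise."
  (and (member :sb-unicode *features*) t))

(defun expected-char-code-limit ()
  "Number of character codes the text above leads us to expect."
  (if (unicode-build-p)
      1114112   ; the full Unicode code space, #x110000
      256))     ; Latin-1 only

;; The identity mapping means CODE-CHAR and CHAR-CODE round-trip,
;; here for LATIN SMALL LETTER E WITH ACUTE (code 233).
(defun round-trip-code (code)
  (char-code (code-char code)))
```

On an SBCL image, (expected-char-code-limit) should agree with char-code-limit, and (round-trip-code 233) should return 233 in either configuration.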

In the absence of the :SB-UNICODE feature, the types base-char and character are identical, and encompass the set of all 256 characters supported by the implementation. With :SB-UNICODE on *features* (the default), however, base-char and character are distinct: character encompasses the set of all 1114112 characters, while base-char represents the set of the first 128 characters.
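The distinction can be observed with standard type predicates alone. The sketch below uses only portable Common Lisp; base-char-code-p is a hypothetical helper name, and the results noted in the comments follow from the description above rather than from a run on every SBCL version.

```lisp
;; BASE-CHAR-CODE-P is a hypothetical helper, not an SBCL function.
(defun base-char-code-p (code)
  "Return T if the character with code CODE is of type BASE-CHAR."
  (and (typep (code-char code) 'base-char) t))

(defun character-is-base-char-p ()
  "Return T if CHARACTER is a subtype of BASE-CHAR, i.e. the two types coincide."
  (values (subtypep 'character 'base-char)))

;; (base-char-code-p 97)   is true in both configurations (code below 128).
;; (base-char-code-p 233)  should be NIL with :SB-UNICODE, T without it.
;; (character-is-base-char-p) should be NIL with :SB-UNICODE, T without it.
```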

The effect of this on string types is that an SBCL configured with :SB-UNICODE has three disjoint string types: (vector nil), base-string and (vector character). In a build without :SB-UNICODE, there are two such disjoint types: (vector nil) and (vector character); base-string is identical to (vector character).
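To see which of these types a given string belongs to, one can dispatch on the type specifiers named above. This is a minimal sketch; string-kind is a hypothetical helper, and the strings are created with an explicit :element-type so the result does not depend on what the reader produces for string literals.

```lisp
;; STRING-KIND is a hypothetical helper, not an SBCL function.
(defun string-kind (s)
  "Classify S as :NIL, :BASE or :CHARACTER according to its vector type."
  (etypecase s
    ((vector nil) :nil)
    (base-string :base)
    ((vector character) :character)))

(defun make-sample-strings ()
  "Return a base string and a character string with the same contents."
  (values (make-string 3 :element-type 'base-char :initial-element #\a)
          (make-string 3 :element-type 'character :initial-element #\a)))

;; With :SB-UNICODE the two samples should classify as :BASE and :CHARACTER.
;; Without it, both should classify as :BASE, because the types coincide.
```

Because the two samples hold the same characters, they remain equal under string= even when their types differ.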

The SB-KERNEL:CHARACTER-SET-TYPE represents possibly noncontiguous sets of characters as lists of range pairs: for example, the type standard-char is represented as the type (sb-kernel:character-set '((10 . 10) (32 . 126))).
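The range-pair representation can be illustrated without touching SBCL internals. The following sketch derives (low . high) code pairs from an arbitrary character predicate using only standard Common Lisp; character-ranges is a hypothetical helper and is not how SBCL itself builds these types.

```lisp
;; CHARACTER-RANGES is a hypothetical helper, not an SBCL function.
(defun character-ranges (predicate &optional (limit char-code-limit))
  "Return a list of (LOW . HIGH) code pairs covering every character
with code below LIMIT that satisfies PREDICATE."
  (let ((ranges '())
        (start nil))                     ; first code of the open run, if any
    (dotimes (code limit)
      (let ((char (code-char code)))
        (cond ((and char (funcall predicate char))
               (unless start (setf start code)))
              (start                     ; the run ended just before CODE
               (push (cons start (1- code)) ranges)
               (setf start nil)))))
    (when start                          ; close a run that reaches LIMIT
      (push (cons start (1- limit)) ranges))
    (nreverse ranges)))

;; (character-ranges #'standard-char-p 128)
;; => ((10 . 10) (32 . 126))
```

The result matches the pairs quoted above for standard-char: Newline at code 10, then the printable ASCII characters from Space (32) through tilde (126).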