Unicode
Unicode.isassigned(c) -> Bool
Returns true if the given char or integer is an assigned Unicode code point.
Examples
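The sketch below shows a few calls to Unicode.isassigned. It assumes Julia 1.x with the Unicode standard library loaded, and the variable names are only illustrative. The calls check:

- a letter,
- an integer code point,
- a control character (category Cc, which counts as assigned),
- the noncharacter U+FFFF, whose general category is Cn (unassigned).

```julia
using Unicode

# 'A' (U+0041) is an assigned letter.
letter_ok = Unicode.isassigned('A')

# Integers are accepted as code points; 101 is 'e' (U+0065).
int_ok = Unicode.isassigned(101)

# Control characters are assigned code points too (general category Cc).
control_ok = Unicode.isassigned('\x01')

# U+FFFF is a noncharacter: its general category is Cn, so it is unassigned.
noncharacter_ok = Unicode.isassigned(0xFFFF)

println((letter_ok, int_ok, control_ok, noncharacter_ok))  # (true, true, true, false)
```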
Unicode.normalize(s::AbstractString, normalform::Symbol)
Normalize the string s according to one of the four “normal forms” of the Unicode standard: normalform can be :NFC, :NFD, :NFKC, or :NFKD. Normal forms C (canonical composition) and D (canonical decomposition) convert different, visually identical representations of the same abstract string into a single canonical form, with form C being more compact. Normal forms KC and KD additionally canonicalize “compatibility equivalents”: they convert characters that are abstractly similar but visually distinct into a single canonical choice (e.g. they expand ligatures into the individual characters), with form KC being more compact.
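The following sketch assumes Julia 1.x with `using Unicode`, and the variable names are illustrative. It shows two things:

- The two spellings of “é” (one precomposed code point, or “e” plus a combining accent) collapse to a single form under :NFC and :NFD.
- The “ﬁ” ligature (U+FB01) survives :NFC unchanged but is expanded to “fi” under the compatibility form :NFKC.

```julia
using Unicode

precomposed = "\u00e9"   # é as a single code point, U+00E9
decomposed  = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, U+0301

# Visually identical, but different code point sequences.
same_raw = precomposed == decomposed                      # false

# NFC composes and NFD decomposes; each maps both spellings to one form.
nfc = Unicode.normalize(decomposed, :NFC)
nfd = Unicode.normalize(precomposed, :NFD)
println(nfc == precomposed, " ", nfd == decomposed)       # true true

# Form C is the more compact one: 1 character versus 2.
println(length(nfc), " ", length(nfd))                    # 1 2

# Only the compatibility forms expand the ligature U+FB01.
ligature = "\ufb01"
println(Unicode.normalize(ligature, :NFC) == ligature)    # true
println(Unicode.normalize(ligature, :NFKC))               # fi
```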
Alternatively, finer control and additional transformations may be obtained by passing keyword arguments to Unicode.normalize instead of a normal form. Any number of the following boolean keyword options may be specified; all of them default to false except compose, which defaults to true:
compose=false: do not perform canonical composition
decompose=true: do canonical decomposition instead of canonical composition (compose=true is ignored if present)
casefold=true: perform Unicode case folding, e.g. for case-insensitive string comparison
newline2lf=true, newline2ls=true, or newline2ps=true: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
stripignore=true: strip Unicode’s “default ignorable” characters (e.g. the soft hyphen or the left-to-right marker)
stripcc=true: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
stable=true: enforce Unicode Versioning Stability (never introduce characters missing from earlier Unicode versions)
Examples
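The sketch below exercises the keyword form, with each call using one option from the list above. It again assumes Julia 1.x with the Unicode standard library, and the variable names are illustrative. The helper caseless_equal is hypothetical and not part of the library; it only shows how casefold=true can support case-insensitive comparison.

```julia
using Unicode

# Default behavior: canonical composition (compose=true), like :NFC.
composed = Unicode.normalize("e\u0301")                           # "é" (U+00E9)

# compose=false with no other flags leaves the combining sequence alone.
untouched = Unicode.normalize("e\u0301", compose=false)           # "e\u0301"

# decompose=true wins even if compose=true is also passed.
decomposed = Unicode.normalize("\u00e9", decompose=true, compose=true)  # "e\u0301"

# casefold=true: Unicode case folding.
folded = Unicode.normalize("JuLiA", casefold=true)                # "julia"

# newline2lf=true: CRLF and a bare CR both become LF.
lf = Unicode.normalize("a\r\nb\rc", newline2lf=true)              # "a\nb\nc"

# stripignore=true: drop the soft hyphen U+00AD, a default-ignorable character.
stripped = Unicode.normalize("soft\u00adhyphen", stripignore=true)  # "softhyphen"

# stripcc=true: the tab becomes a space, and with no newline flag so does '\n'.
cc = Unicode.normalize("a\tb\nc", stripcc=true)                   # "a b c"

# Hypothetical helper: compare two strings ignoring case via casefold=true.
caseless_equal(a, b) =
    Unicode.normalize(a, casefold=true) == Unicode.normalize(b, casefold=true)

println(caseless_equal("Julia", "JULIA"))                         # true
```

Folding both sides before comparing keeps the helper symmetric. A stricter comparison could also normalize to a common form first, but that is beyond what this sketch shows.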
Unicode.graphemes
— Function