Chars enables you to work transparently with UTF-8 encoding in the Ruby String class without extensive knowledge of the encoding. A Chars object accepts a string upon initialization and proxies String methods in an encoding-safe manner. All the normal String methods are also implemented on the proxy.

String methods are proxied through the Chars object, and can be accessed through the mb_chars method. Methods which would normally return a String object now return a Chars object so methods can be chained.

  "The Perfect String  ".mb_chars.downcase.strip.normalize #=> "the perfect string"

Chars objects are interchangeable with String objects as long as no explicit class checks are made. If a method does explicitly check the class, call to_s on the Chars object before passing it in.

  bad.explicit_checking_method "T".mb_chars.downcase.to_s
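
The chaining and the to_s escape hatch described above can be illustrated without loading ActiveSupport. The following is a minimal sketch only: TinyChars is a hypothetical stand-in written for this example, not the real Chars class, and it does no encoding-aware work. It shows how a proxy can forward String methods, re-wrap String results so calls chain, and still fail an explicit class check until to_s is called.

```ruby
# Hypothetical, simplified stand-in for the proxy idea (not ActiveSupport code).
class TinyChars
  attr_reader :wrapped_string
  alias to_s wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  # Forward unknown methods to the wrapped String. String results are
  # wrapped again so that further calls can be chained on the proxy.
  def method_missing(name, *args, &block)
    return super unless @wrapped_string.respond_to?(name)

    result = @wrapped_string.public_send(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped_string.respond_to?(name, include_private) || super
  end
end

chained = TinyChars.new("The Perfect String  ").downcase.strip
puts chained.class               # TinyChars
puts chained.is_a?(String)       # false
puts chained.to_s.is_a?(String)  # true
puts chained.to_s                # the perfect string
```

The explicit class check fails on the proxy and passes on the result of to_s, which is why to_s is the recommended bridge to code that insists on a real String.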

The default Chars implementation assumes that the string is encoded as UTF-8. To handle a different encoding, write your own multibyte string handler and configure it through ActiveSupport::Multibyte.proxy_class.

  class CharsForUTF32
    def size
      @wrapped_string.size / 4
    end

    def self.accepts?(string)
      string.length % 4 == 0
    end
  end

  ActiveSupport::Multibyte.proxy_class = CharsForUTF32
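
The arithmetic behind the handler above can be checked in plain Ruby. This is a sketch under assumptions: Utf32SizeSketch is a hypothetical class invented for this example, it is not registered with ActiveSupport, and it uses bytesize because on Ruby 1.9 and later String#size counts characters rather than bytes. UTF-32 stores every code point in exactly four bytes, so the byte count must be a multiple of four and the character count is the byte count divided by four.

```ruby
# Hypothetical handler sketch; it only mirrors the size/accepts? logic.
class Utf32SizeSketch
  def initialize(string)
    @wrapped_string = string
  end

  # Every UTF-32 code unit is 4 bytes wide.
  def size
    @wrapped_string.bytesize / 4
  end

  # A string whose byte length is not a multiple of 4 cannot be UTF-32.
  def self.accepts?(string)
    string.bytesize % 4 == 0
  end
end

utf32 = "h\u00E9llo".encode("UTF-32BE")
puts utf32.bytesize                     # 20
puts Utf32SizeSketch.accepts?(utf32)    # true
puts Utf32SizeSketch.new(utf32).size    # 5
puts Utf32SizeSketch.accepts?("abc")    # false
```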

Aliases

  • to_s (alias for wrapped_string)
  • to_str (alias for wrapped_string)

Constants

HANGUL_SBASE = 0xAC00

HANGUL_LBASE = 0x1100

HANGUL_VBASE = 0x1161

HANGUL_TBASE = 0x11A7

HANGUL_LCOUNT = 19

HANGUL_VCOUNT = 21

HANGUL_TCOUNT = 28

HANGUL_NCOUNT = HANGUL_VCOUNT * HANGUL_TCOUNT

HANGUL_SCOUNT = 11172

HANGUL_SLAST = HANGUL_SBASE + HANGUL_SCOUNT

HANGUL_JAMO_FIRST = 0x1100

HANGUL_JAMO_LAST = 0x11FF
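
The Hangul constants above match the names used in the Unicode algorithm for decomposing a precomposed Hangul syllable into its leading consonant, vowel, and optional trailing consonant. The following sketch shows that arithmetic; the HangulSketch module and its decompose method are hypothetical names chosen for this example and are not claimed to be how Chars implements it.

```ruby
# Hypothetical module; constants copied from the list above.
module HangulSketch
  SBASE  = 0xAC00
  LBASE  = 0x1100
  VBASE  = 0x1161
  TBASE  = 0x11A7
  VCOUNT = 21
  TCOUNT = 28
  NCOUNT = VCOUNT * TCOUNT # 588 syllables per leading consonant
  SCOUNT = 11172           # 19 * 21 * 28 precomposed syllables

  # Returns the jamo codepoints for a precomposed syllable, or the
  # codepoint unchanged when it is outside the syllable block.
  def self.decompose(codepoint)
    sindex = codepoint - SBASE
    return [codepoint] unless (0...SCOUNT).cover?(sindex)

    leading  = LBASE + sindex / NCOUNT
    vowel    = VBASE + (sindex % NCOUNT) / TCOUNT
    trailing = TBASE + sindex % TCOUNT
    # A trailing index of zero means the syllable has no final consonant.
    trailing == TBASE ? [leading, vowel] : [leading, vowel, trailing]
  end
end

jamo = HangulSketch.decompose(0xD55C) # U+D55C HANGUL SYLLABLE HAN
puts jamo.map { |cp| format("U+%04X", cp) }.join(" ") # U+1112 U+1161 U+11AB
```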

UNICODE_WHITESPACE = [
  (0x0009..0x000D).to_a, # White_Space # Cc  [5] <control-0009>..<control-000D>
  0x0020,                # White_Space # Zs      SPACE
  0x0085,                # White_Space # Cc      <control-0085>
  0x00A0,                # White_Space # Zs      NO-BREAK SPACE
  0x1680,                # White_Space # Zs      OGHAM SPACE MARK
  0x180E,                # White_Space # Zs      MONGOLIAN VOWEL SEPARATOR
  (0x2000..0x200A).to_a, # White_Space # Zs [11] EN QUAD..HAIR SPACE
  0x2028,                # White_Space # Zl      LINE SEPARATOR
  0x2029,                # White_Space # Zp      PARAGRAPH SEPARATOR
  0x202F,                # White_Space # Zs      NARROW NO-BREAK SPACE
  0x205F,                # White_Space # Zs      MEDIUM MATHEMATICAL SPACE
  0x3000,                # White_Space # Zs      IDEOGRAPHIC SPACE
].flatten.freeze

UNICODE_LEADERS_AND_TRAILERS = UNICODE_WHITESPACE + [65279]

UNICODE_TRAILERS_PAT = /(#{codepoints_to_pattern(UNICODE_LEADERS_AND_TRAILERS)})+\Z/

UNICODE_LEADERS_PAT = /\A(#{codepoints_to_pattern(UNICODE_LEADERS_AND_TRAILERS)})+/
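
The leader and trailer patterns above suggest how Unicode-aware stripping can work: build an alternation of every whitespace codepoint plus U+FEFF (decimal 65279), then anchor it at the start or end of the string. The sketch below is an illustration only. WhitespaceSketch and its strip method are hypothetical, and the codepoints_to_pattern helper here is an assumed reimplementation built on Regexp.union, not the library's own definition.

```ruby
# Hypothetical module; the codepoint list is copied from the constants above.
module WhitespaceSketch
  WHITESPACE = [
    (0x0009..0x000D).to_a, 0x0020, 0x0085, 0x00A0, 0x1680, 0x180E,
    (0x2000..0x200A).to_a, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000
  ].flatten.freeze
  LEADERS_AND_TRAILERS = (WHITESPACE + [0xFEFF]).freeze

  # Assumed helper: turns codepoints into a regexp alternation of
  # single UTF-8 characters.
  def self.codepoints_to_pattern(codepoints)
    Regexp.union(codepoints.map { |cp| [cp].pack("U") })
  end

  LEADERS_PAT  = /\A(?:#{codepoints_to_pattern(LEADERS_AND_TRAILERS)})+/
  TRAILERS_PAT = /(?:#{codepoints_to_pattern(LEADERS_AND_TRAILERS)})+\Z/

  # Removes leading and trailing Unicode whitespace and byte order marks.
  def self.strip(string)
    string.sub(LEADERS_PAT, "").sub(TRAILERS_PAT, "")
  end
end

padded = "\uFEFF\u00A0 hello world\u3000\n"
puts WhitespaceSketch.strip(padded).inspect # "hello world"
```

Interior whitespace is left alone because both patterns are anchored to the ends of the string.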

UTF8_PAT = /\A(?:
  [\x00-\x7f]                                     |
  [\xc2-\xdf] [\x80-\xbf]                         |
  \xe0        [\xa0-\xbf] [\x80-\xbf]             |
  [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]             |
  \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf] |
  [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf] |
  \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]
)*\z/xn
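
UTF8_PAT describes well-formed UTF-8 byte sequences: one alternative per lead-byte range, each followed by the required number of continuation bytes. A minimal sketch of using such a pattern for validation follows. The constant UTF8_PAT_SKETCH and the method well_formed_utf8? are hypothetical names for this example, and the call to String#b assumes a Ruby version that provides it, since the byte-oriented /n pattern is meant to be matched against binary data.

```ruby
# Copy of the pattern above under a hypothetical name.
UTF8_PAT_SKETCH = /\A(?:
  [\x00-\x7f]                                     |
  [\xc2-\xdf] [\x80-\xbf]                         |
  \xe0        [\xa0-\xbf] [\x80-\xbf]             |
  [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]             |
  \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf] |
  [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf] |
  \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]
)*\z/xn

# Hypothetical helper: true when every byte sequence fits the pattern.
def well_formed_utf8?(string)
  # String#b returns a binary copy, so the /n pattern sees raw bytes.
  string.b.match?(UTF8_PAT_SKETCH)
end

puts well_formed_utf8?("\xE2\x82\xAC") # true  (EURO SIGN, three bytes)
puts well_formed_utf8?("\xC0\x80")     # false (overlong encoding)
puts well_formed_utf8?("\xE2\x82")     # false (truncated sequence)
```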

Attributes

[R] wrapped_string