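Encoding::Converter.new(source_encoding, destination_encoding)
Encoding::Converter.new(source_encoding, destination_encoding, opt)
Encoding::Converter.new(convpath)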

Possible option elements:

Hash form:
  :invalid => nil            # raise error on invalid byte sequence (default)
  :invalid => :replace       # replace invalid byte sequence
  :undef => nil              # raise error on undefined conversion (default)
  :undef => :replace         # replace undefined conversion
  :replace => string         # replacement string ("?" or "\uFFFD" if not specified)
  :newline => :universal     # decorator for converting CRLF and CR to LF
  :newline => :crlf          # decorator for converting LF to CRLF
  :newline => :cr            # decorator for converting LF to CR
  :universal_newline => true # decorator for converting CRLF and CR to LF
  :crlf_newline => true      # decorator for converting LF to CRLF
  :cr_newline => true        # decorator for converting LF to CR
  :xml => :text              # escape as XML CharData
  :xml => :attr              # escape as XML AttValue

Integer form:
  Encoding::Converter::INVALID_REPLACE
  Encoding::Converter::UNDEF_REPLACE
  Encoding::Converter::UNDEF_HEX_CHARREF
  Encoding::Converter::UNIVERSAL_NEWLINE_DECORATOR
  Encoding::Converter::CRLF_NEWLINE_DECORATOR
  Encoding::Converter::CR_NEWLINE_DECORATOR
  Encoding::Converter::XML_TEXT_DECORATOR
  Encoding::Converter::XML_ATTR_CONTENT_DECORATOR
  Encoding::Converter::XML_ATTR_QUOTE_DECORATOR

Encoding::Converter.new creates an instance of Encoding::Converter.

source_encoding and destination_encoding should each be a string or an Encoding object.

opt should be nil, a hash or an integer.

convpath should be an array. convpath may contain

  • two-element arrays which contain encodings or encoding names, or

  • strings representing decorator names.
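A minimal sketch of the convpath form, assuming Latin-1 input. The single conversion pair mixes an Encoding object with an encoding name, and the decorator name "crlf_newline" is appended after the UTF-8 step. convert and finish are Encoding::Converter instance methods; finish flushes whatever output the converter is still holding.

```ruby
# Explicit path: ISO-8859-1 -> UTF-8, then the crlf_newline decorator.
# The pair mixes an Encoding object and an encoding name.
ec = Encoding::Converter.new([[Encoding::ISO_8859_1, "UTF-8"], "crlf_newline"])

p ec.convpath  #=> [[#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>], "crlf_newline"]

latin1 = "café\n".encode(Encoding::ISO_8859_1)
out = ec.convert(latin1) + ec.finish  # finish flushes any buffered output
p out == "café\r\n"  #=> true
```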

Encoding::Converter.new optionally takes an option, which should be a hash or an integer. The option hash can contain :invalid => nil, etc. The option integer should be a bitwise OR of constants such as Encoding::Converter::INVALID_REPLACE, etc.
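A small sketch of the two option forms, assuming a UTF-8 to US-ASCII conversion chosen only for illustration. Both converters are expected to behave the same: the hash form names each option, while the integer form ORs the corresponding constants together.

```ruby
# Hash form: name each option.
hash_ec = Encoding::Converter.new("UTF-8", "US-ASCII",
                                  invalid: :replace, undef: :replace)

# Integer form: bitwise OR of the matching flag constants.
flags  = Encoding::Converter::INVALID_REPLACE | Encoding::Converter::UNDEF_REPLACE
int_ec = Encoding::Converter.new("UTF-8", "US-ASCII", flags)

hash_out = hash_ec.convert("café") + hash_ec.finish
int_out  = int_ec.convert("café") + int_ec.finish
p hash_out  #=> "caf?"
p int_out   #=> "caf?"
```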

:invalid => nil

Raise an error on an invalid byte sequence. This is the default behavior.

:invalid => :replace

Replace invalid byte sequences with the replacement string.
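A sketch contrasting the two :invalid settings, assuming UTF-8 input that contains the byte 0xFF (never valid in UTF-8) and a US-ASCII destination, whose default replacement string is "?".

```ruby
bad = "a\xFFb"  # the byte 0xFF never occurs in well-formed UTF-8

# :invalid => nil (the default): the invalid byte raises.
raised = begin
  Encoding::Converter.new("UTF-8", "US-ASCII").convert(bad)
  nil
rescue Encoding::InvalidByteSequenceError => e
  e.class
end
p raised  #=> Encoding::InvalidByteSequenceError

# :invalid => :replace: the invalid byte becomes the replacement string.
ec  = Encoding::Converter.new("UTF-8", "US-ASCII", invalid: :replace)
out = ec.convert(bad) + ec.finish
p out  #=> "a?b"
```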

:undef => nil

Raise an error if a character in source_encoding is not defined in destination_encoding. This is the default behavior.

:undef => :replace

Replace characters that are undefined in destination_encoding with the replacement string.
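A sketch of the two :undef settings, assuming a UTF-8 to ISO-8859-1 conversion: the euro sign (U+20AC) has no ISO-8859-1 code point, so it is an undefined conversion rather than an invalid byte sequence. error_char is the Encoding::UndefinedConversionError accessor for the offending character.

```ruby
text = "5 \u20AC"  # the euro sign has no ISO-8859-1 code point

# :undef => nil (the default): the unmappable character raises.
err = begin
  Encoding::Converter.new("UTF-8", "ISO-8859-1").convert(text)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end
p err.class                  #=> Encoding::UndefinedConversionError
p err.error_char == "\u20AC" #=> true

# :undef => :replace: the character becomes the replacement string ("?").
ec  = Encoding::Converter.new("UTF-8", "ISO-8859-1", undef: :replace)
out = ec.convert(text) + ec.finish
p out  #=> "5 ?"
```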

:replace => string

Specify the replacement string. If not specified, “\uFFFD” is used for Unicode encodings and “?” for others.
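A sketch showing the default replacement strings and a custom one, assuming EUC-JP sources for the defaults and a UTF-8 to US-ASCII conversion for the custom string. replacement is the Encoding::Converter reader for the string currently in use.

```ruby
# Default replacement strings, chosen from the destination encoding.
to_utf8  = Encoding::Converter.new("EUC-JP", "UTF-8")
to_ascii = Encoding::Converter.new("EUC-JP", "US-ASCII")
p to_utf8.replacement == "\uFFFD"  #=> true
p to_ascii.replacement             #=> "?"

# A custom :replace string overrides the default.
ec  = Encoding::Converter.new("UTF-8", "US-ASCII", undef: :replace, replace: "*")
out = ec.convert("na\u00EFve caf\u00E9") + ec.finish
p out  #=> "na*ve caf*"
```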

:universal_newline => true

Convert CRLF and CR to LF.

:crlf_newline => true

Convert LF to CRLF.

:cr_newline => true

Convert LF to CR.
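A sketch of the three newline decorators. The universal_newline case reuses the UTF-16BE to UTF-8 pair from the examples below; the UTF-8 to ISO-8859-1 pair used for the other two is an arbitrary choice for illustration.

```ruby
# :universal_newline => true  (CRLF and CR become LF)
uni = Encoding::Converter.new("UTF-16BE", "UTF-8", universal_newline: true)
src = "a\r\nb\rc\n".encode("UTF-16BE")
uni_out = uni.convert(src) + uni.finish
p uni_out  #=> "a\nb\nc\n"

# :crlf_newline => true  (LF becomes CRLF)
crlf = Encoding::Converter.new("UTF-8", "ISO-8859-1", crlf_newline: true)
crlf_out = crlf.convert("a\nb\n") + crlf.finish
p crlf_out  #=> "a\r\nb\r\n"

# :cr_newline => true  (LF becomes CR)
cr = Encoding::Converter.new("UTF-8", "ISO-8859-1", cr_newline: true)
cr_out = cr.convert("a\nb\n") + cr.finish
p cr_out  #=> "a\rb\r"
```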

:xml => :text

Escape as XML CharData. This form can be used as an HTML 4.0 #PCDATA.

  • ‘&’ -> ‘&amp;’

  • ‘<’ -> ‘&lt;’

  • ‘>’ -> ‘&gt;’

  • undefined characters in destination_encoding -> hexadecimal CharRef such as &#xHH;
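A minimal sketch of :xml => :text, assuming UTF-8 input converted to US-ASCII so that the non-ASCII character U+3042 has to fall back to a hexadecimal character reference.

```ruby
ec  = Encoding::Converter.new("UTF-8", "US-ASCII", xml: :text)
out = ec.convert("1 < 2 & 3 > 2 \u3042") + ec.finish
puts out  # prints: 1 &lt; 2 &amp; 3 &gt; 2 &#x3042;
```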

:xml => :attr

Escape as XML AttValue. The converted result is enclosed in double quotes, as in "…". This form can be used as an HTML 4.0 attribute value.

  • ‘&’ -> ‘&amp;’

  • ‘<’ -> ‘&lt;’

  • ‘>’ -> ‘&gt;’

  • ‘"’ -> ‘&quot;’

  • undefined characters in destination_encoding -> hexadecimal CharRef such as &#xHH;
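A sketch of :xml => :attr under the same UTF-8 to US-ASCII assumption. Because convert treats its input as possibly partial, the closing quote only appears once finish signals the end of input, so both calls are needed to get a complete attribute value.

```ruby
ec = Encoding::Converter.new("UTF-8", "US-ASCII", xml: :attr)
# The closing quote is emitted by finish, at the end of input.
value = ec.convert(%(say "hi" & <bye> \u3042)) + ec.finish
puts value                      # prints: "say &quot;hi&quot; &amp; &lt;bye&gt; &#x3042;"
puts %(<input value=#{value}>)  # already quoted, so it can be dropped in as-is
```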

Examples:

# UTF-16BE to UTF-8
ec = Encoding::Converter.new("UTF-16BE", "UTF-8")

# Usually, decorators such as newline conversion are inserted last.
ec = Encoding::Converter.new("UTF-16BE", "UTF-8", :universal_newline => true)
p ec.convpath #=> [[#<Encoding:UTF-16BE>, #<Encoding:UTF-8>],
              #    "universal_newline"]

# But if the destination encoding is ASCII-incompatible,
# decorators are inserted before the last conversion.
ec = Encoding::Converter.new("UTF-8", "UTF-16BE", :crlf_newline => true)
p ec.convpath #=> ["crlf_newline",
              #    [#<Encoding:UTF-8>, #<Encoding:UTF-16BE>]]

# Conversion path can be specified directly.
ec = Encoding::Converter.new(["universal_newline", ["EUC-JP", "UTF-8"], ["UTF-8", "UTF-16BE"]])
p ec.convpath #=> ["universal_newline",
              #    [#<Encoding:EUC-JP>, #<Encoding:UTF-8>],
              #    [#<Encoding:UTF-8>, #<Encoding:UTF-16BE>]]