NAME
    MIME::Charset - Charset Information for MIME

SYNOPSIS
        use MIME::Charset;

        $charset = MIME::Charset->new("euc-jp");

    Getting charset information:

        $benc = $charset->body_encoding; # e.g. "Q"
        $cset = $charset->as_string; # e.g. "US-ASCII"
        $henc = $charset->header_encoding; # e.g. "S"
        $cset = $charset->output_charset; # e.g. "ISO-2022-JP"

    Translating text data:

        ($text, $charset, $encoding) =
            $charset->header_encode(
               "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
               "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef");
        # ...returns e.g. (<converted>, "ISO-2022-JP", "B");

        ($text, $charset, $encoding) =
            $charset->body_encode(
                "Collectioneur path\xe9tiquement ".
                "\xe9clectique de d\xe9chets");
        # ...returns e.g. (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

        $len = $charset->encoded_header_len(
            "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b"); # e.g. 28

    Manipulating module defaults:

        use MIME::Charset;

        MIME::Charset::alias("csEUCKR", "euc-kr");
        MIME::Charset::default("iso-8859-1");
        MIME::Charset::fallback("us-ascii");

    Non-OO functions (may be deprecated in the near future):

        use MIME::Charset qw(:info);

        $benc = body_encoding("iso-8859-2"); # "Q"
        $cset = canonical_charset("ANSI X3.4-1968"); # "US-ASCII"
        $henc = header_encoding("utf-8"); # "S"
        $cset = output_charset("shift_jis"); # "ISO-2022-JP"

        use MIME::Charset qw(:trans);

        ($text, $charset, $encoding) =
            header_encode(
               "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
               "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
               "euc-jp");
        # ...returns (<converted>, "ISO-2022-JP", "B");

        ($text, $charset, $encoding) =
            body_encode(
                "Collectioneur path\xe9tiquement ".
                "\xe9clectique de d\xe9chets",
                "latin1");
        # ...returns (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

        $len = encoded_header_len(
            "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b", "utf-8"); # 28

DESCRIPTION
    MIME::Charset provides information about character sets used for MIME
    messages on the Internet.

  Definitions
    A charset is a ``character set'' as the term is used in MIME: a method
    of converting a sequence of octets into a sequence of characters. It
    covers both the ``coded character set'' (CCS) and the ``character
    encoding scheme'' (CES) concepts of ISO/IEC.

    An encoding, as the term is used in MIME, is a method of representing a
    body part or a header body as sequence(s) of printable US-ASCII
    characters.

  Constructor
    $charset = MIME::Charset->new([CHARSET [, OPTS]])
        Create a charset object.

        OPTS may accept the following key-value pairs. NOTE: When
        Unicode/multibyte support is disabled (see "USE_ENCODE"), conversion
        is not performed, so these options have no effect.

        Mapping => MAPTYPE
            Specify which mappings are actually used for charset names.
            "EXTENDED" uses extended mappings. "STANDARD" uses standardized
            strict mappings. The default is "EXTENDED".
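
        A minimal sketch of the constructor, assuming the module is
        installed and that options are passed as a flat key-value list
        after the charset name. The variable names are illustrative only.

```perl
use strict;
use warnings;
use MIME::Charset;

# Default mapping ("EXTENDED") versus the strict "STANDARD" mapping.
my $extended = MIME::Charset->new("shift_jis");
my $standard = MIME::Charset->new("shift_jis", Mapping => "STANDARD");

# Both objects report a canonical charset name via as_string.
print "extended: ", $extended->as_string, "\n";
print "standard: ", $standard->as_string, "\n";
```

        The mapping type is expected to matter only when text is actually
        converted, so the two objects above should report the same name.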

  Getting Information about Charsets
    $charset->body_encoding
    body_encoding CHARSET
        Get the recommended transfer-encoding of CHARSET for the message
        body.

        The returned value is one of "B" (BASE64), "Q" (QUOTED-PRINTABLE)
        or "undef" (might not be transfer-encoded; either 7BIT or 8BIT).
        This may differ from the encoding for the message header.
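
        The return values can be handled as in the following sketch. The
        helper "describe_body_encoding" and the label table are
        hypothetical names introduced for illustration; they are not part
        of the module.

```perl
use strict;
use warnings;
use MIME::Charset qw(:info);

# Map the short codes returned by body_encoding to readable labels.
my %label = (B => "BASE64", Q => "QUOTED-PRINTABLE");

# Hypothetical helper: describe the recommendation for one charset name.
sub describe_body_encoding {
    my ($name) = @_;
    my $benc = body_encoding($name);
    return "none (7BIT or 8BIT)" unless defined $benc;
    return $label{$benc} || $benc;
}

for my $name ("iso-8859-2", "us-ascii") {
    printf "%-12s %s\n", $name, describe_body_encoding($name);
}
```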

    $charset->as_string
    canonical_charset CHARSET
        Get the canonical name of the charset.

    $charset->decoder
        Get the "Encode::Encoding" object used to decode strings in this
        charset to Unicode.

    $charset->dup
        Get a copy of the charset object.

    $charset->encoder([CHARSET])
        Get the "Encode::Encoding" object used to encode Unicode strings
        with the compatible charset recommended for messages on the
        Internet.

        If the optional CHARSET is specified, the encoder (and the output
        charset name) of the $charset object is replaced with those of
        CHARSET, so that the $charset object becomes a converter between
        the original charset and the new CHARSET.
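
        The converter behavior described above might be used as follows.
        This is a sketch that assumes Unicode/multibyte support is
        available; the sample octets and variable names are illustrative.

```perl
use strict;
use warnings;
use MIME::Charset;

die "Unicode/multibyte support is not available\n"
    unless MIME::Charset::USE_ENCODE();

# Two kanji (U+65E5 U+672C) encoded as EUC-JP octets.
my $eucjp = "\xc6\xfc\xcb\xdc";

# Start from an EUC-JP charset object, then swap its encoder for UTF-8.
my $converter = MIME::Charset->new("euc-jp");
$converter->encoder("utf-8");

# encode() decodes by the input charset, then encodes with the new encoder.
my $utf8 = $converter->encode($eucjp);
printf "output charset: %s, %d octets\n",
    $converter->output_charset, length $utf8;
```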

    $charset->header_encoding
    header_encoding CHARSET
        Get the recommended encoding scheme of CHARSET for the message
        header.

        The returned value is one of "B", "Q", "S" (the shorter of "B" and
        "Q") or "undef" (might not be encoded). This may differ from the
        encoding for the message body.

    $charset->output_charset
    output_charset CHARSET
        Get a charset that is compatible with the given CHARSET and is
        recommended for MIME messages on the Internet (if this module
        knows one).

        When Unicode/multibyte support is disabled (see "USE_ENCODE"), this
        function simply returns the result of "canonical_charset".
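
        The information functions in this section can be combined to print
        a small profile table. This is only a sketch; the charset names
        are sample inputs and the output layout is arbitrary.

```perl
use strict;
use warnings;
use MIME::Charset qw(:info);

# Print the recommended header encoding, body encoding and output charset.
for my $name ("utf-8", "shift_jis", "iso-8859-2") {
    my @profile = map { defined $_ ? $_ : "undef" } (
        header_encoding($name),
        body_encoding($name),
        output_charset($name),
    );
    printf "%-11s header=%-5s body=%-5s output=%s\n",
        canonical_charset($name), @profile;
}
```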

  Translating Text Data
    $charset->body_encode(STRING [, OPTS])
    body_encode STRING, CHARSET [, OPTS]
        Get the converted (if needed) data of STRING and the recommended
        transfer-encoding of that data for the message body. CHARSET is the
        charset in which STRING is encoded.

        OPTS may accept the following key-value pairs. NOTE: When
        Unicode/multibyte support is disabled (see "USE_ENCODE"), conversion
        is not performed, so these options have no effect.

        Detect7bit => YESNO
            Try to auto-detect a 7-bit charset when CHARSET is not given.
            The default is "YES".

        Replacement => REPLACEMENT
            Specifies the error handling scheme. See "Error Handling".

        A 3-item list of (*converted string*, *charset for output*,
        *transfer-encoding*) is returned. *Transfer-encoding* is one of
        "BASE64", "QUOTED-PRINTABLE", "7BIT" or "8BIT". If the *charset
        for output* could not be determined and the *converted string*
        contains non-ASCII byte(s), the *charset for output* is "undef" and
        the *transfer-encoding* is "BASE64". The *charset for output* is
        "US-ASCII" if and only if the string does not contain any non-ASCII
        bytes.
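
        The returned list might be turned into MIME header fields as in
        this sketch. The header layout is an assumption made for
        illustration; the module itself only returns the three values.

```perl
use strict;
use warnings;
use MIME::Charset qw(:trans);

# Latin-1 octets, as in the SYNOPSIS.
my $latin1 = "Collectioneur path\xe9tiquement \xe9clectique de d\xe9chets";

my ($text, $cset, $cte) = body_encode($latin1, "latin1");

# The returned text is converted but not yet transfer-encoded; applying
# $cte (for example with MIME::QuotedPrint) is left to the caller.
my $type = "text/plain";
$type .= "; charset=$cset" if defined $cset;

print "Content-Type: $type\n";
print "Content-Transfer-Encoding: $cte\n";
```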

    $charset->decode(STRING [,CHECK])
        Decode STRING to Unicode.

        Note: When Unicode/multibyte support is disabled (see "USE_ENCODE"),
        this function will die.

    $charset->encode(STRING [,CHECK])
        Encode STRING (Unicode or non-Unicode) using the compatible charset
        recommended for messages on the Internet (if this module knows
        one). Note that the string is decoded to Unicode and then encoded
        even if the compatible charset is the same as the original charset.

        Note: When Unicode/multibyte support is disabled (see "USE_ENCODE"),
        this function will die.

    $charset->encoded_header_len(STRING [, ENCODING])
    encoded_header_len STRING, ENCODING, CHARSET
        Get the length of the encoded STRING for a message header (without
        folding).

        ENCODING may be one of "B", "Q" or "S" (the shorter of "B" and
        "Q").

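        As a rough illustration of "encoded_header_len", the three
        ENCODING values can be compared for the same input. The sample
        octets are taken from the SYNOPSIS; the hash name is arbitrary.

```perl
use strict;
use warnings;
use MIME::Charset qw(:trans);

# "Perl" followed by two CJK characters, as UTF-8 octets.
my $octets = "Perl\xe8\xa8\x80\xe8\xaa\x9e";

# Ask for the encoded length under each scheme.
my %len = map { $_ => encoded_header_len($octets, $_, "utf-8") } qw(B Q S);
printf "B=%d Q=%d S=%d\n", @len{qw(B Q S)};
```

        Since "S" selects the shorter of the two schemes, its length is
        expected never to exceed either of the others.
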
    $charset->header_encode(STRING [, OPTS])
    header_encode STRING, CHARSET [, OPTS]
        Get the converted (if needed) data of STRING and the recommended
        encoding scheme of that data for message headers. CHARSET is the
        charset in which STRING is encoded.

        OPTS may accept the following key-value pairs. NOTE: When
        Unicode/multibyte support is disabled (see "USE_ENCODE"), conversion
        is not performed, so these options have no effect.

        Detect7bit => YESNO
            Try to auto-detect a 7-bit charset when CHARSET is not given.
            The default is "YES".

        Replacement => REPLACEMENT
            Specifies the error handling scheme. See "Error Handling".

        A 3-item list of (*converted string*, *charset for output*,
        *encoding scheme*) is returned. *Encoding scheme* is one of "B",
        "Q" or "undef" (might not be encoded). If the *charset for output*
        could not be determined and the *converted string* contains
        non-ASCII byte(s), the *charset for output* is "8BIT" (this is
        *not* a charset name but a special value that represents
        unencodable data) and the *encoding scheme* is "undef" (should not
        be encoded). The *charset for output* is "US-ASCII" if and only if
        the string does not contain any non-ASCII bytes.
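
        One possible way to act on the returned list is sketched below.
        The encoded-word assembly with MIME::Base64 is an assumption added
        for illustration: it ignores the length limit and folding rules
        that a real header encoder must apply.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use MIME::Charset qw(:trans);

# EUC-JP octets from the SYNOPSIS.
my $eucjp = "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa"
          . "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef";

my ($text, $cset, $enc) = header_encode($eucjp, "euc-jp");

if (!defined $enc) {
    # No encoding recommended: use the text as it is.
    print "Subject: $text\n";
} elsif ($enc eq "B") {
    # Naive single encoded-word; no length limit or folding is handled.
    print "Subject: =?$cset?B?", encode_base64($text, ""), "?=\n";
} else {
    print "Subject needs Q encoding for charset $cset\n";
}
```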

    $charset->undecode(STRING [,CHECK])
        Encode the Unicode string STRING to a byte string using the input
        charset of $charset. This is equivalent to
        "$charset->decoder->encode()".

        Note: When Unicode/multibyte support is disabled (see "USE_ENCODE"),
        this function will die.
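
        A round trip through "decode" and "undecode" might look like this
        sketch, which assumes Unicode/multibyte support is available. The
        sample octets and variable names are illustrative.

```perl
use strict;
use warnings;
use MIME::Charset;

die "Unicode/multibyte support is not available\n"
    unless MIME::Charset::USE_ENCODE();

my $charset = MIME::Charset->new("euc-jp");
my $octets  = "\xc6\xfc\xcb\xdc";          # two kanji as EUC-JP octets

my $unicode = $charset->decode($octets);   # characters, not octets
my $back    = $charset->undecode($unicode);

printf "%d characters, round trip %s\n",
    length $unicode, ($back eq $octets ? "ok" : "differs");
```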

  Manipulating Module Defaults
    alias ALIAS [, CHARSET]
        Get/set a charset alias for the canonical names determined by
        "canonical_charset".

        If CHARSET is given and isn't false, ALIAS is assigned as an alias
        of CHARSET. Otherwise, the alias is not changed. In both cases, the
        charset name that ALIAS is currently assigned to is returned.
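
        The get and set forms of "alias" can be sketched as follows, using
        the alias from the SYNOPSIS. The variable names are illustrative.

```perl
use strict;
use warnings;
use MIME::Charset qw(:info);

# Register the alias; the assigned charset name is returned.
my $assigned = MIME::Charset::alias("csEUCKR", "euc-kr");

# Calling with the alias only reads the current assignment.
my $current = MIME::Charset::alias("csEUCKR");

print "alias -> $assigned, canonical -> ",
    canonical_charset("csEUCKR"), "\n";
```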

    default [CHARSET]
        Get/set the default charset.

        The default charset is used by this module when the charset context
        is unknown. Modules using this module are recommended to use this
        charset when the charset context is unknown or an implicit default
        is expected. By default, it is "US-ASCII".

        If CHARSET is given and isn't false, it is set as the default
        charset. Otherwise, the default charset is not changed. In both
        cases, the current default charset is returned.

        NOTE: The default charset *should not* be changed.

    fallback [CHARSET]
        Get/set the fallback charset.

        The fallback charset is used by this module when conversion by the
        given charset fails and the "FALLBACK" error handling scheme is
        specified. Modules using this module may use this charset as a last
        resort for conversion. By default, it is "UTF-8".

        If CHARSET is given and isn't false, it is set as the fallback
        charset. If CHARSET is "NONE", the fallback charset becomes
        undefined. Otherwise, the fallback charset is not changed. In any
        case, the current fallback charset is returned.

        NOTE: It *is* useful to specify "US-ASCII" as the fallback charset,
        since the result of conversion will be readable without charset
        information.
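
        Reading and changing the module defaults might look like the
        following sketch. Restoring the earlier fallback at the end is a
        courtesy added here, not something the module requires.

```perl
use strict;
use warnings;
use MIME::Charset;

# Read the current settings without changing them.
my $default  = MIME::Charset::default();
my $previous = MIME::Charset::fallback();
print "default=$default fallback=",
    (defined $previous ? $previous : "undef"), "\n";

# Use US-ASCII as the fallback, then disable fallback entirely.
my $ascii = MIME::Charset::fallback("us-ascii");
my $none  = MIME::Charset::fallback("NONE");

# Restore the earlier setting ("NONE" stands for an undefined fallback).
MIME::Charset::fallback(defined $previous ? $previous : "NONE");
```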

    recommended CHARSET [, HEADERENC, BODYENC [, ENCCHARSET]]
        Get/set charset profiles.

        If optional arguments are given and any of them is not false, the
        profiles for CHARSET are set from those arguments. Otherwise, the
        profiles are not changed. In both cases, the current profiles for
        CHARSET are returned as a 3-item list of (HEADERENC, BODYENC,
        ENCCHARSET).

        HEADERENC is the recommended encoding scheme for the message
        header. It may be one of "B", "Q", "S" (the shorter of "B" and "Q")
        or "undef" (might not be encoded).

        BODYENC is the recommended transfer-encoding for the message body.
        It may be one of "B", "Q" or "undef" (might not be
        transfer-encoded).

        ENCCHARSET is a charset that is compatible with the given CHARSET
        and is recommended for MIME messages on the Internet. If conversion
        is not needed (or this module doesn't know an appropriate charset),
        ENCCHARSET is "undef".

        NOTE: In future releases this function may accept more optional
        arguments (for example, properties to handle character widths, line
        folding behavior, ...), so the format of the returned value may
        change. Use "header_encoding", "body_encoding" or "output_charset"
        to get a particular profile.
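
        A read-only call to "recommended" can be sketched as below. The
        expectation that "shift_jis" has a "B" header encoding is an
        assumption about the built-in table; the output charset matches
        the SYNOPSIS.

```perl
use strict;
use warnings;
use MIME::Charset;

# Read-only call: no optional arguments, so nothing is changed.
my ($henc, $benc, $cset) = MIME::Charset::recommended("shift_jis");

printf "header=%s body=%s output=%s\n",
    map { defined $_ ? $_ : "undef" } ($henc, $benc, $cset);
```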

  Constants
    USE_ENCODE
        Unicode/multibyte support flag. It is a non-empty string when
        Unicode and multibyte support is enabled. Currently, this flag is
        non-empty on Perl 5.8.1 or later and an empty string on earlier
        versions of Perl.

  Error Handling
    "body_encode" and "header_encode" accept following "Replacement"
    options:

    "DEFAULT"
        Put a substitution character in place of a malformed character. For
        UCM-based encodings, <subchar> will be used.

    "FALLBACK"
        Try the "DEFAULT" scheme using the *fallback charset* (see
        "fallback"). When the fallback charset is undefined and conversion
        causes an error, the code dies with an error message.

    "CROAK"
        The code dies immediately on error with an error message.
        Therefore, you should trap the fatal error with eval{} unless you
        really want to let it die. The synonym is "STRICT".

    "PERLQQ"
    "HTMLCREF"
    "XMLCREF"
        Use the "FB_PERLQQ", "FB_HTMLCREF" or "FB_XMLCREF" scheme defined
        by the Encode module.

    numeric values
        Numeric values are also allowed. For more details see "Handling
        Malformed Data" in Encode.

    If no error handling scheme is specified or an unknown scheme is
    specified, "DEFAULT" is assumed.
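
    The difference between two of the schemes might be observed as in the
    following sketch. It assumes Unicode/multibyte support is available
    and that a character which cannot be represented in the output charset
    counts as a conversion error; the sample character is illustrative.

```perl
use strict;
use warnings;
use MIME::Charset;

my $latin1 = MIME::Charset->new("iso-8859-1");
my $kana   = "\x{3042}";   # HIRAGANA LETTER A: not representable in Latin-1

# "CROAK" dies on the unmappable character, so trap it with eval.
my @strict = eval { $latin1->body_encode($kana, Replacement => "CROAK") };
my $error  = $@;
print "CROAK: ", ($error ? "died: $error" : "no error"), "\n";

# "PERLQQ" substitutes an escape sequence instead of dying.
my ($text, $cset, $cte) =
    $latin1->body_encode($kana, Replacement => "PERLQQ");
print "PERLQQ: $text\n";
```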

  Configuration File
    Built-in defaults for option parameters can be overridden by the
    configuration file MIME/Charset/Defaults.pm. For more details, read
    MIME/Charset/Defaults.pm.sample.

VERSION
    Consult the $VERSION variable.

    Development versions of this module may be found at
    <http://hatuka.nezumi.nu/repos/MIME-Charset/>.

SEE ALSO
    Multipurpose Internet Mail Extensions (MIME).

AUTHORS
    Copyright (C) 2006-2008 Hatuka*nezumi - IKEDA Soji
    <hatuka(at)nezumi.nu>.

    All rights reserved. This program is free software; you can redistribute
    it and/or modify it under the same terms as Perl itself.