Lingua::KO::Hangul::Util - utility functions for Hangul in Unicode
use Lingua::KO::Hangul::Util qw(:all);
decomposeSyllable("\x{AC00}");          # "\x{1100}\x{1161}"
composeSyllable("\x{1100}\x{1161}");    # "\x{AC00}"
decomposeJamo("\x{1101}");              # "\x{1100}\x{1100}"
composeJamo("\x{1100}\x{1100}");        # "\x{1101}"
getHangulName(0xAC00);                  # "HANGUL SYLLABLE GA"
parseHangulName("HANGUL SYLLABLE GA");  # 0xAC00
A Hangul syllable consists of Hangul jamo (Hangul letters).
Hangul letters are classified into three classes:
  CHOSEONG (the initial sound): a leading consonant (L)
  JUNGSEONG (the medial sound): a vowel (V)
  JONGSEONG (the final sound): a trailing consonant (T)
Any Hangul syllable is a composition of (i) L + V, or (ii) L + V + T.
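For the 11,172 modern syllables (19 leading consonants × 21 vowels × 28 trailing choices, where "no T" counts as one choice), the code point of a precomposed syllable follows arithmetically from the jamo it contains. The formula below is the one given in the Unicode Standard (section 3.12, Conjoining Jamo Behavior); it is shown as an illustration, not as this module's internal code. The worked line reproduces HANGUL SYLLABLE GEUL (U+AE00) from L = U+1100, V = U+1173, and T = U+11AF.

```latex
\begin{aligned}
L_{\mathrm{idx}} &= L - \mathtt{1100}_{16}, \qquad
V_{\mathrm{idx}} = V - \mathtt{1161}_{16}, \qquad
T_{\mathrm{idx}} = T - \mathtt{11A7}_{16} \;(0 \text{ if there is no } T)\\
S &= \mathtt{AC00}_{16} + (L_{\mathrm{idx}} \cdot 21 + V_{\mathrm{idx}}) \cdot 28 + T_{\mathrm{idx}}\\
\text{GEUL: } S &= \mathtt{AC00}_{16} + (0 \cdot 21 + 18) \cdot 28 + 8
  = \mathtt{AC00}_{16} + 512 = \mathtt{AC00}_{16} + \mathtt{200}_{16} = \mathtt{AE00}_{16}
\end{aligned}
```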
$resultant_string = decomposeSyllable($string)
Decomposes each precomposed syllable (LV or LVT) into a sequence of conjoining jamo (L + V or L + V + T) and returns the result as a string.
Any characters other than Hangul syllables are not affected.
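The following is a minimal sketch of syllable decomposition, assuming the standard Unicode Hangul constants (SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7). The function name decompose_syllable is a hypothetical stand-in, not this module's implementation of decomposeSyllable.

```perl
use strict;
use warnings;

# Standard Unicode Hangul constants (Unicode Standard, section 3.12);
# assumed here rather than taken from the module's source.
use constant {
    SBASE  => 0xAC00,
    LBASE  => 0x1100,
    VBASE  => 0x1161,
    TBASE  => 0x11A7,
    VCOUNT => 21,
    TCOUNT => 28,
    SCOUNT => 11172,
};
use constant NCOUNT => VCOUNT * TCOUNT;    # 588 syllables per leading consonant

# Hypothetical stand-in for decomposeSyllable(): each precomposed syllable
# becomes L + V (+ T); every other character is copied unchanged.
sub decompose_syllable {
    my ($string) = @_;
    my $out = '';
    for my $ch (split //, $string) {
        my $sindex = ord($ch) - SBASE;
        if ($sindex < 0 || $sindex >= SCOUNT) {
            $out .= $ch;    # not a precomposed Hangul syllable
            next;
        }
        $out .= chr(LBASE + int($sindex / NCOUNT));
        $out .= chr(VBASE + int(($sindex % NCOUNT) / TCOUNT));
        my $tindex = $sindex % TCOUNT;
        $out .= chr(TBASE + $tindex) if $tindex;    # T index 0 means no final
    }
    return $out;
}

my $decomposed = decompose_syllable("\x{AE00}.");
print join(' ', map { sprintf 'U+%04X', ord } split //, $decomposed), "\n";
# prints: U+1100 U+1173 U+11AF U+002E
```

Code points are printed in hex to avoid "Wide character" warnings on an unconfigured STDOUT.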
$resultant_string = composeSyllable($string)
Composes each sequence of conjoining jamo (L + V or L + V + T) into a precomposed syllable (LV or LVT) where possible, and returns the result as a string.
A syllable LV followed by a final jamo T is also composed (into LVT).
Any characters other than Hangul jamo and syllables are not affected.
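A minimal sketch of the composition step, again assuming the standard Unicode Hangul constants: L + V is folded into LV first, and an LV syllable followed by T is then folded into LVT, which also covers the "LV + T" case mentioned above. compose_syllable is a hypothetical name, not the module's code.

```perl
use strict;
use warnings;

# Standard Unicode Hangul constants (Unicode Standard, section 3.12);
# assumed here rather than taken from the module's source.
use constant {
    SBASE  => 0xAC00, LBASE  => 0x1100, VBASE  => 0x1161, TBASE  => 0x11A7,
    LCOUNT => 19,     VCOUNT => 21,     TCOUNT => 28,     SCOUNT => 11172,
};

# Hypothetical stand-in for composeSyllable(): L + V becomes LV, and an
# LV syllable followed by T becomes LVT. Other characters pass through.
sub compose_syllable {
    my ($string) = @_;
    my @out;
    for my $cp (map { ord } split //, $string) {
        if (@out) {
            my $last   = $out[-1];
            my $lindex = $last - LBASE;
            my $vindex = $cp - VBASE;
            if ($lindex >= 0 && $lindex < LCOUNT && $vindex >= 0 && $vindex < VCOUNT) {
                $out[-1] = SBASE + ($lindex * VCOUNT + $vindex) * TCOUNT;    # L + V
                next;
            }
            my $sindex = $last - SBASE;
            my $tindex = $cp - TBASE;
            if (   $sindex >= 0 && $sindex < SCOUNT && $sindex % TCOUNT == 0
                && $tindex > 0  && $tindex < TCOUNT)
            {
                $out[-1] = $last + $tindex;    # LV (no final yet) + T
                next;
            }
        }
        push @out, $cp;
    }
    return join '', map { chr } @out;
}

my $composed = compose_syllable("\x{1100}\x{1173}\x{11AF}.");
print join(' ', map { sprintf 'U+%04X', ord } split //, $composed), "\n";
# prints: U+AE00 U+002E
```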
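$resultant_string = decomposeJamo($string)
Decomposes each complex jamo into a sequence of simple jamo where possible, and returns the result as a string.
e.g.
  CHOSEONG SIOS-PIEUP to CHOSEONG SIOS + PIEUP
  JUNGSEONG AE to JUNGSEONG A + I
  JUNGSEONG WE to JUNGSEONG U + EO + I
  JONGSEONG SSANGSIOS to JONGSEONG SIOS + SIOS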
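Complex-jamo decomposition is a table lookup. The sketch below uses a deliberately partial, hypothetical table holding only a few mappings named in this document (the module's own table covers many more jamo); decompose_jamo and %DECOMP are illustrative names.

```perl
use strict;
use warnings;

# Partial, hypothetical table: complex jamo => simple jamo sequence.
my %DECOMP = (
    0x1101 => [ 0x1100, 0x1100 ],            # CHOSEONG SSANGKIYEOK = KIYEOK + KIYEOK
    0x1162 => [ 0x1161, 0x1175 ],            # JUNGSEONG AE = A + I
    0x1170 => [ 0x116E, 0x1165, 0x1175 ],    # JUNGSEONG WE = U + EO + I
    0x11BB => [ 0x11BA, 0x11BA ],            # JONGSEONG SSANGSIOS = SIOS + SIOS
);

# Replace every complex jamo found in the table; copy everything else.
sub decompose_jamo {
    my ($string) = @_;
    my $out = '';
    for my $ch (split //, $string) {
        my $parts = $DECOMP{ ord $ch };
        $out .= $parts ? join('', map { chr } @$parts) : $ch;
    }
    return $out;
}

my $result = decompose_jamo("\x{1170}");
print join(' ', map { sprintf 'U+%04X', ord } split //, $result), "\n";
# prints: U+116E U+1165 U+1175
```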
$resultant_string = composeJamo($string)
Composes each sequence of simple jamo (L1 + L2, V1 + V2 + V3, etc.) into a complex jamo where possible, and returns the result as a string.
Any characters other than simple jamo are not affected.
e.g.
  CHOSEONG SIOS + PIEUP to CHOSEONG SIOS-PIEUP
  JUNGSEONG A + I to JUNGSEONG AE
  JUNGSEONG U + EO + I to JUNGSEONG WE
  JONGSEONG SIOS + SIOS to JONGSEONG SSANGSIOS
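Because some complex jamo span three simple jamo (U + EO + I), composition has to try longer sequences first. This sketch assumes a greedy longest-match strategy over a partial, hypothetical table; it illustrates the idea and is not necessarily how the module resolves overlapping sequences.

```perl
use strict;
use warnings;

# Partial, hypothetical table: simple jamo sequence (as a string) => complex jamo.
my %COMP = (
    "\x{1100}\x{1100}"         => "\x{1101}",    # KIYEOK + KIYEOK => SSANGKIYEOK
    "\x{1161}\x{1175}"         => "\x{1162}",    # A + I => AE
    "\x{116E}\x{1165}\x{1175}" => "\x{1170}",    # U + EO + I => WE
    "\x{11BA}\x{11BA}"         => "\x{11BB}",    # SIOS + SIOS => SSANGSIOS
);

# Greedy longest match: at each position try 3 jamo, then 2, else copy 1.
sub compose_jamo {
    my ($string) = @_;
    my $out = '';
    my $i   = 0;
    while ($i < length $string) {
        my $matched = 0;
        for my $len (3, 2) {
            my $seq = substr($string, $i, $len);
            next unless length($seq) == $len && exists $COMP{$seq};
            $out .= $COMP{$seq};
            $i += $len;
            $matched = 1;
            last;
        }
        next if $matched;
        $out .= substr($string, $i, 1);
        $i++;
    }
    return $out;
}

my $composed = compose_jamo("\x{116E}\x{1165}\x{1175}");
printf "U+%04X\n", ord $composed;    # prints: U+1170
```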
$resultant_string = decomposeFull($string)
Equivalent to decomposeJamo(decomposeSyllable($string)).
$string_decomposed = decomposeHangul($code_point)
@codepoints = decomposeHangul($code_point)
If $code_point is a precomposed Hangul syllable, returns its decomposition as a string (in scalar context) or as a list of code points (in list context):
decomposeHangul(0xAC00) # U+AC00 is HANGUL SYLLABLE GA; returns "\x{1100}\x{1161}" or (0x1100, 0x1161)
decomposeHangul(0xAE00) # U+AE00 is HANGUL SYLLABLE GEUL; returns "\x{1100}\x{1173}\x{11AF}" or (0x1100, 0x1173, 0x11AF)
Otherwise, returns false (an empty string or an empty list):
decomposeHangul(0x0041) # U+0041 is not a Hangul syllable; returns an empty string or an empty list
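The scalar-versus-list behaviour described above is ordinary Perl context sensitivity via wantarray. A minimal sketch, with decompose_hangul as a hypothetical stand-in and the standard Unicode Hangul constants written inline:

```perl
use strict;
use warnings;

# Hypothetical stand-in for decomposeHangul(): a string in scalar context,
# a list of code points in list context, and false for non-syllables.
sub decompose_hangul {
    my ($cp) = @_;
    my $sindex = $cp - 0xAC00;                 # 0xAC00 = first syllable
    if ($sindex < 0 || $sindex >= 11172) {
        return wantarray ? () : '';            # empty list or empty string
    }
    my @cps = (
        0x1100 + int($sindex / 588),           # leading consonant (L)
        0x1161 + int(($sindex % 588) / 28),    # vowel (V)
    );
    push @cps, 0x11A7 + $sindex % 28 if $sindex % 28;    # trailing consonant (T)
    return wantarray ? @cps : join '', map { chr } @cps;
}

my @geul = decompose_hangul(0xAE00);
my $ga   = decompose_hangul(0xAC00);
printf "list: %s\n", join ' ', map { sprintf '0x%04X', $_ } @geul;    # 0x1100 0x1173 0x11AF
printf "scalar: %d characters\n", length $ga;                         # 2
```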
$string_composed = composeHangul($src_string)
@code_points_composed = composeHangul($src_string)
Returns the composed string (in scalar context) or the list of composed code points (in list context).
Any sequence of an initial jamo L and a medial jamo V is composed into a syllable LV; then any sequence of a syllable LV and a final jamo T is composed into a syllable LVT.
Any characters other than Hangul jamo and syllables are not affected.
composeHangul("\x{1100}\x{1173}\x{11AF}.") # returns "\x{AE00}." or (0xAE00, 0x2E)
$code_point_composite = getHangulComposite($code_point_here, $code_point_next)
Returns the code point of the composite if $code_point_here and $code_point_next, in this order, are both Hangul and composable.
Otherwise, returns undef.
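Pairwise composition covers exactly two cases, L + V and LV + T. This sketch assumes the standard Unicode Hangul constants; get_hangul_composite is a hypothetical name that mirrors the documented "undef when not composable" behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getHangulComposite().
sub get_hangul_composite {
    my ($here, $next) = @_;
    my $lindex = $here - 0x1100;    # leading consonant index
    my $vindex = $next - 0x1161;    # vowel index
    if ($lindex >= 0 && $lindex < 19 && $vindex >= 0 && $vindex < 21) {
        return 0xAC00 + ($lindex * 21 + $vindex) * 28;    # L + V => LV
    }
    my $sindex = $here - 0xAC00;    # syllable index
    my $tindex = $next - 0x11A7;    # trailing consonant index (1..27)
    if (   $sindex >= 0 && $sindex < 11172 && $sindex % 28 == 0
        && $tindex > 0  && $tindex < 28)
    {
        return $here + $tindex;                           # LV + T => LVT
    }
    return undef;    # documented behaviour: undef when not composable
}

printf "0x%04X\n", get_hangul_composite(0x1100, 0x1161);    # prints: 0xAC00
my $c = get_hangul_composite(0xAC01, 0x11A8);               # LVT + T
print defined $c ? "composable\n" : "undef\n";              # prints: undef
```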
The following functions handle only precomposed Hangul syllables (U+AC00 to U+D7A3), not Hangul jamo or other Hangul-related characters.
Names of Hangul syllables have the format "HANGUL SYLLABLE %s".
$name = getHangulName($code_point)
Returns the name of $code_point if it is a precomposed Hangul syllable; otherwise returns undef.
getHangulName(0xAC00) returns "HANGUL SYLLABLE GA"; getHangulName(0x0041) returns undef.
$codepoint = parseHangulName($name)
Returns the code point of the Hangul syllable named $name; otherwise returns undef.
parseHangulName("HANGUL SYLLABLE GEUL") returns 0xAE00.
parseHangulName("LATIN SMALL LETTER A") returns undef.
parseHangulName("HANGUL SYLLABLE PERL") returns undef. # Regrettably, HANGUL SYLLABLE PERL does not exist :-)
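Syllable names are built by concatenating short names for the L, V, and T parts. The sketch below assumes the Jamo_Short_Name values from the Unicode Character Database (Jamo.txt); get_hangul_name and parse_hangul_name are hypothetical stand-ins. For the reverse direction it simply generates all 11,172 names once and inverts the mapping, which works because Unicode character names are unique; the module may parse names differently.

```perl
use strict;
use warnings;

# Jamo short names (UCD Jamo.txt) in index order; assumed, not copied from
# the module. IEUNG as a leading consonant has an empty short name.
my @L = (qw(G GG N D DD R M B BB S SS), '', qw(J JJ C K T P H));
my @V = qw(A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @T = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH M B BS S SS NG J C K T P H));

# Hypothetical stand-in for getHangulName().
sub get_hangul_name {
    my ($cp) = @_;
    my $sindex = $cp - 0xAC00;
    return undef if $sindex < 0 || $sindex >= 11172;
    return 'HANGUL SYLLABLE '
        . $L[ int($sindex / 588) ]
        . $V[ int(($sindex % 588) / 28) ]
        . $T[ $sindex % 28 ];
}

# Hypothetical stand-in for parseHangulName(): invert all generated names.
my %CODE_OF;
$CODE_OF{ get_hangul_name($_) } = $_ for 0xAC00 .. 0xD7A3;

sub parse_hangul_name {
    my ($name) = @_;
    return $CODE_OF{$name};    # undef for anything that is not a syllable name
}

print 'U+AE00: ', get_hangul_name(0xAE00), "\n";                 # HANGUL SYLLABLE GEUL
printf "GA: 0x%04X\n", parse_hangul_name('HANGUL SYLLABLE GA');    # 0xAC00
```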
A standard Korean syllable block consists of L+ V+ T* (one or more L, followed by one or more V, followed by zero or more T), according to the conjoining jamo behavior revised in Unicode 3.2 (cf. UAX #28).
A sequence of L followed directly by T is therefore not a syllable block without V; it consists of two nonstandard syllable blocks: one without V, and another without both L and V.
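$bool = isStandardForm($string)
Returns true if $string is in the standard form, that is, if it contains no nonstandard syllable block; otherwise returns false.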
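One way to check the L+ V+ T* rule is to map each character to type letters, expanding a precomposed syllable into the letters it stands for (LV or LVT), and then require every run of Hangul to be a concatenation of L+ V+ T* blocks. This is a sketch under stated assumptions: it uses the jamo ranges this module lists as supported, treats any non-Hangul character as a block separator, and is_standard_form is a hypothetical name, not the module's implementation.

```perl
use strict;
use warnings;

# Map characters to type letters; non-Hangul becomes '.', a separator.
sub type_string {
    my ($string) = @_;
    my $types = '';
    for my $cp (map { ord } split //, $string) {
        if    (($cp >= 0x1100 && $cp <= 0x1159) || $cp == 0x115F) { $types .= 'L' }
        elsif ($cp >= 0x1160 && $cp <= 0x11A2)                    { $types .= 'V' }
        elsif ($cp >= 0x11A8 && $cp <= 0x11F9)                    { $types .= 'T' }
        elsif ($cp >= 0xAC00 && $cp <= 0xD7A3) {
            # A syllable with T index 0 stands for L V; otherwise L V T.
            $types .= ($cp - 0xAC00) % 28 ? 'LVT' : 'LV';
        }
        else { $types .= '.' }
    }
    return $types;
}

# Standard form: every Hangul run splits into blocks shaped L+ V+ T*.
sub is_standard_form {
    my ($string) = @_;
    for my $run (grep { length } split /\.+/, type_string($string)) {
        return 0 unless $run =~ /\A(?:L+V+T*)+\z/;
    }
    return 1;
}

for my $s ("\x{1100}\x{1161}\x{11A8}", "\x{1100}\x{11A8}", "\x{AC00}\x{11A8}") {
    my $verdict = is_standard_form($s) ? 'standard' : 'nonstandard';
    print "$verdict\n";    # standard, nonstandard, standard
}
```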
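$resultant_string = insertFiller($string)
Transforms $string into the standard form by inserting fillers into nonstandard syllable blocks, and returns the result as a string. The choseong filler (Lf, U+115F) is inserted into a syllable block without L. The jungseong filler (Vf, U+1160) is inserted into a syllable block without V.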
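A sketch of filler insertion: first split the string into syllable blocks using the conjoining rules (L may be followed by L, V, LV, or LVT; V or LV by V or T; T or LVT by T), then prefix Lf to a block with no L and add Vf to a block with no V. The block-boundary table, the ranges, and the name insert_filler are assumptions for illustration; the module's exact placement of fillers may differ.

```perl
use strict;
use warnings;

use constant {
    LF => "\x{115F}",    # HANGUL CHOSEONG FILLER
    VF => "\x{1160}",    # HANGUL JUNGSEONG FILLER
};

# Syllable type by code point, using the ranges this module supports.
sub syllable_type {
    my ($cp) = @_;
    return 'L' if ($cp >= 0x1100 && $cp <= 0x1159) || $cp == 0x115F;
    return 'V' if $cp >= 0x1160 && $cp <= 0x11A2;
    return 'T' if $cp >= 0x11A8 && $cp <= 0x11F9;
    return (($cp - 0xAC00) % 28 ? 'LVT' : 'LV') if $cp >= 0xAC00 && $cp <= 0xD7A3;
    return 'NA';
}

# Types that may continue a block currently ending in the given type.
my %CONTINUES = (
    L   => { L => 1, V => 1, LV => 1, LVT => 1 },
    V   => { V => 1, T => 1 },
    LV  => { V => 1, T => 1 },
    T   => { T => 1 },
    LVT => { T => 1 },
);

sub insert_filler {
    my ($string) = @_;

    # Split into blocks of [character, type] pairs.
    my @blocks;
    my $prev = 'NA';
    for my $ch (split //, $string) {
        my $type = syllable_type(ord $ch);
        if (exists $CONTINUES{$prev} && $CONTINUES{$prev}{$type}) {
            push @{ $blocks[-1] }, [ $ch, $type ];
        }
        else {
            push @blocks, [ [ $ch, $type ] ];
        }
        $prev = $type;
    }

    # Add fillers to Hangul blocks that lack L and/or V.
    my $out = '';
    for my $block (@blocks) {
        my $chars = join '', map { $_->[0] } @$block;
        my $types = join '', map { $_->[1] } @$block;
        if ($types ne 'NA') {
            my $has_l = $types =~ /L/;
            my $has_v = $types =~ /V/;
            if (!$has_l) {
                $chars = LF . ($has_v ? '' : VF) . $chars;    # V+ T* or T+
            }
            elsif (!$has_v) {
                $chars .= VF;                                 # L+ only
            }
        }
        $out .= $chars;
    }
    return $out;
}

my $filled = insert_filler("\x{1100}\x{11A8}");    # L followed directly by T
print join(' ', map { sprintf 'U+%04X', ord } split //, $filled), "\n";
# prints: U+1100 U+1160 U+115F U+1160 U+11A8
```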
$type = getSyllableType($code_point)
Returns the Hangul syllable type of $code_point as a string: "L" for leading jamo, "V" for vowel jamo, "T" for trailing jamo, "LV" for LV syllables, "LVT" for LVT syllables, and "NA" for other code points (Not Applicable).
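Classification needs only range checks plus one arithmetic test: a precomposed syllable is LV exactly when its offset from U+AC00 is a multiple of 28 (T index 0). The sketch below assumes the code point ranges this module lists as supported (the pre-Unicode 5.2 jamo repertoire); get_syllable_type is a hypothetical stand-in.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getSyllableType().
sub get_syllable_type {
    my ($cp) = @_;
    return 'L' if ($cp >= 0x1100 && $cp <= 0x1159) || $cp == 0x115F;
    return 'V' if $cp >= 0x1160 && $cp <= 0x11A2;
    return 'T' if $cp >= 0x11A8 && $cp <= 0x11F9;
    if ($cp >= 0xAC00 && $cp <= 0xD7A3) {
        # A syllable with no trailing consonant has T index 0.
        return ($cp - 0xAC00) % 28 == 0 ? 'LV' : 'LVT';
    }
    return 'NA';    # Not Applicable
}

for my $cp (0x1100, 0x1161, 0x11A8, 0xAC00, 0xAE00, 0x0041) {
    printf "U+%04X %s\n", $cp, get_syllable_type($cp);    # L V T LV LVT NA
}
```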
Exported by default:
decomposeHangul composeHangul getHangulName parseHangulName getHangulComposite
Exported on request:
decomposeSyllable composeSyllable decomposeJamo composeJamo decomposeFull isStandardForm insertFiller getSyllableType
This module does not support the Hangul jamo added in Unicode 5.2.0 (2009).
The Hangul characters this module supports are:
  1100..1159 ; 1.1 #    [90] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEORINHIEUH
  115F..11A2 ; 1.1 #    [68] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSANGARAEA
  11A8..11F9 ; 1.1 #    [82] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
  AC00..D7A3 ; 2.0 # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2001, 2003, 2005, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.