mbrtowc(), mbrtoc16(), mbrtoc32()

convert multibyte character to wide character using conversion states 

Function


SYNOPSIS

#include <wchar.h>

size_t mbrtowc(wchar_t *pc, const char *s, size_t n, mbstate_t *ps);

#include <uchar.h>

size_t mbrtoc16(char16_t *pc, const char *s, size_t n, mbstate_t *ps);

size_t mbrtoc32(char32_t *pc, const char *s, size_t n, mbstate_t *ps);

#include <wchar.h>

#include <locale.h>

size_t mbrtowc_l(wchar_t *pc, const char *s, size_t n, mbstate_t *ps, locale_t locale);

#include <uchar.h>

#include <locale.h>

size_t mbrtoc16_l(char16_t *pc, const char *s, size_t n, mbstate_t *ps, locale_t locale);

size_t mbrtoc32_l(char32_t *pc, const char *s, size_t n, mbstate_t *ps, locale_t locale);


DESCRIPTION

The mbrtowc(), mbrtoc16() and mbrtoc32() functions inspect at most n bytes pointed to by s to determine the number of bytes needed to complete the next multibyte character. If a character can be completed and pc is not NULL, the wide character represented by s is stored in the wchar_t, char16_t or char32_t object that pc points to.

If s is a null pointer, the mbrtowc() function is equivalent to the call:

mbrtowc(NULL, "", 1, ps);

In this case, the values of the arguments pc and n are ignored.
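As a minimal sketch of the null-s form, the hypothetical helper below (null_source_probe() is not part of the library) makes the call on a caller-supplied state and reports whether it returned 0 and left the state in the initial conversion state. It assumes the state passed in is already initial; the result for a state that is part way through a character is not covered here.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: call mbrtowc() with a null s, which the
 * description above defines as mbrtowc(NULL, "", 1, ps).
 * Returns 1 if the call returned 0 and *st is in the initial
 * conversion state afterwards, 0 otherwise.
 */
int null_source_probe(mbstate_t *st)
{
    /* pc and n are ignored when s is null */
    size_t r = mbrtowc(NULL, NULL, 0, st);

    return r == 0 && mbsinit(st) != 0;
}
```

A caller would typically zero an mbstate_t with memset() before passing it in.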

If s is not null, the mbrtowc() function inspects at most n bytes beginning at the byte pointed to by s to determine the number of bytes needed to complete the next character (including any shift sequences). If the function determines that the next character is completed, it determines the value of the corresponding wide character and then, if pc is not null, stores that value in the object pointed to by pc. If the corresponding wide character is the null wide character, the resulting state is the initial conversion state.

If the specified state pointer is null, the mbrtowc() function uses its own internal mbstate_t object, which is initialized at program startup to the initial conversion state. Otherwise, the specified mbstate_t object is used to completely describe the current conversion state of the associated character sequence.

The behavior of mbrtowc() is affected by the LC_CTYPE category of the current locale.

The mbrtowc_l(), mbrtoc16_l() and mbrtoc32_l() functions behave in the same way as the corresponding functions without the _l suffix, but use the specified locale rather than the global or per-thread locale. A locale_t is returned by newlocale().
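The conversion described above is usually driven from a loop. The following is a sketch only: mb_to_wide() is a hypothetical helper, not a library function, and it assumes the caller supplies a null-terminated multibyte string in the encoding of the current LC_CTYPE locale. It uses its own mbstate_t rather than the internal one.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the multibyte string src into out[],
 * storing at most outlen - 1 wide characters plus a terminator.
 * Returns the number of wide characters stored (not counting the
 * terminator), or (size_t)-1 on an encoding error, an incomplete
 * trailing character, or a zero-length output buffer.
 */
size_t mb_to_wide(const char *src, wchar_t *out, size_t outlen)
{
    mbstate_t st;
    size_t remaining = strlen(src) + 1;   /* include the terminating null */
    size_t count = 0;

    if (outlen == 0)
        return (size_t)-1;

    memset(&st, 0, sizeof st);            /* initial conversion state */

    while (count + 1 < outlen) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, src, remaining, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete input */
        if (r == 0)
            break;                        /* reached the null character */

        out[count++] = wc;
        src += r;                         /* skip the bytes just consumed */
        remaining -= r;
    }
    out[count] = L'\0';
    return count;
}
```

If the output buffer fills before the end of the string, this sketch stops and returns the number of characters stored so far.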


PARAMETERS

pc 

Points to a location to receive the converted wide character. This can be null if no returned wide character is desired.

s 

Is the string whose bytes are to be counted/converted.

n 

Specifies the maximum number of bytes to examine.

ps 

Is the conversion state. If this is null, an internal mbstate_t object is used.

locale 

Is a locale_t, such as one returned by newlocale(), or LC_GLOBAL_LOCALE, or 0 for the current thread locale set with uselocale().
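Because pc can be null, the function can be used to count characters without storing them. The sketch below illustrates this with a hypothetical helper, mb_char_count(); the name and the error convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the multibyte characters in s without
 * storing any wide characters (pc is passed as NULL).
 * Returns the count, or (size_t)-1 if s contains an invalid or
 * incomplete character.
 */
size_t mb_char_count(const char *s)
{
    mbstate_t st;
    size_t remaining = strlen(s);   /* terminating null is not examined */
    size_t count = 0;

    memset(&st, 0, sizeof st);      /* initial conversion state */

    while (remaining > 0) {
        size_t r = mbrtowc(NULL, s, remaining, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;
        if (r == 0)
            break;                  /* defensive: null character seen */

        s += r;
        remaining -= r;
        count++;
    }
    return count;
}
```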


RETURN VALUES

The mbrtowc() and mbrtowc_l() functions return the first of the following that applies:

0 

If the next n or fewer bytes complete the character that corresponds to the null wide character (which is the value stored).

positive number 

If the next n or fewer bytes complete a valid character (which is the value stored); the value returned is the number of bytes that complete the character.

-2 

If the next n bytes contribute to an incomplete but potentially valid character, and all n bytes have been processed (no value is stored). When n has at least the value of MB_CUR_MAX or MB_CUR_MAX_L, this case can only occur if s points at a sequence of redundant shift sequences (for locales with state-dependent encodings).

-1 

If an encoding error occurs, in which case the next n or fewer bytes do not contribute to a complete and valid character (no value is stored). In this case, errno is set to EILSEQ, and the conversion state is undefined.

-3 

The mbrtoc16() and mbrtoc16_l() functions can also return -3. This indicates that the value stored is the next char16_t resulting from a previous call (typically the second half of a UTF-16 surrogate pair), recovered from the state data stored in mbstate_t. No bytes from the input have been consumed by this call.
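The return values above matter most when input arrives in pieces. As a sketch under stated assumptions, the hypothetical helper decode_bytewise() below feeds mbrtowc() one byte at a time, the way a stream reader might, and relies on the caller-supplied mbstate_t to carry partial characters across calls. The special values are compared as (size_t)-1 and (size_t)-2 because the return type is size_t.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: feed len bytes to mbrtowc() one byte at a time.
 * Returns the number of wide characters stored in out[] (at most
 * outlen), or (size_t)-1 on an encoding error.
 */
size_t decode_bytewise(const char *bytes, size_t len,
                       wchar_t *out, size_t outlen)
{
    mbstate_t st;
    size_t count = 0;
    size_t i;

    memset(&st, 0, sizeof st);     /* initial conversion state */

    for (i = 0; i < len && count < outlen; i++) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, bytes + i, 1, &st);

        if (r == (size_t)-2)
            continue;              /* incomplete: partial bytes kept in st */
        if (r == (size_t)-1)
            return (size_t)-1;     /* errno is EILSEQ; st is now undefined */

        /* r is 0 (null wide character) or 1 (character completed) */
        out[count++] = wc;
    }
    return count;
}
```

After a -1 return the state should not be reused; a caller that wants to resynchronize would reinitialize the mbstate_t first.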


CONFORMANCE

mbrtowc() conforms to ANSI/ISO 9899:1999 'ISO C99'

mbrtowc_l() conforms to IEEE Std 1003.1-2008 'POSIX.1'

mbrtoc16(), mbrtoc16_l(), mbrtoc32(), and mbrtoc32_l() conform to ANSI/ISO 9899:2011 'ISO C11'


MULTITHREAD SAFETY LEVEL

MT-Safe, with exceptions.

The mbrtoc16(), mbrtoc32(), and mbrtowc() functions are MT-Safe as long as no thread calls setlocale() while these functions are executing and NULL is not passed as the ps pointer.

The mbrtowc_l(), mbrtoc16_l(), and mbrtoc32_l() functions are MT-Safe as long as no thread calls freelocale() on locale while these functions are executing and NULL is not passed as the ps pointer.


PORTING ISSUES

The current mbstate_t, for historical reasons, is implemented as an int (4 bytes). This is used internally to hold up to 4 bytes of a partially read multibyte character. Code that passes one byte at a time to this function will have state problems at bytes 5 and 6 of a UTF-8 sequence. 5- and 6-byte UTF-8 sequences are uncommon but exist. 4-, 5-, and 6-byte UTF-8 sequences passed as a single string or in two halves work correctly with the newer char16_t and char32_t functions, which have a viable way of representing a UTF-32 character and a UTF-16 surrogate pair (using the -3 return).

Runtime binary compatibility would need to be broken to extend the size of mbstate_t.
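For code that moves to the char16_t interface, the -3 return needs its own branch in the conversion loop. The following sketch uses a hypothetical helper, mb_to_utf16(), and assumes a null-terminated input string passed whole rather than one byte at a time.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the multibyte string src to UTF-16
 * code units in out[]. Returns the number of char16_t values stored
 * (at most outlen, no terminator added), or (size_t)-1 on an encoding
 * error or an incomplete trailing character.
 */
size_t mb_to_utf16(const char *src, char16_t *out, size_t outlen)
{
    mbstate_t st;
    size_t remaining = strlen(src) + 1;   /* include the terminating null */
    size_t count = 0;

    memset(&st, 0, sizeof st);            /* initial conversion state */

    while (count < outlen) {
        char16_t c16;
        size_t r = mbrtoc16(&c16, src, remaining, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete input */
        if (r == 0)
            break;                        /* reached the null character */

        out[count++] = c16;

        if (r == (size_t)-3)
            continue;                     /* stored from state: no input consumed */

        src += r;
        remaining -= r;
    }
    return count;
}
```

Note that the -3 case stores a value but must not advance the input pointer, which is the main difference from the mbrtowc() loop.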


AVAILABILITY

PTC MKS Toolkit for Professional Developers
PTC MKS Toolkit for Professional Developers 64-Bit Edition
PTC MKS Toolkit for Enterprise Developers
PTC MKS Toolkit for Enterprise Developers 64-Bit Edition


SEE ALSO

Functions:
mbrlen(), mbrlen_l(), mbsinit(), mbsinit_l(), mbsrtowcs(), mbsrtowcs_l(), newlocale(), setlocale(), wcrtomb(), wcrtomb_l(), wcsrtombs(), wcsrtombs_l()


PTC MKS Toolkit 10.5 Documentation Build 40.