
unicode_u_ucs4_native, unicode_u_ucs2_native, unicode_convert_init, unicode_convert,

Author

Sam Varshavchik
           Author

Courier Unicode Library                            05/18/2024                                 UNICODE_CONVERT(3)

Description

unicode_u_ucs4_native[] contains the string “UCS-4BE” or “UCS-4LE”, matching the native char32_t
       endianness.

       unicode_u_ucs2_native[] contains the string “UCS-2BE” or “UCS-2LE”, matching the native char32_t
       endianness.
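
       As an illustration of what "native endianness" means here, the following sketch inspects the in-memory
       byte order of a char32_t and picks the matching character set name. The sketch_* helpers are
       hypothetical and are not part of the library; the sketch also assumes the platform is either strictly
       big-endian or strictly little-endian.

```c
#include <assert.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: returns 1 when the least significant byte of a
   char32_t is stored first in memory (little-endian), 0 otherwise. */
static int sketch_is_little_endian(void)
{
    char32_t probe = 0x01020304;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x04;
}

/* Hypothetical: the name unicode_u_ucs4_native[] would be expected to hold. */
static const char *sketch_ucs4_native(void)
{
    return sketch_is_little_endian() ? "UCS-4LE" : "UCS-4BE";
}

/* Hypothetical: the name unicode_u_ucs2_native[] would be expected to hold. */
static const char *sketch_ucs2_native(void)
{
    return sketch_is_little_endian() ? "UCS-2LE" : "UCS-2BE";
}
```

       Both names always carry the same BE/LE suffix, because byte order is a property of the platform and
       not of the character width.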

       unicode_convert_init(), unicode_convert(), and unicode_convert_deinit() are an adaptation of the iconv(3)
       API that uses the same calling convention as the other algorithms in this unicode library, with some
       value-added features. These functions use iconv(3) to effect the actual character set conversion.

       unicode_convert_init() returns a non-NULL handle for the requested conversion, or NULL if the requested
       conversion is not available.  unicode_convert_init() takes a pointer to the output function that
       receives converted character text. The output function receives a pointer to the converted character
       text, and the number of characters in the converted text. The output function gets repeatedly called,
       until it receives the entire converted text.

       The character text to convert gets passed, repeatedly, to unicode_convert(). Each call to
       unicode_convert() results in the output function getting invoked, zero or more times, with each
       successive part of the converted text. Finally, unicode_convert_deinit() stops the conversion and
       deallocates the conversion handle.

       It's possible that a call to unicode_convert_deinit() results in some additional calls to the output
       function, passing the remaining, final parts, of the converted text, before unicode_convert_deinit()
       deallocates the handle, and returns.

       The output function should return 0 normally. A non-0 return indicates an error condition.
       unicode_convert_deinit() returns non-zero if any previous invocation of the output function returned
       non-zero (this includes any invocations of the output function resulting from this call, or prior
       unicode_convert() calls), or 0 if all invocations of the output function returned 0.

       If the errptr passed to unicode_convert_deinit() is not NULL, *errptr gets set to non-zero if there were
       any conversion errors, that is, if any text could not be converted to the destination character set.

       unicode_convert() also returns non-zero if it calls the output function and it returns non-zero. However,
       the conversion handle remains allocated, so unicode_convert_deinit() must still be called to clean it
       up.
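
       The calling convention described above can be sketched with a stand-alone mock. This is not the
       library's code and does not use iconv(3): the sketch_* names are hypothetical, the conversion is a
       hand-written ISO-8859-1 to UTF-8 mapping, and the third (caller-supplied) argument of the output
       function is an assumption. It only shows the init/convert/deinit flow, repeated output function
       calls, and the sticky non-zero return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shape of an output function: converted text, its size, user data. */
typedef int (*sketch_output_fn)(const char *text, size_t cnt, void *arg);

struct sketch_handle {
    sketch_output_fn out;
    void *arg;
    int failed;   /* sticky: set once the output function returns non-zero */
};

struct sketch_handle *sketch_init(sketch_output_fn out, void *arg)
{
    struct sketch_handle *h = malloc(sizeof *h);

    if (h) { h->out = out; h->arg = arg; h->failed = 0; }
    return h;     /* NULL means the conversion is not available */
}

/* Converts ISO-8859-1 bytes to UTF-8, calling the output function once
   per input byte. Returns non-zero after any output function failure. */
int sketch_convert(struct sketch_handle *h, const char *text, size_t cnt)
{
    for (size_t i = 0; i < cnt && !h->failed; i++) {
        unsigned char c = (unsigned char)text[i];
        char buf[2];
        size_t n = 0;

        if (c < 0x80) {
            buf[n++] = (char)c;
        } else {
            buf[n++] = (char)(0xC0 | (c >> 6));
            buf[n++] = (char)(0x80 | (c & 0x3F));
        }
        if (h->out(buf, n, h->arg) != 0)
            h->failed = 1;
    }
    return h->failed;
}

/* The handle stays allocated until this call, even after a failure. */
int sketch_deinit(struct sketch_handle *h)
{
    int rc = h->failed;

    free(h);
    return rc;
}

struct sketch_sink { char data[64]; size_t len; };

/* Example output function: appends to a fixed buffer, fails when full. */
static int sketch_collect(const char *text, size_t cnt, void *arg)
{
    struct sketch_sink *s = arg;

    if (cnt > sizeof s->data - s->len)
        return -1;
    memcpy(s->data + s->len, text, cnt);
    s->len += cnt;
    return 0;
}

/* Feeds the input in two pieces, then deinitializes the handle. */
int sketch_demo(struct sketch_sink *sink)
{
    struct sketch_handle *h;

    sink->len = 0;
    h = sketch_init(sketch_collect, sink);
    if (!h)
        return -1;
    sketch_convert(h, "caf", 3);
    sketch_convert(h, "\xE9", 1);   /* ISO-8859-1 e-acute */
    return sketch_deinit(h);
}
```

       The return values of sketch_convert() are ignored in sketch_demo() on purpose: the failure flag is
       sticky, so checking the result of the final deinit call is enough.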

   Collecting converted text into a buffer
       Call unicode_convert_tocbuf_init() instead of unicode_convert_init(), then call unicode_convert() and
       unicode_convert_deinit() normally. The parameters to unicode_convert_init() specify the source and the
       destination character sets.  unicode_convert_tocbuf_toutf8_init() is just an alias that specifies UTF-8
       as the destination character set.  unicode_convert_tocbuf_fromutf8_init() is just an alias that specifies
       UTF-8 as the source character set.

       These functions supply an output function that collects the converted text into a malloc()ed buffer. If
       unicode_convert_deinit() returns 0, *cbufptr_ret gets initialized to a malloc()ed buffer, and the size
       of that buffer (the number of converted characters) gets placed into *cbufsize_ret.

           Note

           If the converted string is an empty string, *cbufsize_ret gets set to 0, but *cbufptr_ret still gets
           initialized (to a dummy malloced buffer).

       A non-zero nullterminate places a trailing \0 character after the converted string (this is included in
       *cbufsize_ret).
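
       A buffer-collecting output function of this kind could look roughly like the sketch below. It is a
       stand-alone illustration with hypothetical sketch_* names, not the library's implementation; it
       reproduces the behaviour described above: the trailing \0 is counted in the returned size, and an
       empty result still yields a non-NULL buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct sketch_cbuf {
    char *buf;     /* growing malloc()ed buffer, NULL before first use */
    size_t size;   /* number of converted characters collected so far */
};

/* Output function: appends each converted chunk to the buffer. */
int sketch_cbuf_append(const char *text, size_t cnt, void *arg)
{
    struct sketch_cbuf *cb = arg;
    /* One spare byte leaves room for a terminator and keeps the
       allocation non-empty even when nothing was converted. */
    char *p = realloc(cb->buf, cb->size + cnt + 1);

    if (!p)
        return -1;
    if (cnt)
        memcpy(p + cb->size, text, cnt);
    cb->buf = p;
    cb->size += cnt;
    return 0;
}

/* Final step: hands the buffer and its size to the caller, who must free() it. */
int sketch_cbuf_finish(struct sketch_cbuf *cb, int nullterminate,
                       char **cbufptr_ret, size_t *cbufsize_ret)
{
    /* Guarantees a buffer exists, even for an empty conversion. */
    if (sketch_cbuf_append("", 0, cb) != 0) {
        free(cb->buf);
        cb->buf = NULL;
        cb->size = 0;
        return -1;
    }
    if (nullterminate)
        cb->buf[cb->size++] = '\0';   /* counted in the returned size */
    *cbufptr_ret = cb->buf;
    *cbufsize_ret = cb->size;
    cb->buf = NULL;
    cb->size = 0;
    return 0;
}
```

       Because the terminator is counted, a caller that wants the string length with nullterminate set has to
       subtract one from the returned size.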

   Converting between character sets and unicode
       unicode_convert_tou_init() converts character text into a char32_t buffer. It works just like
       unicode_convert_tocbuf_init(), except that only the source character set gets specified and the output
       buffer is a char32_t buffer.  nullterminate terminates the converted unicode characters with a U+0000.

       unicode_convert_fromu_init() converts char32_ts to the output character set, and also works like
       unicode_convert_tocbuf_init(). Additionally, in this case, unicode_convert_uc() works just like
       unicode_convert() except that the input sequence is a char32_t sequence, and the count parameter is the
       number of unicode characters.

   One-shot conversions
       unicode_convert_toutf8() converts the specified text in the specified character set into a UTF-8 string,
       returning a malloced buffer. If error is not NULL, *error gets set to a non-zero value if a character
       conversion error occurred and some characters could not be converted, even if unicode_convert_toutf8()
       returns a non-NULL value.

       unicode_convert_fromutf8() does a similar conversion from UTF-8 text to the specified character set.

       unicode_convert_tobuf() does a similar conversion between two different character sets.

       unicode_convert_tou_tobuf() calls unicode_convert_tou_init(), feeds the character string through
       unicode_convert(), then calls unicode_convert_deinit(). If this function returns 0, *uc and *ucsize are
       set to a malloced buffer+size holding the unicode char array.

       unicode_convert_fromu_tobuf() calls unicode_convert_fromu_init(), feeds the unicode array through
       unicode_convert_uc(), then calls unicode_convert_deinit(). If this function returns 0, *c and *csize are
       set to a malloced buffer+size holding the char array.

Name

       unicode_u_ucs4_native, unicode_u_ucs2_native, unicode_convert_init, unicode_convert,
       unicode_convert_deinit, unicode_convert_tocbuf_init, unicode_convert_tou_init,
       unicode_convert_fromu_init, unicode_convert_uc, unicode_convert_tocbuf_toutf8_init,
       unicode_convert_tocbuf_fromutf8_init, unicode_convert_toutf8, unicode_convert_fromutf8,
       unicode_convert_tobuf, unicode_convert_tou_tobuf, unicode_convert_fromu_tobuf - unicode character set
       conversion

See Also

courier-unicode(7), unicode_convert_tocase(3), unicode_default_chset(3).

Synopsis

       #include <courier-unicode.h>

       extern const char unicode_u_ucs4_native[];

       extern const char unicode_u_ucs2_native[];

       unicode_convert_handle_t unicode_convert_init(const char *src_chset, const char *dst_chset,
                                                     void *cb_arg);

       int unicode_convert(unicode_convert_handle_t handle, const char *text, size_t cnt);

       int unicode_convert_deinit(unicode_convert_handle_t handle, int *errptr);

       unicode_convert_handle_t unicode_convert_tocbuf_init(const char *src_chset, const char *dst_chset,
                                                            char **cbufptr_ret, size_t *cbufsize_ret,
                                                            int nullterminate);

       unicode_convert_handle_t unicode_convert_tocbuf_toutf8_init(const char *src_chset,
                                                                   char **cbufptr_ret,
                                                                   size_t *cbufsize_ret,
                                                                   int nullterminate);

       unicode_convert_handle_t unicode_convert_tocbuf_fromutf8_init(const char *dst_chset,
                                                                     char **cbufptr_ret,
                                                                     size_t *cbufsize_ret,
                                                                     int nullterminate);

       unicode_convert_handle_t unicode_convert_tou_init(const char *src_chset, char32_t **ucptr_ret,
                                                         size_t *ucsize_ret, int nullterminate);

       unicode_convert_handle_t unicode_convert_fromu_init(const char *dst_chset, char **cbufptr_ret,
                                                           size_t *cbufsize_ret, int nullterminate);

       int unicode_convert_uc(unicode_convert_handle_t handle, const char32_t *text, size_t cnt);

       char *unicode_convert_toutf8(const char *text, const char *charset, int *error);

       char *unicode_convert_fromutf8(const char *text, const char *charset, int *error);

       char *unicode_convert_tobuf(const char *text, const char *charset, const char *dstcharset,
                                   int *error);

       int unicode_convert_tou_tobuf(const char *text, size_t text_l, const char *charset,
                                     char32_t **uc, size_t *ucsize, int *error);

       int unicode_convert_fromu_tobuf(const char32_t *utext, size_t utext_l, const char *charset,
                                       char **c, size_t *csize, int *error);
