The MinGW.OSDN Windows System Libraries. Formerly designated as "MinGW.org Windows System Libraries", this encapsulates the "mingwrt" C runtime library extensions, and the "w32api" 32-bit MS-Windows API libraries.
Please note that this project no longer owns the "MinGW.org" domain name; any software which may be distributed from that domain is NOT supported by this project.
Revision | fc451f9b494dd0b6bafb17ba9e35efd9a9d815e7 (tree) |
---|---|
Time | 2020-04-08 03:55:12 |
Author | Keith Marshall <keith@user...> |
Committer | Keith Marshall |
Document MinGW MBCS/wide character conversion functions.
@@ -1,3 +1,11 @@ | ||
1 | +2020-04-07 Keith Marshall <keith@users.osdn.me> | |
2 | + | |
3 | + Document MinGW MBCS/wide character conversion functions. | |
4 | + | |
5 | + * man/btowc.3.man man/mbrlen.3.man man/mbrtowc.3.man | |
6 | + * man/mbsinit.3.man man/mbsrtowcs.3.man man/wcrtomb.3.man | |
7 | + * man/wcsrtombs.3.man man/wctob.3.man: New files. | |
8 | + | |
1 | 9 | 2020-04-02 Keith Marshall <keith@users.osdn.me> |
2 | 10 | |
3 | 11 | Handle wcsrtombs() initial surrogate completion. |
@@ -0,0 +1,169 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B \%btowc | |
6 | +\- convert a single byte to a wide character | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < stdio.h > | |
12 | +.br | |
13 | +.B #include | |
14 | +.RB < wchar.h > | |
15 | +.PP | |
16 | +.B wint_t btowc( int | |
17 | +.I c | |
18 | +.B ); | |
19 | +. | |
20 | +.IP \& -4n | |
21 | +Feature Test Macro Requirements for libmingwex: | |
22 | +.PP | |
23 | +.BR \%__MSVCRT_VERSION__ : | |
24 | +since \%mingwrt\(hy5.3, | |
25 | +if this feature test macro is | |
26 | +.IR defined , | |
27 | +with a value of | |
28 | +.I at least | |
29 | +.IR \%0x0800 , | |
30 | +(corresponding to the symbolic constant, | |
31 | +.BR \%__MSVCR80_DLL , | |
32 | +and thus declaring intent to link with \%MSVCR80.DLL, | |
33 | +or any later version of \%Microsoft\(aqs \%non\(hyfree runtime library, | |
34 | +instead of with \%MSVCRT.DLL), | |
35 | +calls to | |
36 | +.BR \%btowc () | |
37 | +will be directed to the implementation thereof, | |
38 | +within \%Microsoft\(aqs runtime DLL. | |
39 | +. | |
40 | +.PP | |
41 | +.BR \%_ISOC99_SOURCE , | |
42 | +.BR \%_ISOC11_SOURCE : | |
43 | +since \%mingwrt\(hy5.3.1, | |
44 | +when linking with \%MSVCRT.DLL, | |
45 | +or when | |
46 | +.B \%__MSVCRT_VERSION__ | |
47 | +is either | |
48 | +.IR undefined , | |
49 | +or is | |
50 | +.I defined | |
51 | +with any value which is | |
52 | +.I less than | |
53 | +.IR \%0x0800 , | |
54 | +(thus denying intent to link with \%MSVCR80.DLL, | |
55 | +or any later \%non\(hyfree version of Microsoft\(aqs runtime library), | |
56 | +.I explicitly | |
57 | +defining either of these feature test macros | |
58 | +will cause any call to | |
59 | +.BR \%btowc () | |
60 | +to be directed to the | |
61 | +.I \%libmingwex | |
62 | +implementation; | |
63 | +if neither macro is defined, | |
64 | +calls to | |
65 | +.BR \%btowc () | |
66 | +will be directed to Microsoft\(aqs runtime implementation, | |
67 | +if it is available, | |
68 | +otherwise falling back to the | |
69 | +.I \%libmingwex | |
70 | +implementation. | |
71 | +. | |
72 | +.PP | |
73 | +Prior to \%mingwrt\(hy5.3, | |
74 | +none of the above feature test macros have any effect on | |
75 | +.BR \%btowc (); | |
76 | +all calls will be directed to the | |
77 | +.I \%libmingwex | |
78 | +implementation. | |
79 | +. | |
80 | +. | |
81 | +.SH DESCRIPTION | |
82 | +If | |
83 | +.I c | |
84 | +is not | |
85 | +.BR EOF , | |
86 | +the | |
87 | +.BR \%btowc () | |
88 | +function attempts to interpret | |
89 | +.I c | |
90 | +as a multibyte character sequence of length | |
91 | +.IR one ; | |
92 | +if the single byte evaluated represents a complete multibyte character, | |
93 | +in the codeset which is associated with the | |
94 | +.B \%LC_CTYPE | |
95 | +category of the active process locale, | |
96 | +.BR \%btowc () | |
97 | +converts it to, | |
98 | +and returns, | |
99 | +its equivalent wide character value. | |
100 | +. | |
101 | +. | |
102 | +.SH RETURN VALUE | |
103 | +If | |
104 | +.I c | |
105 | +is | |
106 | +.BR EOF , | |
107 | +or if it does not represent a complete multibyte | |
108 | +character sequence of length | |
109 | +.IR one , | |
110 | +.BR \%btowc () | |
111 | +returns | |
112 | +.BR WEOF ; | |
113 | +otherwise the conversion of the single byte character, | |
114 | +to its equivalent wide character value, | |
115 | +is returned. | |
116 | +. | |
117 | +. | |
118 | +.SH ERROR CONDITIONS | |
119 | +No error conditions are defined. | |
120 | +. | |
121 | +. | |
122 | +.SH STANDARDS CONFORMANCE | |
123 | +Except to the extent that it may be affected by limitations | |
124 | +of the underlying \%MS\(hyWindows API, | |
125 | +the | |
126 | +.I \%libmingwex | |
127 | +implementation of | |
128 | +.BR \%btowc () | |
129 | +conforms generally to | |
130 | +.BR \%ISO\(hyC99 , | |
131 | +.BR \%POSIX.1\(hy2001 , | |
132 | +and | |
133 | +.BR \%POSIX.1\(hy2008 ; | |
134 | +(prior to \%mingwrt\(hy5.3, | |
135 | +and in those cases where calls may be delegated | |
136 | +to a Microsoft runtime DLL implementation, | |
137 | +this level of conformity may not be achieved). | |
138 | +. | |
139 | +. | |
140 | +.\"SH EXAMPLE | |
141 | +. | |
142 | +. | |
143 | +.SH CAVEATS AND BUGS | |
144 | +Use of the | |
145 | +.BR \%btowc () | |
146 | +function is | |
147 | +.IR discouraged ; | |
148 | +it serves no purpose which may not be better served by the | |
149 | +.BR \%mbrtowc (3) | |
150 | +function, | |
151 | +which should be considered as a preferred alternative. | |
152 | +. | |
153 | +. | |
154 | +.SH SEE ALSO | |
155 | +.BR mbrtowc (3) | |
156 | +. | |
157 | +. | |
158 | +.SH AUTHOR | |
159 | +This manpage was written by \%Keith\ Marshall, | |
160 | +\%<keith@users.osdn.me>, | |
161 | +to document the | |
162 | +.BR \%btowc () | |
163 | +function as it has been implemented for the MinGW.org Project. | |
164 | +It may be copied, modified and redistributed, | |
165 | +without restriction of copyright, | |
166 | +provided this acknowledgement of contribution by | |
167 | +the original author remains in place. | |
168 | +. | |
169 | +.\" EOF |
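The CAVEATS AND BUGS section of the btowc page above recommends mbrtowc(3) as the preferred alternative. The following is a minimal sketch of that substitution, not part of the commit: the helper name `byte_to_wide` is hypothetical, and the sketch assumes only the ISO-C99 behaviour of mbrtowc() described in these pages, using a private, zero-initialised `mbstate_t` so that no state is shared between calls.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of mingwrt): convert one byte to a
 * wide character using mbrtowc(), instead of btowc().  Returns WEOF
 * when c is EOF, or when the byte alone is not a complete multibyte
 * character in the codeset of the current LC_CTYPE locale category.
 */
wint_t byte_to_wide( int c )
{
  if( c == EOF ) return WEOF;

  char byte = (char)(unsigned char)(c);
  wchar_t wc;
  mbstate_t state;
  memset( &state, 0, sizeof state );    /* initial conversion state */

  size_t n = mbrtowc( &wc, &byte, 1, &state );

  /* (size_t)(-1) reports an invalid sequence, and (size_t)(-2) an
   * incomplete one; in both cases the byte does not stand alone as
   * a complete character.  A return of zero means the byte is NUL,
   * for which wc has already been set to the NUL wide character.
   */
  if( n == (size_t)(-1) || n == (size_t)(-2) ) return WEOF;
  return (wint_t)(wc);
}
```

A caller would use `byte_to_wide( ch )` wherever `btowc( ch )` might otherwise appear; for bytes which are complete characters in the active codeset, the two are expected to agree.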
@@ -0,0 +1,377 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B mbrlen | |
6 | +\- determine the number of bytes in a multibyte character | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < wchar.h > | |
12 | +.PP | |
13 | +.B size_t mbrlen( const char | |
14 | +.BI * s , | |
15 | +.B size_t | |
16 | +.IB n , | |
17 | +.B mbstate_t | |
18 | +.BI * ps | |
19 | +.B ); | |
20 | +. | |
21 | +. | |
22 | +.IP \& -4n | |
23 | +Feature Test Macro Requirements for libmingwex: | |
24 | +.PP | |
25 | +.BR \%__MSVCRT_VERSION__ : | |
26 | +since \%mingwrt\(hy5.3, | |
27 | +if this feature test macro is | |
28 | +.IR defined , | |
29 | +with a value of | |
30 | +.I at least | |
31 | +.IR \%0x0800 , | |
32 | +(corresponding to the symbolic constant, | |
33 | +.BR \%__MSVCR80_DLL , | |
34 | +and thus declaring intent to link with \%MSVCR80.DLL, | |
35 | +or any later version of \%Microsoft\(aqs \%non\(hyfree runtime library, | |
36 | +instead of with \%MSVCRT.DLL), | |
37 | +calls to | |
38 | +.BR \%mbrlen () | |
39 | +will be directed to the implementation thereof, | |
40 | +within \%Microsoft\(aqs runtime DLL. | |
41 | +. | |
42 | +.PP | |
43 | +.BR \%_ISOC99_SOURCE , | |
44 | +.BR \%_ISOC11_SOURCE : | |
45 | +since \%mingwrt\(hy5.3.1, | |
46 | +when linking with \%MSVCRT.DLL, | |
47 | +or when | |
48 | +.B \%__MSVCRT_VERSION__ | |
49 | +is either | |
50 | +.IR undefined , | |
51 | +or is | |
52 | +.I defined | |
53 | +with any value which is | |
54 | +.I less than | |
55 | +.IR \%0x0800 , | |
56 | +(thus denying intent to link with \%MSVCR80.DLL, | |
57 | +or any later \%non\(hyfree version of Microsoft\(aqs runtime library), | |
58 | +.I explicitly | |
59 | +defining either of these feature test macros | |
60 | +will cause any call to | |
61 | +.BR \%mbrlen () | |
62 | +to be directed to the | |
63 | +.I \%libmingwex | |
64 | +implementation; | |
65 | +if neither macro is defined, | |
66 | +calls to | |
67 | +.BR \%mbrlen () | |
68 | +will be directed to Microsoft\(aqs runtime implementation, | |
69 | +if it is available, | |
70 | +otherwise falling back to the | |
71 | +.I \%libmingwex | |
72 | +implementation. | |
73 | +. | |
74 | +.PP | |
75 | +Prior to \%mingwrt\(hy5.3, | |
76 | +none of the above feature test macros have any effect on | |
77 | +.BR \%mbrlen (); | |
78 | +all calls will be directed to the | |
79 | +.I \%libmingwex | |
80 | +implementation. | |
81 | +. | |
82 | +. | |
83 | +.SH DESCRIPTION | |
84 | +The | |
85 | +.BR \%mbrlen () | |
86 | +function inspects the sequence of bytes, | |
87 | +starting at | |
88 | +.IR s , | |
89 | +up to a maximum of | |
90 | +.I n | |
91 | +bytes, | |
92 | +to determine the number of bytes required to complete | |
93 | +the next multibyte code point, | |
94 | +commencing from the conversion state specified in | |
95 | +.IR *ps , | |
96 | +(which is then updated). | |
97 | +. | |
98 | +.PP | |
99 | +The sequence of bytes, | |
100 | +pointed to by | |
101 | +.IR s , | |
102 | +is interpreted as a multibyte character sequence | |
103 | +in the codeset which is associated with the | |
104 | +.B \%LC_CTYPE | |
105 | +category of the active process locale. | |
106 | +. | |
107 | +.PP | |
108 | +If | |
109 | +.I ps | |
110 | +is specified as a NULL pointer, | |
111 | +.BR \%mbrlen () | |
112 | +will track conversion state using an internal | |
113 | +.B \%mbstate_t | |
114 | +object reference, | |
115 | +which is private within the | |
116 | +.BR \%mbrlen () | |
117 | +process address space; | |
118 | +at process \%start\(hyup, | |
119 | +this internal | |
120 | +.B \%mbstate_t | |
121 | +object is initialized to represent | |
122 | +the initial conversion state. | |
123 | +. | |
124 | +. | |
125 | +.SH RETURN VALUE | |
126 | +If the multibyte sequence, | |
127 | +completed by | |
128 | +.I n | |
129 | +or fewer bytes, | |
130 | +does not represent the NUL code point, | |
131 | +then | |
132 | +.BR \%mbrlen () | |
133 | +returns the number of bytes which are actually required | |
134 | +to complete the sequence, | |
135 | +(a number between 1 and | |
136 | +.IR n , | |
137 | +inclusive), | |
138 | +and the conversion state, | |
139 | +as specified in | |
140 | +.IR *ps , | |
141 | +is reset to the initial state. | |
142 | +. | |
143 | +.PP | |
144 | +On the other hand, | |
145 | +if the completed multibyte sequence | |
146 | +.I does | |
147 | +represent the NUL code point, | |
148 | +then | |
149 | +.BR \%mbrlen () | |
150 | +returns zero, | |
151 | +and the conversion state, | |
152 | +as specified in | |
153 | +.IR *ps , | |
154 | +is reset to the initial state. | |
155 | +. | |
156 | +.PP | |
157 | +If | |
158 | +.I n | |
159 | +is less than the effective | |
160 | +.B \%MB_CUR_MAX | |
161 | +for the active process locale, | |
162 | +and | |
163 | +.I n | |
164 | +bytes is insufficient to complete a multibyte character, | |
165 | +then | |
166 | +.I *ps | |
167 | +is updated to represent a new partially completed encoding state, | |
168 | +and | |
169 | +.BR \%mbrlen () | |
170 | +returns | |
171 | +.IR \%(size_t)(\-2) . | |
172 | +Conversely, | |
173 | +if | |
174 | +.I n | |
175 | +is equal to, | |
176 | +or greater than | |
177 | +.BR \%MB_CUR_MAX , | |
178 | +this return condition can arise, | |
179 | +only if the multibyte encoding sequence includes | |
180 | +redundant shift states; | |
181 | +since shift states are not used, | |
182 | +this cannot occur in any \%MS\(hyWindows | |
183 | +multibyte character set. | |
184 | +. | |
185 | +. | |
186 | +.SH ERROR CONDITIONS | |
187 | +If the sequence of | |
188 | +.I n | |
189 | +or fewer bytes, | |
190 | +pointed to by | |
191 | +.IR s , | |
192 | +extends any pending encoding state recorded within | |
193 | +.IR *ps , | |
194 | +to at least | |
195 | +.B \%MB_CUR_MAX | |
196 | +bytes, | |
197 | +and the resulting sequence does not represent | |
198 | +a valid multibyte character, | |
199 | +then | |
200 | +.I \%errno | |
201 | +is set to | |
202 | +.BR \%EILSEQ , | |
203 | +and | |
204 | +.BR \%mbrlen () | |
205 | +returns | |
206 | +.IR \%(size_t)(\-1) . | |
207 | +. | |
208 | +.PP | |
209 | +If, | |
210 | +on entry to | |
211 | +.BR \%mbrlen (), | |
212 | +the conversion state represented by | |
213 | +.I *ps | |
214 | +is deemed to be | |
215 | +.IR invalid , | |
216 | +.I \%errno | |
217 | +is set to | |
218 | +.BR \%EINVAL , | |
219 | +and | |
220 | +.BR \%mbrlen () | |
221 | +returns | |
222 | +.IR \%(size_t)(\-1) ; | |
223 | +the conversion state may be deemed to be invalid if | |
224 | +it contains any sequence of bytes which does not match | |
225 | +a valid initial sequence from a multibyte character | |
226 | +representation within the currently active codeset, | |
227 | +if it can be interpreted as a complete multibyte character, | |
228 | +.I without | |
229 | +the addition of any further bytes from | |
230 | +.IR s , | |
231 | +or if it represents a | |
232 | +.I surrogate\ pair | |
233 | +conversion, | |
234 | +resulting from a preceding call to the | |
235 | +.BR \%mbrtowc (3) | |
236 | +function, | |
237 | +and from which the | |
238 | +.I low\ surrogate | |
239 | +has yet to be retrieved. | |
240 | +. | |
241 | +. | |
242 | +.SH STANDARDS CONFORMANCE | |
243 | +Except to the extent that it may be affected by limitations | |
244 | +of the underlying \%MS\(hyWindows API, | |
245 | +the | |
246 | +.I \%libmingwex | |
247 | +implementation of | |
248 | +.BR \%mbrlen () | |
249 | +conforms generally to | |
250 | +.BR \%ISO\(hyC99 , | |
251 | +.BR \%POSIX.1\(hy2001 , | |
252 | +and | |
253 | +.BR \%POSIX.1\(hy2008 ; | |
254 | +(prior to \%mingwrt\(hy5.3, | |
255 | +and in those cases where calls may be delegated | |
256 | +to a Microsoft runtime DLL implementation, | |
257 | +this level of conformity may not be achieved). | |
258 | +. | |
259 | +.PP | |
260 | +The feature whereby | |
261 | +.I \%errno | |
262 | +is set to | |
263 | +.BR EINVAL , | |
264 | +when | |
265 | +.I *ps | |
266 | +is found to be invalid, | |
267 | +is a | |
268 | +.B POSIX.1 | |
269 | +conforming extension to | |
270 | +.BR \%ISO\(hyC99 . | |
271 | +. | |
272 | +. | |
273 | +.\"SH EXAMPLE | |
274 | +. | |
275 | +. | |
276 | +.SH CAVEATS AND BUGS | |
277 | +If | |
278 | +.BR \%mbrlen () | |
279 | +is called with a NULL pointer for | |
280 | +.IR s , | |
281 | +the behaviour is undefined. | |
282 | +. | |
283 | +.PP | |
284 | +Due to a documented limitation of Microsoft\(aqs | |
285 | +.BR \%setlocale () | |
286 | +function implementation, | |
287 | +it is not possible to directly select an active locale, | |
288 | +in which the codeset is represented by any multibyte | |
289 | +character sequence with an effective | |
290 | +.B \%MB_CUR_MAX | |
291 | +of more than two bytes. | |
292 | +Prior to \%mingwrt\(hy5.3, | |
293 | +this limitation precludes the use of | |
294 | +.BR \%mbrlen () | |
295 | +to interpret any codeset with | |
296 | +.B \%MB_CUR_MAX | |
297 | +greater than two bytes, | |
298 | +(such as | |
299 | +.BR \%UTF\(hy8 ). | |
300 | +From \%mingwrt\(hy5.3 onward, | |
301 | +the MinGW.org implementation of | |
302 | +.BR \%mbrlen () | |
303 | +mitigates this limitation by assignment of the codeset | |
304 | +from the | |
305 | +.B \%LC_CTYPE | |
306 | +environment variable, | |
307 | +provided the system default has been previously activated | |
308 | +for the | |
309 | +.B \%LC_CTYPE | |
310 | +locale category; | |
311 | +e.g.\ execution of: | |
312 | +.PP | |
313 | +.RS 4 | |
314 | +.EX | |
315 | +#define _ISOC99_SOURCE | |
316 | + | |
317 | +#include <stdio.h> | |
318 | +#include <stdlib.h> | |
319 | +#include <locale.h> | |
320 | +#include <limits.h> | |
321 | +#include <wchar.h> | |
322 | + | |
323 | +int main() | |
324 | +{ | |
325 | + setlocale( LC_CTYPE, "" ); | |
326 | + putenv( "LC_CTYPE=en_GB.65001" ); | |
327 | + printf( "%u bytes\en", | |
328 | + (unsigned)(mbrlen( "\eU0001d10b", MB_LEN_MAX, NULL )) | |
329 | + ); | |
330 | + return 0; | |
331 | +} | |
332 | +.EE | |
333 | +.RE | |
334 | +.PP | |
335 | +will interpret the string \fC\%"\eU0001d10b"\fP as a \%four\(hybyte | |
336 | +.B \%UTF\(hy8 | |
337 | +encoding sequence, | |
338 | +(which represents a single code point), | |
339 | +and print the result as \fC4\fP\ \fC\%bytes\fP. | |
340 | +. | |
341 | +.PP | |
342 | +Please be aware that the underlying \%MS\(hyWindows API, | |
343 | +which is used to interpret the multibyte sequence, | |
344 | +offers no readily accessible mechanism to discriminate | |
345 | +between incomplete and invalid sequences; | |
346 | +thus, | |
347 | +if | |
348 | +.I n | |
349 | +is less than the effective | |
350 | +.B \%MB_CUR_MAX | |
351 | +for the active codeset, | |
352 | +this | |
353 | +.BR \%mbrlen () | |
354 | +implementation may return | |
355 | +.IR \%(size_t)(\-2) , | |
356 | +indicating an incomplete sequence, | |
357 | +even in cases where there are no additional bytes | |
358 | +which could be appended, | |
359 | +to complete a valid encoding sequence. | |
360 | +. | |
361 | +. | |
362 | +.SH SEE ALSO | |
363 | +.BR mbrtowc (3) | |
364 | +. | |
365 | +. | |
366 | +.SH AUTHOR | |
367 | +This manpage was written by \%Keith\ Marshall, | |
368 | +\%<keith@users.osdn.me>, | |
369 | +to document the | |
370 | +.BR \%mbrlen () | |
371 | +function as it has been implemented for the MinGW.org Project. | |
372 | +It may be copied, modified and redistributed, | |
373 | +without restriction of copyright, | |
374 | +provided this acknowledgement of contribution by | |
375 | +the original author remains in place. | |
376 | +. | |
377 | +.\" EOF |
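The return value conventions documented in the mbrlen page above (zero for the NUL code point, `(size_t)(-1)` for an invalid sequence, `(size_t)(-2)` for an incomplete one) can be illustrated by a short character counting loop. This is a sketch only, not part of the commit: the helper name `count_mb_chars` is hypothetical, and it passes an explicit, zero-initialised `mbstate_t` rather than relying on the internal state which mbrlen() uses when `ps` is NULL.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of mingwrt): count the multibyte
 * characters within the first len bytes of s, stopping early at a
 * NUL code point.  Returns (size_t)(-1) if the sequence is invalid,
 * or if it ends with an incomplete multibyte character.
 */
size_t count_mb_chars( const char *s, size_t len )
{
  mbstate_t state;
  memset( &state, 0, sizeof state );    /* initial conversion state */

  size_t count = 0;
  while( len > 0 )
  {
    size_t n = mbrlen( s, len, &state );

    if( n == 0 ) break;                 /* NUL code point: stop */
    if( n == (size_t)(-1) || n == (size_t)(-2) )
      return (size_t)(-1);              /* invalid, or incomplete */

    s += n; len -= n; ++count;          /* advance past one character */
  }
  return count;
}
```

For example, `count_mb_chars( "abc", 3 )` yields 3 in any locale where those bytes are single byte characters; the result for longer encodings depends on the codeset associated with the active `LC_CTYPE` category, subject to the limitations described in the CAVEATS AND BUGS section.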
@@ -0,0 +1,681 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B mbrtowc | |
6 | +\- convert from multibyte to wide character encoding | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < wchar.h > | |
12 | +.PP | |
13 | +.B size_t mbrtowc( wchar_t | |
14 | +.BI * pwc , | |
15 | +.B const char | |
16 | +.BI * s , | |
17 | +.B size_t | |
18 | +.IB n , | |
19 | +.B mbstate_t | |
20 | +.BI * ps | |
21 | +.B ); | |
22 | +. | |
23 | +.IP \& -4n | |
24 | +Feature Test Macro Requirements for libmingwex: | |
25 | +.PP | |
26 | +.BR \%__MSVCRT_VERSION__ : | |
27 | +since \%mingwrt\(hy5.3, | |
28 | +if this feature test macro is | |
29 | +.IR defined , | |
30 | +with a value of | |
31 | +.I at least | |
32 | +.IR 0x0800 , | |
33 | +(corresponding to the symbolic constant, | |
34 | +.BR \%__MSVCR80_DLL , | |
35 | +and thus declaring intent to link with \%MSVCR80.DLL, | |
36 | +or any later version of \%Microsoft\(aqs \%non\(hyfree runtime library, | |
37 | +instead of with \%MSVCRT.DLL), | |
38 | +calls to | |
39 | +.BR mbrtowc () | |
40 | +will be directed to the implementation thereof, | |
41 | +within \%Microsoft\(aqs runtime DLL. | |
42 | +. | |
43 | +.PP | |
44 | +.BR \%_ISOC99_SOURCE , | |
45 | +.BR \%_ISOC11_SOURCE : | |
46 | +since \%mingwrt\(hy5.3.1, | |
47 | +when linking with \%MSVCRT.DLL, | |
48 | +or when | |
49 | +.B \%__MSVCRT_VERSION__ | |
50 | +is either | |
51 | +.IR undefined , | |
52 | +or is | |
53 | +.I defined | |
54 | +with any value which is | |
55 | +.I less than | |
56 | +.IR 0x0800 , | |
57 | +(thus denying intent to link with \%MSVCR80.DLL, | |
58 | +or any later \%non\(hyfree version of Microsoft\(aqs runtime library), | |
59 | +.I explicitly | |
60 | +defining either of these feature test macros | |
61 | +will cause any call to | |
62 | +.BR \%mbrtowc () | |
63 | +to be directed to the | |
64 | +.I \%libmingwex | |
65 | +implementation; | |
66 | +if neither macro is defined, | |
67 | +calls to | |
68 | +.BR \%mbrtowc () | |
69 | +will be directed to Microsoft\(aqs runtime implementation, | |
70 | +if it is available, | |
71 | +otherwise falling back to the | |
72 | +.I \%libmingwex | |
73 | +implementation. | |
74 | +. | |
75 | +.PP | |
76 | +Prior to \%mingwrt\(hy5.3, | |
77 | +none of the above feature test macros have any effect on | |
78 | +.BR \%mbrtowc (); | |
79 | +all calls will be directed to the | |
80 | +.I \%libmingwex | |
81 | +implementation. | |
82 | +. | |
83 | +. | |
84 | +.SH DESCRIPTION | |
85 | +If | |
86 | +.I s | |
87 | +is a NULL pointer, | |
88 | +the | |
89 | +.IR pwc , | |
90 | +and the | |
91 | +.I n | |
92 | +arguments are ignored, | |
93 | +and the call to the | |
94 | +.BR \%mbrtowc () | |
95 | +function is interpreted as if invoked as | |
96 | +.PP | |
97 | +.RS 4n | |
98 | +.EX | |
99 | +mbrtowc( NULL, "", 1, ps ); | |
100 | +.EE | |
101 | +.RE | |
102 | +. | |
103 | +.PP | |
104 | +Otherwise, | |
105 | +if | |
106 | +.I s | |
107 | +is not a NULL pointer, | |
108 | +the | |
109 | +.BR \%mbrtowc () | |
110 | +function inspects the sequence of bytes, | |
111 | +starting at | |
112 | +.IR s , | |
113 | +up to a maximum of | |
114 | +.I n | |
115 | +bytes, | |
116 | +to determine the number of bytes required to complete | |
117 | +the next multibyte code point, | |
118 | +commencing from the conversion state specified in | |
119 | +.IR *ps , | |
120 | +(which is then updated). | |
121 | +Then, | |
122 | +if | |
123 | +.I pwc | |
124 | +is not a NULL pointer, | |
125 | +and | |
126 | +.I n | |
127 | +or fewer bytes is sufficient to complete a single | |
128 | +multibyte character, | |
129 | +the single | |
130 | +.B \%wchar_t | |
131 | +wide character conversion of that multibyte character | |
132 | +is stored at | |
133 | +.IR *pwc . | |
134 | +. | |
135 | +.PP | |
136 | +The sequence of bytes, | |
137 | +pointed to by | |
138 | +.IR s , | |
139 | +is interpreted as a multibyte character sequence | |
140 | +in the codeset which is associated with the | |
141 | +.B \%LC_CTYPE | |
142 | +category of the active process locale. | |
143 | +. | |
144 | +.PP | |
145 | +If | |
146 | +.I ps | |
147 | +is specified as a NULL pointer, | |
148 | +.BR \%mbrtowc () | |
149 | +will track conversion state using an internal | |
150 | +.B \%mbstate_t | |
151 | +object reference, | |
152 | +which is private within the | |
153 | +.BR \%mbrtowc () | |
154 | +process address space; | |
155 | +at process \%start\(hyup, | |
156 | +this internal | |
157 | +.B \%mbstate_t | |
158 | +object is initialized to represent | |
159 | +the initial conversion state. | |
160 | +. | |
161 | +.PP | |
162 | +In the special case, | |
163 | +where the conversion of a completed multibyte character | |
164 | +must be represented as a | |
165 | +.B \%UTF\(hy16LE | |
166 | +.IR surrogate\ pair , | |
167 | +and | |
168 | +.I pwc | |
169 | +is not a NULL pointer, | |
170 | +only the | |
171 | +.I high\ surrogate | |
172 | +will be stored at | |
173 | +.IR *pwc ; | |
174 | +please refer to the section | |
175 | +.B CAVEATS AND | |
176 | +.BR BUGS , | |
177 | +below, | |
178 | +for advice on retrieval of the | |
179 | +.IR low\ surrogate . | |
180 | +. | |
181 | +. | |
182 | +.SH RETURN VALUE | |
183 | +If the multibyte sequence, | |
184 | +completed by | |
185 | +.I n | |
186 | +or fewer bytes, | |
187 | +does not represent the NUL code point, | |
188 | +then | |
189 | +.BR \%mbrtowc () | |
190 | +returns the number of bytes which are actually required | |
191 | +to complete the sequence, | |
192 | +(a number between 1 and | |
193 | +.IR n , | |
194 | +inclusive), | |
195 | +and the conversion state, | |
196 | +as specified in | |
197 | +.IR *ps , | |
198 | +is reset to the initial state; | |
199 | +if | |
200 | +.I pwc | |
201 | +is not a NULL pointer, | |
202 | +the wide character conversion of the completed | |
203 | +multibyte character is stored at | |
204 | +.IR *pwc . | |
205 | +. | |
206 | +.PP | |
207 | +On the other hand, | |
208 | +if the completed multibyte sequence | |
209 | +.I does | |
210 | +represent the NUL code point, | |
211 | +then | |
212 | +.BR \%mbrtowc () | |
213 | +returns zero, | |
214 | +and the conversion state, | |
215 | +as specified in | |
216 | +.IR *ps , | |
217 | +is reset to the initial state; | |
218 | +if | |
219 | +.I pwc | |
220 | +is not a NULL pointer, | |
221 | +the NUL wide character is stored at | |
222 | +.IR *pwc . | |
223 | +. | |
224 | +.PP | |
225 | +If | |
226 | +.I n | |
227 | +is less than the effective | |
228 | +.B \%MB_CUR_MAX | |
229 | +for the active process locale, | |
230 | +and | |
231 | +.I n | |
232 | +bytes is insufficient to complete a multibyte character, | |
233 | +then | |
234 | +.I *ps | |
235 | +is updated to represent a new partially completed encoding state, | |
236 | +(no wide character conversion is stored), | |
237 | +and | |
238 | +.BR \%mbrtowc () | |
239 | +returns | |
240 | +.IR \%(size_t)(\-2) . | |
241 | +(If | |
242 | +.I n | |
243 | +is equal to, | |
244 | +or greater than | |
245 | +.BR \%MB_CUR_MAX , | |
246 | +this return condition can arise, | |
247 | +only if the multibyte encoding sequence includes | |
248 | +redundant shift states; | |
249 | +since shift states are not used, | |
250 | +this cannot occur in any \%MS\(hyWindows | |
251 | +multibyte character set). | |
252 | +. | |
253 | +. | |
254 | +.SH ERROR CONDITIONS | |
255 | +If the sequence of | |
256 | +.I n | |
257 | +or fewer bytes, | |
258 | +pointed to by | |
259 | +.IR s , | |
260 | +extends any pending encoding state recorded within | |
261 | +.IR *ps , | |
262 | +to at least | |
263 | +.B \%MB_CUR_MAX | |
264 | +bytes, | |
265 | +and the resulting sequence does not represent | |
266 | +a valid multibyte character, | |
267 | +then | |
268 | +.I \%errno | |
269 | +is set to | |
270 | +.BR \%EILSEQ , | |
271 | +no wide character conversion is stored, | |
272 | +and | |
273 | +.BR \%mbrtowc () | |
274 | +returns | |
275 | +.IR \%(size_t)(\-1) . | |
276 | +. | |
277 | +.PP | |
278 | +If, | |
279 | +on entry to | |
280 | +.BR \%mbrtowc (), | |
281 | +the conversion state represented by | |
282 | +.I *ps | |
283 | +is deemed to be | |
284 | +.IR invalid , | |
285 | +.I \%errno | |
286 | +is set to | |
287 | +.BR \%EINVAL , | |
288 | +and | |
289 | +.BR \%mbrtowc () | |
290 | +returns | |
291 | +.IR \%(size_t)(\-1) ; | |
292 | +the conversion state may be deemed to be invalid if | |
293 | +it contains any sequence of bytes which does not match | |
294 | +a valid initial sequence from a multibyte character | |
295 | +representation within the currently active codeset, | |
296 | +if it can be interpreted as a complete multibyte character, | |
297 | +.I without | |
298 | +the addition of any further bytes from | |
299 | +.IR s , | |
300 | +or if it represents a | |
301 | +.I surrogate\ pair | |
302 | +conversion, | |
303 | +resulting from a preceding call to | |
304 | +.BR \%mbrtowc (), | |
305 | +from which the | |
306 | +.I low\ surrogate | |
307 | +has yet to be retrieved, | |
308 | +(and this is not the special case in which | |
309 | +.I n | |
310 | +is specified as | |
311 | +.IR \%zero , | |
312 | +indicating that this call is intended | |
313 | +to retrieve that pending | |
314 | +.IR low\ surrogate ). | |
315 | +. | |
316 | +. | |
317 | +.SH STANDARDS CONFORMANCE | |
318 | +Except in respect of its extended provision for handling of | |
319 | +.IR surrogate\ pairs , | |
320 | +and to the extent that it may be affected by limitations | |
321 | +of the underlying \%MS\(hyWindows API, | |
322 | +the | |
323 | +.I \%libmingwex | |
324 | +implementation of | |
325 | +.BR mbrtowc () | |
326 | +conforms generally to | |
327 | +.BR \%ISO\(hyC99 , | |
328 | +.BR \%POSIX.1\(hy2001 , | |
329 | +and | |
330 | +.BR \%POSIX.1\(hy2008 ; | |
331 | +(prior to \%mingwrt\(hy5.3, | |
332 | +and in those cases where calls may be delegated | |
333 | +to a Microsoft runtime DLL implementation, | |
334 | +this level of conformity may not be achieved). | |
335 | +. | |
336 | +.PP | |
337 | +The feature whereby | |
338 | +.I \%errno | |
339 | +is set to | |
340 | +.BR EINVAL , | |
341 | +when | |
342 | +.I *ps | |
343 | +is found to be invalid, | |
344 | +is a | |
345 | +.B POSIX.1 | |
346 | +conforming extension to | |
347 | +.BR \%ISO\(hyC99 . | |
348 | +. | |
349 | +. | |
350 | +.\"SH EXAMPLE | |
351 | +. | |
352 | +. | |
353 | +.SH CAVEATS AND BUGS | |
354 | +Due to a documented limitation of Microsoft\(aqs | |
355 | +.BR \%setlocale () | |
356 | +function implementation, | |
357 | +it is not possible to directly select an active locale, | |
358 | +in which the codeset is represented by any multibyte | |
359 | +character sequence with an effective | |
360 | +.B \%MB_CUR_MAX | |
361 | +of more than two bytes. | |
362 | +Prior to \%mingwrt\(hy5.3, | |
363 | +this limitation precludes the use of | |
364 | +.BR \%mbrtowc () | |
365 | +to interpret any codeset with | |
366 | +.B \%MB_CUR_MAX | |
367 | +greater than two bytes, | |
368 | +(such as | |
369 | +.BR \%UTF\(hy8 ). | |
370 | +From \%mingwrt\(hy5.3 onward, | |
371 | +the MinGW.org implementation of | |
372 | +.BR \%mbrtowc () | |
373 | +mitigates this limitation by assignment of the codeset | |
374 | +from the | |
375 | +.B \%LC_CTYPE | |
376 | +environment variable, | |
377 | +provided the system default has been previously activated | |
378 | +for the | |
379 | +.B \%LC_CTYPE | |
380 | +locale category; | |
381 | +e.g.\ execution of: | |
382 | +.PP | |
383 | +.RS 4n | |
384 | +.EX | |
385 | +#include <stdio.h> | |
386 | +#include <stdlib.h> | |
387 | +#include <locale.h> | |
388 | +#include <limits.h> | |
389 | +#include <wchar.h> | |
390 | + | |
391 | +void print_conv( const char * ); | |
392 | + | |
393 | +int main() | |
394 | +{ | |
395 | + setlocale( LC_CTYPE, "" ); | |
396 | + putenv( "LC_CTYPE=en_GB.65001" ); | |
397 | + print_conv( "\eU0001d10b" ); | |
398 | + print_conv( "\eu6c34" ); | |
399 | + return 0; | |
400 | +} | |
401 | + | |
402 | +void print_conv( const char *mbs ) | |
403 | +{ | |
404 | + wchar_t wch; | |
405 | + size_t n = mbrtowc( &wch, mbs, MB_LEN_MAX, NULL ); | |
406 | + if( (int)(n) > 0 ) printf( "%u bytes \-> 0x%04X\en", (unsigned)(n), (unsigned)(wch) ); | |
407 | + else if( n == (size_t)(\-1) ) perror( "mbrtowc" ); | |
408 | +} | |
409 | +.EE | |
410 | +.RE | |
411 | +.PP | |
412 | +will interpret the string \fC"\eU0001d10b"\fP as a \%four\(hybyte | |
413 | +.B \%UTF\(hy8 | |
414 | +encoding sequence, | |
415 | +(which represents a single Unicode code point), | |
416 | +but will fail to interpret the following \fC"\eu6c34"\fP sequence, | |
417 | +(which also represents a valid Unicode code point), | |
418 | +and, | |
419 | +(if | |
420 | +.B stderr | |
421 | +is redirected to | |
422 | +.BR stdout ), | |
423 | +will print the result as: | |
424 | +.PP | |
425 | +.RS 4n | |
426 | +.EX | |
427 | +4 bytes \-> 0xD834 | |
428 | +mbrtowc: Invalid argument | |
429 | +.EE | |
430 | +.RE | |
431 | +.PP | |
432 | +This example illustrates a potentially irreconcilable | |
433 | +deviation of any | |
434 | +.BR \%mbrtowc () | |
435 | +implementation, | |
436 | +on \%MS\(hyWindows, | |
437 | +from the | |
438 | +.B \%ISO\(hyC99 | |
439 | +standard: | |
440 | +due to \%Microsoft\(aqs choice of | |
441 | +.B \%UTF\(hy16LE | |
442 | +as the underlying representation of the | |
443 | +.B \%wchar_t | |
444 | +data type, | |
445 | +it is not possible to satisfy the requirement, | |
446 | +implicit in the | |
447 | +.B \%ISO\(hyC99 | |
448 | +specification for | |
449 | +.BR \%mbrtowc (), | |
450 | +that it should be possible to return the complete representation | |
451 | +of any single representable Unicode code point as a single | |
452 | +.B \%wchar_t | |
453 | +value. | |
454 | +In the case of this example, | |
455 | +whereas the \%4\(hybyte | |
456 | +.B \%UTF\(hy8 | |
457 | +representation of the \fC\%"\eU0001d10b"\fP Unicode code point | |
458 | +.I is | |
459 | +complete, | |
460 | +the \fC\%0xD834\fP | |
461 | +.B \%wchar_t | |
462 | +representation, | |
463 | +as returned by | |
464 | +.BR \%mbrtowc (), | |
465 | +is | |
466 | +.I not | |
467 | +complete; | |
468 | +it represents a | |
469 | +.B \%UTF\(hy16 | |
470 | +.IR high\ surrogate , | |
471 | +which | |
472 | +.I must | |
473 | +be paired with a corresponding | |
474 | +.I low\ surrogate | |
475 | +to complete it, | |
476 | +and, | |
477 | +since | |
478 | +.B \%ISO\(hyC99 | |
479 | +requires that the | |
480 | +.B \%*pwc | |
481 | +argument to | |
482 | +.BR \%mbrtowc () | |
483 | +refers to sufficient storage space to accommodate only | |
484 | +.I one | |
485 | +.B \%wchar_t | |
486 | +value, | |
487 | +it is not possible for | |
488 | +.BR \%mbrtowc () | |
489 | +to | |
490 | +.I safely | |
491 | +return | |
492 | +.I both | |
493 | +the | |
494 | +.IR high\ surrogate , | |
495 | +and its complementary | |
496 | +.IR low\ surrogate , | |
497 | +in a single call. | |
498 | +To mitigate this non\(hyconformance, | |
499 | +from \%mingwrt\(hy5.3 onward, | |
500 | +the \%MinGW implementation of | |
501 | +.BR \%mbrtowc () | |
502 | +supports the following non\(hystandard strategy | |
503 | +for completion of any conversion which requires return of a | |
504 | +.IR surrogate\ pair : | |
505 | +. | |
506 | +.RS 2n | |
507 | +.ll -2n | |
508 | +.IP \(bu 2n | |
509 | +Any translation unit, | |
510 | +in which | |
511 | +.BR \%mbrtowc () | |
512 | +is called, | |
513 | +should: | |
514 | +.RS 2n | |
515 | +.ll -2n | |
516 | +.IP a) 3n | |
517 | +explicitly define either the | |
518 | +.BR \%_ISOC99_SOURCE , | |
519 | +or the | |
520 | +.B \%_ISOC11_SOURCE | |
521 | +feature test macro, | |
522 | +(with any arbitrary value, | |
523 | +or even no value), | |
524 | +.B before | |
525 | +including | |
526 | +.I any | |
527 | +header file, | |
528 | +and | |
529 | +.IP b) 3n | |
530 | +include the | |
531 | +.B \%<winnls.h> | |
532 | +header file, | |
533 | +in addition to the required | |
534 | +.B \%<wchar.h> | |
535 | +header. | |
536 | +.ll +2n | |
537 | +.RE | |
538 | +. | |
539 | +.IP \(bu 2n | |
540 | +Following each call of | |
541 | +.BR \%mbrtowc (), | |
542 | +which returns a | |
543 | +.B \%wchar_t | |
544 | +value with a converted byte count greater than zero, | |
545 | +test the returned | |
546 | +.B \%wchar_t | |
547 | +value, | |
548 | +using the | |
549 | +.BR \%IS_HIGH_SURROGATE () | |
550 | +macro. | |
551 | +. | |
552 | +.IP \(bu 2n | |
553 | +When the | |
554 | +.BR \%IS_HIGH_SURROGATE () | |
555 | +macro call indicates that the returned | |
556 | +.B \%wchar_t | |
557 | +value does represent a | |
558 | +.IR high\ surrogate , | |
559 | +immediately call | |
560 | +.BR mbrtowc () | |
561 | +again, | |
562 | +passing the | |
563 | +.B \%*ps | |
564 | +state as returned by the original call, | |
565 | +together with the original multibyte sequence reference, | |
566 | +but with an explicit scan length limit, | |
567 | +.BR \%n , | |
568 | +of zero, | |
569 | +and an alternative | |
570 | +.B \%wchar_t | |
571 | +buffer reference pointer, | |
572 | +for storage of the | |
573 | +.IR low\ surrogate ; | |
574 | +on successful retrieval of this | |
575 | +.IR low\ surrogate , | |
576 | +the additional converted byte count will be returned as zero, | |
577 | +and the pending | |
578 | +.B \%*ps | |
579 | +conversion state will have been cleared, | |
580 | +(i.e.\& reset to the initial state). | |
581 | +.ll +2n | |
582 | +.RE | |
583 | +. | |
584 | +.PP | |
585 | +Thus, | |
586 | +considering the preceding example, | |
587 | +to support interpretation of | |
588 | +.I surrogate pairs | |
589 | +the example code should be modified by insertion of: | |
590 | +.PP | |
591 | +.RS 4n | |
592 | +.EX | |
593 | +#define _ISOC99_SOURCE | |
594 | +#include <winnls.h> | |
595 | +.EE | |
596 | +.RE | |
597 | +.PP | |
598 | +at the top of the source file, | |
599 | +and reimplementation of the | |
600 | +.BR print_conv () | |
601 | +function, | |
602 | +to incorporate the | |
603 | +.BR IS_HIGH_SURROGATE () | |
604 | +test, | |
605 | +and response: | |
606 | +.PP | |
607 | +.RS 4n | |
608 | +.EX | |
609 | +void print_conv( const char *mbs ) | |
610 | +{ | |
611 | + wchar_t wch; | |
612 | + size_t n = mbrtowc( &wch, mbs, MB_LEN_MAX, NULL ); | |
613 | + if( (int)(n) > 0 ) | |
614 | + { | |
615 | + if( IS_HIGH_SURROGATE( wch ) ) | |
616 | + { | |
617 | + wchar_t wcl; | |
618 | + mbrtowc( &wcl, mbs, 0, NULL ); | |
619 | + printf( "%u bytes \-> 0x%04X:0x%04X\en", (unsigned)(n), wch, wcl ); | |
620 | + } | |
621 | + else printf( "%u bytes \-> 0x%04X\en", (unsigned)(n), wch ); | |
622 | + } | |
623 | + else if( n == (size_t)(\-1) ) perror( "mbrtowc" ); | |
624 | +} | |
625 | +.EE | |
626 | +.RE | |
627 | +. | |
628 | +.PP | |
629 | +With these changes in place, | |
630 | +the output from the program becomes: | |
631 | +.PP | |
632 | +.RS 4n | |
633 | +.EX | |
634 | +4 bytes \-> 0xD834:0xDD0B | |
635 | +3 bytes \-> 0x6C34 | |
636 | +.EE | |
637 | +.RE | |
638 | +.PP | |
639 | +thus now correctly reporting the conversion of the | |
640 | +.IR surrogate\ pair , | |
641 | +and then correctly interpreting the following \%3\(hybyte | |
642 | +.B \%UTF\(hy8 | |
643 | +sequence. | |
644 | +. | |
645 | +.PP | |
646 | +Please be aware that the underlying \%MS\(hyWindows API, | |
647 | +which is used to interpret the multibyte sequence, | |
648 | +offers no readily accessible mechanism to discriminate | |
649 | +between incomplete and invalid sequences; | |
650 | +thus, | |
651 | +if | |
652 | +.I n | |
653 | +is less than the effective | |
654 | +.B \%MB_CUR_MAX | |
655 | +for the active codeset, | |
656 | +this | |
657 | +.BR \%mbrtowc () | |
658 | +implementation may return | |
659 | +.IR \%(size_t)(\-2) , | |
660 | +indicating an incomplete sequence, | |
661 | +even in cases where there are no additional bytes | |
662 | +which could be appended, | |
663 | +to complete a valid encoding sequence. | |
664 | +. | |
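Because (size_t)(-2) may be reported even when no further bytes could complete the sequence, callers may prefer to treat each mbrtowc() return value explicitly, rather than assuming that an incomplete result will eventually resolve. The following sketch classifies the four kinds of return value; the helper name and the strings it returns are illustrative assumptions only.

```c
#include <stddef.h>

/* Hypothetical helper: classify the value returned by mbrtowc(). */
const char *mbrtowc_result_kind( size_t n )
{
  /* Invalid sequence; errno will have been set to EILSEQ. */
  if( n == (size_t)(-1) ) return "invalid";

  /* Incomplete sequence; on MS-Windows this may also be reported
   * for input which no additional bytes could ever complete.
   */
  if( n == (size_t)(-2) ) return "incomplete";

  /* The NUL wide character was converted. */
  if( n == 0 ) return "nul";

  /* Otherwise, n is the number of bytes consumed. */
  return "converted";
}
```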
665 | +. | |
666 | +.SH SEE ALSO | |
667 | +.BR mbsrtowcs (3) | |
668 | +. | |
669 | +. | |
670 | +.SH AUTHOR | |
671 | +This manpage was written by \%Keith\ Marshall, | |
672 | +\%<keith@users.osdn.me>, | |
673 | +to document the | |
674 | +.BR \%mbrtowc () | |
675 | +function as it has been implemented for the MinGW.org Project. | |
676 | +It may be copied, modified and redistributed, | |
677 | +without restriction of copyright, | |
678 | +provided this acknowledgement of contribution by | |
679 | +the original author remains in place. | |
680 | +. | |
681 | +.\" EOF |
@@ -0,0 +1,262 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B \%mbsinit | |
6 | +\- check state of multibyte to wide character conversion | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < wchar.h > | |
12 | +.PP | |
13 | +.B int mbsinit( const mbstate_t | |
14 | +.BI * ps | |
15 | +.B ); | |
16 | +. | |
17 | +. | |
18 | +.SH DESCRIPTION | |
19 | +If | |
20 | +.I ps | |
21 | +is not a NULL pointer, | |
22 | +the | |
23 | +.BR \%mbsinit () | |
24 | +function determines whether the | |
25 | +.B \%mbstate_t | |
26 | +object, | |
27 | +to which it points, | |
28 | +represents a multibyte to wide character conversion in the | |
29 | +.IR initial , | |
30 | +or in an | |
31 | +.I intermediate | |
32 | +state. | |
33 | +. | |
34 | +.PP | |
35 | +The | |
36 | +.I initial | |
37 | +conversion state is represented by a | |
38 | +.I zero\(hyvalued | |
39 | +.B \%mbstate_t | |
40 | +object. | |
41 | +(POSIX.1 stipulates that this representation must be supported, | |
42 | +although additional alternative representations are permitted; | |
43 | +MinGW uses only the zero\(hyvalued representation). | |
44 | +. | |
45 | +.PP | |
46 | +In MinGW, | |
47 | +an initial conversion state may be established by initialization: | |
48 | +.PP | |
49 | +.RS 4n | |
50 | +.EX | |
51 | +mbstate_t st = (mbstate_t)(0), *ps = &st; | |
52 | +.EE | |
53 | +.RE | |
54 | +.PP | |
55 | +or by assignment: | |
56 | +.PP | |
57 | +.RS 4n | |
58 | +.EX | |
59 | +*ps = (mbstate_t)(0); | |
60 | +.EE | |
61 | +.RE | |
62 | +.PP | |
63 | +However, | |
64 | +for portability: | |
65 | +.PP | |
66 | +.RS 4n | |
67 | +.EX | |
68 | +memset( ps, 0, sizeof( mbstate_t )); | |
69 | +.EE | |
70 | +.RE | |
71 | +.PP | |
72 | +may be preferred. | |
73 | +. | |
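As a minimal sketch of the portable form, the following resets a caller-supplied mbstate_t object with memset(), and then confirms the result with mbsinit(). The helper name reset_conversion_state is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: put *ps into the initial conversion state,
 * using the portable memset() form, and return whatever mbsinit()
 * then reports, (non-zero is expected, for the initial state).
 */
int reset_conversion_state( mbstate_t *ps )
{
  memset( ps, 0, sizeof( mbstate_t ) );
  return mbsinit( ps );
}
```

This form makes no assumption that mbstate_t is a scalar type, which the cast and assignment forms shown above do.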
74 | +.PP | |
75 | +Nominally, | |
76 | +.B \%mbstate_t | |
77 | +objects represent | |
78 | +.I shift states | |
79 | +of the active codeset. | |
80 | +However, | |
81 | +since \%MS\(hyWindows codesets do not use shift states, | |
82 | +as such, | |
83 | +MinGW uses | |
84 | +.B \%mbstate_t | |
85 | +objects to represent an alternative class of | |
86 | +.I intermediate conversion | |
87 | +.IR states , | |
88 | +viz.: | |
89 | +.RS 2n | |
90 | +.ll -2n | |
91 | +.IP \(bu 2n | |
92 | +Parsing of a multibyte sequence has been interrupted, | |
93 | +before interpretation of | |
94 | +.B \%MB_CUR_MAX | |
95 | +bytes, | |
96 | +without identification of a complete code point; | |
97 | +this conversion state may arise following a call of | |
98 | +.BR mbrlen (3), | |
99 | +or | |
100 | +.BR mbrtowc (3), | |
101 | +which has returned a parsed sequence length of | |
102 | +.IR \%(size_t)(\-2) . | |
103 | +. | |
104 | +.IP \(bu 2n | |
105 | +Processing of a wide character sequence has encountered a | |
106 | +.IR high\ surrogate , | |
107 | +but the complementary | |
108 | +.I low surrogate | |
109 | +has yet to be evaluated; | |
110 | +this state may arise after a call of | |
111 | +.BR mbrtowc (3), | |
112 | +has returned the | |
113 | +.IR high\ surrogate , | |
114 | +(with a returned sequence length between | |
115 | +.I one | |
116 | +and | |
117 | +.BR \%MB_CUR_MAX ), | |
118 | +and a further call is needed, | |
119 | +to retrieve the | |
120 | +.IR low\ surrogate ; | |
121 | +alternatively, | |
122 | +a complementary conversion state may arise when | |
123 | +.BR wcrtomb (3) | |
124 | +has been called to interpret a | |
125 | +.IR high\ surrogate , | |
126 | +and a further call, | |
127 | +to complete the conversion to a multibyte sequence, | |
128 | +by evaluation of the complementary | |
129 | +.IR low\ surrogate , | |
130 | +is still required. | |
131 | +.ll +2n | |
132 | +.RE | |
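The first of these intermediate states can be observed by passing only the leading byte of a multibyte sequence to mbrtowc(), and then testing the state with mbsinit(). The following sketch assumes that a UTF-8 locale can be selected by one of the names shown; those names are assumptions, suitable for typical POSIX hosts, and are not MinGW conventions. It returns -1 if the check cannot be performed.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if an interrupted conversion leaves
 * the mbstate_t object in an intermediate state, 0 if it does not,
 * and -1 if the check could not be performed.
 */
int partial_sequence_leaves_intermediate_state( void )
{
  wchar_t wc;
  mbstate_t st;

  /* Only the first byte of the three-byte UTF-8 encoding of U+6C34. */
  const char partial[] = "\xe6";

  /* Assumed locale names; neither is guaranteed to be available. */
  if( (setlocale( LC_CTYPE, "C.UTF-8" ) == NULL)
  &&  (setlocale( LC_CTYPE, "en_US.UTF-8" ) == NULL) ) return -1;

  /* Start from the initial state, then parse the incomplete input. */
  memset( &st, 0, sizeof( st ) );
  if( mbrtowc( &wc, partial, 1, &st ) != (size_t)(-2) ) return -1;

  /* mbsinit() returns zero for any intermediate state. */
  return (mbsinit( &st ) == 0) ? 1 : 0;
}
```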
133 | +. | |
134 | +. | |
135 | +.SH RETURN VALUE | |
136 | +If | |
137 | +.I ps | |
138 | +is a NULL pointer, | |
139 | +or if the conversion state, | |
140 | +represented by the | |
141 | +.B \%mbstate_t | |
142 | +object to which it points, | |
143 | +is the | |
144 | +.I initial | |
145 | +state, | |
146 | +.BR \%mbsinit () | |
147 | +returns a | |
148 | +.I \%non\(hyzero | |
149 | +value; | |
150 | +otherwise, | |
151 | +.I \%zero | |
152 | +is returned, | |
153 | +indicating an | |
154 | +.I intermediate | |
155 | +conversion state. | |
156 | +. | |
157 | +. | |
158 | +.SH ERROR CONDITIONS | |
159 | +No error conditions are defined. | |
160 | +. | |
161 | +. | |
162 | +.SH STANDARDS CONFORMANCE | |
163 | +There is no Microsoft implementation of the | |
164 | +.BR mbsinit () | |
165 | +function, | |
166 | +which is readily accessible for use in MinGW applications; | |
167 | +the | |
168 | +.I \%libmingwex | |
169 | +implementation conforms generally to | |
170 | +.BR \%ISO\(hyC99 , | |
171 | +.BR \%POSIX.1\(hy2001 , | |
172 | +and | |
173 | +.BR \%POSIX.1\(hy2008 . | |
174 | +. | |
175 | +. | |
176 | +.\"SH EXAMPLE | |
177 | +. | |
178 | +. | |
179 | +.SH CAVEATS AND BUGS | |
180 | +Prior to \%mingwrt\(hy5.3, | |
181 | +the | |
182 | +.I \%libmingwex | |
183 | +implementation of | |
184 | +.BR mbsinit () | |
185 | +would always return | |
186 | +.IR \%non\(hyzero , | |
187 | +apparently indicating an | |
188 | +.I initial | |
189 | +conversion state, | |
190 | +regardless of the actual state indicated by any | |
191 | +.B \%mbstate_t | |
192 | +object referred to by | |
193 | +.IR *ps ; | |
194 | +this defect is corrected, | |
195 | +in \%mingwrt\(hy5.3. | |
196 | +. | |
197 | +.PP | |
198 | +Any | |
199 | +.I intermediate conversion | |
200 | +.IR state , | |
201 | +arising from a call to | |
202 | +.BR mbrlen (3), | |
203 | +.BR mbrtowc (3), | |
204 | +or | |
205 | +.BR wcrtomb (3), | |
206 | +is specific to the particular conversion which produces it. | |
207 | +Any intermediate state produced by | |
208 | +.BR mbrlen (3), | |
209 | +or by | |
210 | +.BR mbrtowc (3) | |
211 | +may be resolved by a further call to either of these two functions, | |
212 | +or to | |
213 | +.BR mbsrtowcs (3), | |
214 | +provided the initial part of the multibyte sequence, | |
215 | +passed in the subsequent call, | |
216 | +completes the sequence which led to the intermediate state; | |
217 | +if this intermediate state is used in any other context, | |
218 | +the consequent behaviour is undefined. | |
219 | +. | |
220 | +.PP | |
221 | +Similarly, | |
222 | +an intermediate state resulting from a call to | |
223 | +.BR wcrtomb (3) | |
224 | +may be resolved by a further call to | |
225 | +.BR wcrtomb (3), | |
226 | +or to | |
227 | +.BR wcsrtombs (3), | |
228 | +provided the first, | |
229 | +(or the only), | |
230 | +wide character to be interpreted, | |
231 | +in the subsequent call, | |
232 | +represents the | |
233 | +.I low surrogate | |
234 | +which completes the pending | |
235 | +.I surrogate pair | |
236 | +from which the intermediate state was created. | |
237 | +Once again, | |
238 | +if this intermediate state is used in any other context, | |
239 | +the consequent behaviour is undefined. | |
240 | +. | |
241 | +. | |
242 | +.SH SEE ALSO | |
243 | +.BR \%mbrlen (3), | |
244 | +.BR \%mbrtowc (3), | |
245 | +.BR \%mbsrtowcs (3), | |
246 | +.BR \%wcrtomb (3), | |
247 | +and | |
248 | +.BR \%wcsrtombs (3). | |
249 | +. | |
250 | +. | |
251 | +.SH AUTHOR | |
252 | +This manpage was written by \%Keith\ Marshall, | |
253 | +\%<keith@users.osdn.me>, | |
254 | +to document the | |
255 | +.BR \%mbsinit () | |
256 | +function as it has been implemented for the MinGW.org Project. | |
257 | +It may be copied, modified and redistributed, | |
258 | +without restriction of copyright, | |
259 | +provided this acknowledgement of contribution by | |
260 | +the original author remains in place. | |
261 | +. | |
262 | +.\" EOF |
@@ -0,0 +1,521 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B mbsrtowcs | |
6 | +\- convert from multibyte to wide character string | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < wchar.h > | |
12 | +.PP | |
13 | +.B size_t mbsrtowcs( wchar_t | |
14 | +.BI * dst , | |
15 | +.B const char | |
16 | +.BI ** src , | |
17 | +.B size_t | |
18 | +.IB len , | |
19 | +.B mbstate_t | |
20 | +.BI * ps | |
21 | +.B ); | |
22 | +. | |
23 | +.IP \& -4n | |
24 | +Feature Test Macro Requirements for libmingwex: | |
25 | +.PP | |
26 | +.BR \%__MSVCRT_VERSION__ : | |
27 | +since \%mingwrt\(hy5.3, | |
28 | +if this feature test macro is | |
29 | +.IR defined , | |
30 | +with a value of | |
31 | +.I at least | |
32 | +.IR 0x0800 , | |
33 | +(corresponding to the symbolic constant, | |
34 | +.BR \%__MSVCR80_DLL , | |
35 | +and thus declaring intent to link with \%MSVCR80.DLL, | |
36 | +or any later version of \%Microsoft\(aqs \%non\(hyfree runtime library, | |
37 | +instead of with \%MSVCRT.DLL), | |
38 | +calls to | |
39 | +.BR mbsrtowcs () | |
40 | +will be directed to the implementation thereof, | |
41 | +within \%Microsoft\(aqs runtime DLL. | |
42 | +. | |
43 | +.PP | |
44 | +.BR \%_ISOC99_SOURCE , | |
45 | +.BR \%_ISOC11_SOURCE : | |
46 | +since \%mingwrt\(hy5.3.1, | |
47 | +when linking with \%MSVCRT.DLL, | |
48 | +or when | |
49 | +.B \%__MSVCRT_VERSION__ | |
50 | +is either | |
51 | +.IR undefined , | |
52 | +or is | |
53 | +.I defined | |
54 | +with any value which is | |
55 | +.I less than | |
56 | +.IR 0x0800 , | |
57 | +(thus denying intent to link with \%MSVCR80.DLL, | |
58 | +or any later \%non\(hyfree version of Microsoft\(aqs runtime library), | |
59 | +.I explicitly | |
60 | +defining either of these feature test macros | |
61 | +will cause any call to | |
62 | +.BR \%mbsrtowcs () | |
63 | +to be directed to the | |
64 | +.I \%libmingwex | |
65 | +implementation; | |
66 | +if neither macro is defined, | |
67 | +calls to | |
68 | +.BR \%mbsrtowcs () | |
69 | +will be directed to Microsoft\(aqs runtime implementation, | |
70 | +if it is available, | |
71 | +otherwise falling back to the | |
72 | +.I \%libmingwex | |
73 | +implementation. | |
74 | +. | |
75 | +.PP | |
76 | +Prior to \%mingwrt\(hy5.3, | |
77 | +none of the above feature test macros have any effect on | |
78 | +.BR \%mbsrtowcs (); | |
79 | +all calls will be directed to the | |
80 | +.I \%libmingwex | |
81 | +implementation. | |
82 | +. | |
83 | +. | |
84 | +.SH DESCRIPTION | |
85 | +.PP | |
86 | +Commencing from the conversion state specified in | |
87 | +.IR *ps , | |
88 | +the | |
89 | +.BR \%mbsrtowcs () | |
90 | +function converts the multibyte character sequence, | |
91 | +starting at | |
92 | +.IR *src , | |
93 | +to a sequence of wide characters; | |
94 | +each conversion is performed as if by calling the | |
95 | +.BR mbrtowc (3) | |
96 | +function. | |
97 | +. | |
98 | +.PP | |
99 | +If | |
100 | +.I dst | |
101 | +is not a NULL pointer, | |
102 | +the resulting sequence of wide characters, | |
103 | +up to a maximum of | |
104 | +.I len | |
105 | +in number, | |
106 | +will be stored as a wide character string, | |
107 | +starting at | |
108 | +.IR dst ; | |
109 | +conversion may be curtailed, | |
110 | +before | |
111 | +.I len | |
112 | +wide characters have been stored, | |
113 | +under any of the following conditions: | |
114 | +.RS 2n | |
115 | +.ll -2n | |
116 | +.IP \(bu 2n | |
117 | +The result of any one conversion represents the NUL wide character, | |
118 | +(in which case the NUL wide character is stored, | |
119 | +but is not included in the count of characters converted). | |
120 | +. | |
121 | +.IP \(bu 2n | |
122 | +The result of any single multibyte character conversion is a | |
123 | +.IR surrogate\ pair , | |
124 | +but the available space, | |
125 | +remaining in the conversion buffer, | |
126 | +is insufficient to accommodate more than one | |
127 | +.B \%wchar_t | |
128 | +value. | |
129 | +. | |
130 | +.IP \(bu 2n | |
131 | +An invalid multibyte character sequence is encountered, | |
132 | +(in which case the conversion state becomes undefined). | |
133 | +.ll +2n | |
134 | +.RE | |
135 | +. | |
136 | +.PP | |
137 | +Conversely, | |
138 | +if | |
139 | +.I dst | |
140 | +is a NULL pointer, | |
141 | +the | |
142 | +.I len | |
143 | +argument is ignored, | |
144 | +and conversions are performed until either | |
145 | +the multibyte equivalent of the NUL character, | |
146 | +or an invalid multibyte sequence is encountered, | |
147 | +but no wide characters are stored. | |
148 | +. | |
149 | +.PP | |
150 | +The sequence of bytes, | |
151 | +pointed to by | |
152 | +.IR *src , | |
153 | +is interpreted as a multibyte character sequence | |
154 | +in the codeset which is associated with the | |
155 | +.B \%LC_CTYPE | |
156 | +category of the active process locale. | |
157 | +. | |
158 | +.PP | |
159 | +If | |
160 | +.I ps | |
161 | +is specified as a NULL pointer, | |
162 | +.BR \%mbsrtowcs () | |
163 | +will track conversion state using an internal | |
164 | +.B \%mbstate_t | |
165 | +object reference, | |
166 | +which is private within the | |
167 | +.BR \%mbsrtowcs () | |
168 | +process address space; | |
169 | +at process \%start\(hyup, | |
170 | +this internal | |
171 | +.B \%mbstate_t | |
172 | +object is initialized to represent | |
173 | +the initial conversion state. | |
174 | +. | |
175 | +. | |
176 | +.SH RETURN VALUE | |
177 | +On successful conversion of the multibyte character | |
178 | +sequence indirectly pointed to by | |
179 | +.IR *src , | |
180 | +up to the wide character string length limit specified by | |
181 | +.IR len , | |
182 | +.BR \%mbsrtowcs () | |
183 | +updates | |
184 | +.IR *src , | |
185 | +by either: | |
186 | +.RS 2n | |
187 | +.ll -2n | |
188 | +.IP \(bu 2n | |
189 | +Replacing it with a NULL pointer, | |
190 | +if conversion is terminated by a NUL character, | |
191 | +before | |
192 | +.I len | |
193 | +wide characters have been evaluated. | |
194 | +. | |
195 | +.IP \(bu 2n | |
196 | +Incrementing it, | |
197 | +such that it points to the first multibyte character in the | |
198 | +.I *src | |
199 | +sequence, | |
200 | +which, | |
201 | +when converted, | |
202 | +would produce wide characters beyond the string length | |
203 | +limit specified by | |
204 | +.IR len . | |
205 | +.ll +2n | |
206 | +.RE | |
207 | +.PP | |
208 | +In either case, | |
209 | +.BR mbsrtowcs () | |
210 | +returns the actual number of | |
211 | +.B \%wchar_t | |
212 | +values which have been stored at | |
213 | +.IR dst , | |
214 | +(if | |
215 | +.I dst | |
216 | +is not a NULL pointer, | |
217 | +or which would have been stored, | |
218 | +otherwise). | |
219 | +. | |
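The return value semantics described above support a common two-pass idiom: a first call, with a NULL dst, reports how many wchar_t values are required; a second call stores them. The following is a minimal sketch of that idiom, using a hypothetical helper named mbs_to_wcs_dup, and an explicit mbstate_t object rather than the internal one.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a complete multibyte string to a newly
 * allocated wide character string, which the caller must free().
 * Returns NULL on an invalid sequence, or on allocation failure.
 */
wchar_t *mbs_to_wcs_dup( const char *mbs, size_t *count )
{
  mbstate_t st;
  const char *src = mbs;
  wchar_t *wcs;
  size_t len;

  /* First pass: dst is NULL, so len is ignored and nothing is stored. */
  memset( &st, 0, sizeof( st ) );
  len = mbsrtowcs( NULL, &src, 0, &st );
  if( len == (size_t)(-1) ) return NULL;

  /* Allow one extra element, for the terminating NUL. */
  if( (wcs = malloc( (len + 1) * sizeof( wchar_t ) )) == NULL ) return NULL;

  /* Second pass: restart from the initial state; the NUL is stored,
   * but is not included in the returned count.
   */
  src = mbs;
  memset( &st, 0, sizeof( st ) );
  len = mbsrtowcs( wcs, &src, len + 1, &st );
  if( len == (size_t)(-1) ) { free( wcs ); return NULL; }

  if( count != NULL ) *count = len;
  return wcs;
}
```

Because the first pass counts wchar_t values, rather than code points, the allocation already includes space for both halves of any surrogate pair.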
220 | +. | |
221 | +.SH ERROR CONDITIONS | |
222 | +If, | |
223 | +at any stage of conversion of the multibyte sequence at | |
224 | +.IR \%*src , | |
225 | +and, | |
226 | +if | |
227 | +.I dst | |
228 | +is not a NULL pointer, | |
229 | +before | |
230 | +.I len | |
231 | +.B \%wchar_t | |
232 | +values have been evaluated, | |
233 | +any sequence within | |
234 | +.IR \%*src , | |
235 | +which does not represent a valid multibyte character, | |
236 | +is encountered, | |
237 | +then | |
238 | +.I \%errno | |
239 | +is set to | |
240 | +.BR \%EILSEQ , | |
241 | +and | |
242 | +.BR \%mbsrtowcs () | |
243 | +returns | |
244 | +.IR \%(size_t)(\-1) ; | |
245 | +the conversion state, | |
246 | +including the state of any | |
247 | +.B \%wchar_t | |
248 | +values already stored at | |
249 | +.IR \%*dst , | |
250 | +is undefined. | |
251 | +. | |
252 | +. | |
253 | +.SH STANDARDS CONFORMANCE | |
254 | +Except in respect of its provisions for handling of | |
255 | +.IR surrogate\ pairs , | |
256 | +and to the extent that it may be affected by limitations | |
257 | +of the underlying \%MS\(hyWindows API, | |
258 | +the | |
259 | +.I \%libmingwex | |
260 | +implementation of | |
261 | +.BR mbsrtowcs () | |
262 | +conforms generally to | |
263 | +.BR \%ISO\(hyC99 , | |
264 | +.BR \%POSIX.1\(hy2001 , | |
265 | +and | |
266 | +.BR \%POSIX.1\(hy2008 ; | |
267 | +(prior to \%mingwrt\(hy5.3, | |
268 | +and in those cases where calls may be delegated | |
269 | +to a Microsoft runtime DLL implementation, | |
270 | +this level of conformity may not be achieved). | |
271 | +. | |
272 | +. | |
273 | +.\"SH EXAMPLE | |
274 | +. | |
275 | +. | |
276 | +.SH CAVEATS AND BUGS | |
277 | +Due to a documented limitation of Microsoft\(aqs | |
278 | +.BR \%setlocale () | |
279 | +function implementation, | |
280 | +it is not possible to directly select an active locale, | |
281 | +in which the codeset is represented by any multibyte | |
282 | +character sequence with an effective | |
283 | +.B \%MB_CUR_MAX | |
284 | +of more than two bytes. | |
285 | +Prior to | |
286 | +.IR \%mingwrt\(hy5.3 , | |
287 | +this limitation precludes the use of | |
288 | +.BR \%mbsrtowcs () | |
289 | +to interpret any codeset with | |
290 | +.B \%MB_CUR_MAX | |
291 | +greater than two bytes, | |
292 | +(such as | |
293 | +.BR \%UTF\(hy8 ). | |
294 | +From | |
295 | +.I \%mingwrt\(hy5.3 | |
296 | +onward, | |
297 | +the MinGW.org implementation of | |
298 | +.BR \%mbsrtowcs () | |
299 | +mitigates this limitation by assignment of the codeset | |
300 | +from the | |
301 | +.B \%LC_CTYPE | |
302 | +environment variable, | |
303 | +provided the system default has been previously activated | |
304 | +for the | |
305 | +.B \%LC_CTYPE | |
306 | +locale category; | |
307 | +e.g.\ execution of: | |
308 | +.PP | |
309 | +.RS 4n | |
310 | +.EX | |
311 | +#include <stdio.h> | |
312 | +#include <stdlib.h> | |
313 | +#include <locale.h> | |
314 | +#include <wchar.h> | |
315 | + | |
316 | +void print_conv( const char * ); | |
317 | + | |
318 | +int main() | |
319 | +{ | |
320 | + setlocale( LC_CTYPE, "" ); | |
321 | + putenv( "LC_CTYPE=en_GB.65001" ); | |
322 | + print_conv( "\exe6\exb0\exb4\exf0\ex9d\ex84\ex8b" ); | |
323 | + return 0; | |
324 | +} | |
325 | + | |
326 | +void print_conv( const char *mbs ) | |
327 | +{ | |
328 | + size_t len; | |
329 | + if( (len = 1 + mbsrtowcs( NULL, &mbs, 0, NULL )) > 0 ) | |
330 | + { | |
331 | + wchar_t wcs[len], *wcp = wcs; | |
332 | + len = mbsrtowcs( wcs, &mbs, len, NULL ); | |
333 | + printf( "%u wide char%s: ", (unsigned)(len), (len == 1) ? "" : "s" ); | |
334 | + while( len > 0 ) | |
335 | + { printf( "0x%04X%c", *wcp++, (--len > 0) ? ':' : '\en' ); | |
336 | + } | |
337 | + } | |
338 | + else perror( "mbsrtowcs" ); | |
339 | +} | |
340 | +.EE | |
341 | +.RE | |
342 | +.PP | |
343 | +will convert the | |
344 | +.B \%UTF\(hy8 | |
345 | +encoded multibyte sequence, | |
346 | +\fC\%"\exe6\exb0\exb4\exf0\ex9d\ex84\ex8b"\fP, | |
347 | +(which represents the two Unicode code points, | |
348 | +\fC\%"\eu6c34"\fP and \fC\%"\eU0001d10b"\fP), | |
349 | +to its equivalent | |
350 | +.B \%wchar_t | |
351 | +sequence, | |
352 | +resulting in the three\(hyvalue output sequence: | |
353 | +.PP | |
354 | +.RS 4n | |
355 | +.EX | |
356 | +3 wide chars: 0x6C34:0xD834:0xDD0B | |
357 | +.EE | |
358 | +.RE | |
359 | +. | |
360 | +.PP | |
361 | +Note that, | |
362 | +in the preceding example, | |
363 | +although the input | |
364 | +.B \%UTF\(hy8 | |
365 | +sequence represents only | |
366 | +.I two | |
367 | +Unicode code points, | |
368 | +the output shows | |
369 | +.I \%three | |
370 | +distinct | |
371 | +.B \%wchar_t | |
372 | +values, | |
373 | +with the second code point being represented by the | |
374 | +.IR surrogate\ pair , | |
375 | +\fC\%"0xD834:0xDD0B"\fP. | |
376 | +This raises a potential issue, | |
377 | +which is consequent on Microsoft\(aqs choice of | |
378 | +.B \%UTF\(hy16LE | |
379 | +as the underlying representation of the | |
380 | +.B \%wchar_t | |
381 | +data type: | |
382 | +normally, | |
383 | +when | |
384 | +.I dst | |
385 | +is not a NULL pointer, | |
386 | +the MinGW | |
387 | +.BR mbsrtowcs () | |
388 | +function will simply store a | |
389 | +.I surrogate\ pair | |
390 | +when necessary, | |
391 | +but in the particular case where doing so would cause the | |
392 | +.I low\ surrogate | |
393 | +to overrun the buffer length specified by the | |
394 | +.I len | |
395 | +argument, | |
396 | +then no part of the | |
397 | +.I surrogate\ pair | |
398 | +will be stored, | |
399 | +and | |
400 | +.BR mbsrtowcs () | |
401 | +will stop as if the buffer length limit has been reached, | |
402 | +at a count of one less than | |
403 | +.IR len . | |
404 | +This case may be distinguished from a short count due to | |
405 | +conversion of a NUL character, | |
406 | +(in which case | |
407 | +.I *src | |
408 | +will have been respecified as a NULL pointer), | |
409 | +by inspection of | |
410 | +.IR *src , | |
411 | +which will have been updated to point, | |
412 | +in this case, | |
413 | +to the start of that part of the multibyte sequence | |
414 | +which represents the | |
415 | +.IR surrogate\ pair . | |
416 | +. | |
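The distinction described above reduces to a test on the updated source pointer, after the call returns. The following sketch shows that test, using a hypothetical helper named convert_and_check_end. On platforms where wchar_t is wide enough that surrogate pairs never arise, only the NUL and length limit cases can occur, but the pointer test is the same.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert at most len wide characters from mbs
 * into dst.  Returns 1 if the source was consumed up to, and including,
 * its NUL terminator; 0 if conversion stopped with input remaining;
 * -1 if an invalid multibyte sequence was encountered.
 */
int convert_and_check_end(
  const char *mbs, wchar_t *dst, size_t len, size_t *stored )
{
  mbstate_t st;
  const char *src = mbs;
  size_t n;

  /* Always start from the initial conversion state. */
  memset( &st, 0, sizeof( st ) );
  n = mbsrtowcs( dst, &src, len, &st );
  if( n == (size_t)(-1) ) return -1;
  if( stored != NULL ) *stored = n;

  /* A NULL source pointer means that the NUL was converted; otherwise
   * it points to the first multibyte character not yet converted.
   */
  return (src == NULL) ? 1 : 0;
}
```

When the return value is zero, and the stored count is one less than len, the remaining input may begin with a multibyte character which requires a surrogate pair; calling again, with a buffer of at least two elements, allows it to be converted.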
417 | +.PP | |
418 | +A further issue, | |
419 | +also related to | |
420 | +.IR surrogate\ pairs , | |
421 | +may arise if the | |
422 | +.B \%mbstate_t | |
423 | +object passed via the | |
424 | +.I *ps | |
425 | +argument originates from a preceding | |
426 | +.BR mbrtowc (3) | |
427 | +call which has returned a | |
428 | +.IR high\ surrogate , | |
429 | +but the | |
430 | +.I low\ surrogate | |
431 | +has not been retrieved. | |
432 | +In this case, | |
433 | +the | |
434 | +.I low\ surrogate | |
435 | +is returned, | |
436 | +(and potentially orphaned), | |
437 | +as the first | |
438 | +.B \%wchar_t | |
439 | +value to be considered for storage at | |
440 | +.IR dst . | |
441 | +This may not be what you want, | |
442 | +but it is supported as an alternative to the method, | |
443 | +formally documented using | |
444 | +.BR mbrtowc (3), | |
445 | +for completion of a | |
446 | +.IR surrogate\ pair ; | |
447 | +for example: | |
448 | +.PP | |
449 | +.RS 4n | |
450 | +.EX | |
451 | +#define _ISOC99_SOURCE | |
452 | + | |
453 | +#include <stdio.h> | |
454 | +#include <stdlib.h> | |
455 | +#include <locale.h> | |
456 | +#include <limits.h> | |
457 | +#include <winnls.h> | |
458 | +#include <wchar.h> | |
459 | + | |
460 | +void print_conv( const char * ); | |
461 | + | |
462 | +int main() | |
463 | +{ | |
464 | + setlocale( LC_CTYPE, "" ); | |
465 | + putenv( "LC_CTYPE=en_GB.65001" ); | |
466 | + print_conv( "\eU0001d10b" ); | |
467 | + print_conv( "\eu6c34" ); | |
468 | + return 0; | |
469 | +} | |
470 | + | |
471 | +void print_conv( const char *mbs ) | |
472 | +{ | |
473 | + wchar_t wch; | |
474 | + mbstate_t ps = (mbstate_t)(0); | |
475 | + size_t n = mbrtowc( &wch, mbs, MB_LEN_MAX, &ps ); | |
476 | + if( (int)(n) > 0 ) | |
477 | + { | |
478 | + if( IS_HIGH_SURROGATE( wch ) ) | |
479 | + { | |
480 | + wchar_t wcl; | |
481 | + mbsrtowcs( &wcl, &mbs, 1, &ps ); | |
482 | + printf( "%u bytes \-> 0x%04X:0x%04X\en", (unsigned)(n), wch, wcl ); | |
483 | + } | |
484 | + else printf( "%u bytes \-> 0x%04X\en", (unsigned)(n), wch ); | |
485 | + } | |
486 | + else if( n == (size_t)(\-1) ) perror( "mbrtowc" ); | |
487 | +} | |
488 | +.EE | |
489 | +.RE | |
490 | +.PP | |
491 | +is equivalent to the example given for | |
492 | +.I surrogate\ pair | |
493 | +completion using | |
494 | +.BR mbrtowc (3). | |
495 | +Regardless of the method used to complete | |
496 | +.IR surrogate\ pairs , | |
497 | +it is the caller\(aqs responsibility to ensure that the | |
498 | +.I high\ surrogate | |
499 | +and its complementary | |
500 | +.I low\ surrogate | |
501 | +remain correctly associated. | |
502 | +. | |
503 | +. | |
504 | +.SH SEE ALSO | |
505 | +.BR mbsinit (3), | |
506 | +and | |
507 | +.BR mbrtowc (3) | |
508 | +. | |
509 | +. | |
510 | +.SH AUTHOR | |
511 | +This manpage was written by \%Keith\ Marshall, | |
512 | +\%<keith@users.osdn.me>, | |
513 | +to document the | |
514 | +.BR \%mbsrtowcs () | |
515 | +function as it has been implemented for the MinGW.org Project. | |
516 | +It may be copied, modified and redistributed, | |
517 | +without restriction of copyright, | |
518 | +provided this acknowledgement of contribution by | |
519 | +the original author remains in place. | |
520 | +. | |
521 | +.\" EOF |
@@ -0,0 +1,493 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B \%wcrtomb | |
6 | +\- convert a wide character to a multibyte sequence | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < wchar.h > | |
12 | +.PP | |
13 | +.B size_t wcrtomb( char | |
14 | +.BI * s , | |
15 | +.B wchar_t | |
16 | +.IB wc , | |
17 | +.B mbstate_t | |
18 | +.BI * ps | |
19 | +.B ); | |
20 | +. | |
21 | +.IP \& -4n | |
22 | +Feature Test Macro Requirements for libmingwex: | |
23 | +.PP | |
24 | +.BR \%__MSVCRT_VERSION__ : | |
25 | +since \%mingwrt\(hy5.3, | |
26 | +if this feature test macro is | |
27 | +.IR defined , | |
28 | +with a value of | |
29 | +.I at least | |
30 | +.IR \%0x0800 , | |
31 | +(corresponding to the symbolic constant, | |
32 | +.BR \%__MSVCR80_DLL , | |
33 | +and thus declaring intent to link with \%MSVCR80.DLL, | |
34 | +or any later version of \%Microsoft\(aqs \%non\(hyfree runtime library, | |
35 | +instead of with \%MSVCRT.DLL), | |
36 | +calls to | |
37 | +.BR \%wcrtomb () | |
38 | +will be directed to the implementation thereof, | |
39 | +within \%Microsoft\(aqs runtime DLL. | |
40 | +. | |
41 | +.PP | |
42 | +.BR \%_ISOC99_SOURCE , | |
43 | +.BR \%_ISOC11_SOURCE : | |
44 | +since \%mingwrt\(hy5.3.1, | |
45 | +when linking with \%MSVCRT.DLL, | |
46 | +or when | |
47 | +.B \%__MSVCRT_VERSION__ | |
48 | +is either | |
49 | +.IR undefined , | |
50 | +or is | |
51 | +.I defined | |
52 | +with any value which is | |
53 | +.I less than | |
54 | +.IR \%0x0800 , | |
55 | +(thus denying intent to link with \%MSVCR80.DLL, | |
56 | +or any later \%non\(hyfree version of Microsoft\(aqs runtime library), | |
57 | +.I explicitly | |
58 | +defining either of these feature test macros | |
59 | +will cause any call to | |
60 | +.BR \%wcrtomb () | |
61 | +to be directed to the | |
62 | +.I \%libmingwex | |
63 | +implementation; | |
64 | +if neither macro is defined, | |
65 | +calls to | |
66 | +.BR \%wcrtomb () | |
67 | +will be directed to Microsoft\(aqs runtime implementation, | |
68 | +if it is available, | |
69 | +otherwise falling back to the | |
70 | +.I \%libmingwex | |
71 | +implementation. | |
72 | +. | |
73 | +.PP | |
74 | +Prior to \%mingwrt\(hy5.3, | |
75 | +none of the above feature test macros have any effect on | |
76 | +.BR \%wcrtomb (); | |
77 | +all calls will be directed to the | |
78 | +.I \%libmingwex | |
79 | +implementation. | |
80 | +. | |
81 | +. | |
82 | +.SH DESCRIPTION | |
83 | +The | |
84 | +.BR \%wcrtomb () | |
85 | +function determines the number of bytes which are required, | |
86 | +starting from the conversion state represented by the | |
87 | +.B \%mbstate_t | |
88 | +object at | |
89 | +.IR *ps , | |
90 | +to accommodate the multibyte character sequence, | |
91 | +in the codeset associated with the | |
92 | +.B \%LC_CTYPE | |
93 | +category of the active process locale, | |
94 | +which represents the completed conversion of | |
95 | +the wide character specified by | |
96 | +.IR wc . | |
97 | +. | |
98 | +.PP | |
99 | +In the special case, | |
100 | +when | |
101 | +.I s | |
102 | +is a NULL pointer, | |
103 | +the | |
104 | +.I wc | |
105 | +argument is ignored, | |
106 | +and the call is evaluated as if it had been invoked as | |
107 | +.PP | |
108 | +.RS 4n | |
109 | +.EX | |
110 | +wcrtomb( buf, L'\e0', ps ) | |
111 | +.EE | |
112 | +.RE | |
113 | +.PP | |
114 | +returning the effect of conversion of the NUL wide character, | |
115 | +as a completion of any intermediate conversion state specified in | |
116 | +.IR *ps , | |
117 | +but without storing the converted multibyte sequence; | |
118 | +(in this special case, | |
119 | +the | |
120 | +.B \%ISO\(hyC99 | |
121 | +standard specifies that | |
122 | +.I buf | |
123 | +should be an internal buffer, | |
124 | +but since such a buffer becomes effectively inaccessible, | |
125 | +storage of any converted multibyte sequence is unnecessary). | |
126 | +. | |
127 | +.PP | |
128 | +Conversely, | |
129 | +in the normal case, | |
130 | +when | |
131 | +.I s | |
132 | +is not a NULL pointer, | |
133 | +the | |
134 | +.BR \%wcrtomb () | |
135 | +function converts the wide character, | |
136 | +represented by | |
137 | +.IR wc , | |
138 | +to the corresponding multibyte character sequence, | |
139 | +which is stored in the byte array starting at | |
140 | +.IR *s , | |
141 | +and the function return value is set to | |
142 | +the number of bytes stored. | |
143 | +. | |
144 | +. | |
145 | +.SH RETURN VALUE | |
146 | +When conversion is successful, | |
147 | +regardless of whether the resultant multibyte sequence is stored, | |
148 | +or not, | |
149 | +the | |
150 | +.BR wcrtomb () | |
151 | +function returns the number of bytes which are, | |
152 | +or which would be, | |
153 | +stored at | |
154 | +.IR *s . | |
155 | +. | |
156 | +.PP | |
157 | +If the result of conversion represents a completed multibyte sequence, | |
158 | +the conversion state, | |
159 | +represented by | |
160 | +.IR *ps , | |
161 | +is updated to represent the | |
162 | +.I initial | |
163 | +.IR state . | |
164 | +Conversely, | |
165 | +if the result of conversion is equivalent to the conversion of a | |
166 | +.I high | |
167 | +.IR surrogate , | |
168 | +nothing is stored, | |
169 | +the return value is set to | |
170 | +.IR zero , | |
171 | +and the conversion state is updated to represent a pending | |
172 | +.I surrogate pair | |
173 | +completion. | |
174 | +. | |
175 | +. | |
176 | +.SH ERROR CONDITIONS | |
177 | +If the wide character, | |
178 | +passed as | |
179 | +.IR wc , | |
180 | +either cannot be converted to a valid multibyte sequence, | |
181 | +or does not complete a pending | |
182 | +.I surrogate pair | |
183 | +which can be represented as a valid multibyte sequence, | |
184 | +in the codeset of the active | |
185 | +.B \%LC_CTYPE | |
186 | +locale category, | |
187 | +.I \%errno | |
188 | +is set to | |
189 | +.BR \%EILSEQ , | |
190 | +the | |
191 | +.BR wcrtomb () | |
192 | +function returns | |
193 | +.IR (size_t)(\-1) , | |
194 | +and the conversion state is unspecified. | |
195 | +. | |
196 | +. | |
197 | +.SH STANDARDS CONFORMANCE | |
198 | +Except in respect of its extended provision for handling of | |
199 | +.IR surrogate\ pairs , | |
200 | +and to the extent that it may be affected by limitations | |
201 | +of the underlying \%MS\(hyWindows API, | |
202 | +the | |
203 | +.I \%libmingwex | |
204 | +implementation of | |
205 | +.BR \%wcrtomb () | |
206 | +conforms generally to | |
207 | +.BR \%ISO\(hyC99 , | |
208 | +.BR \%POSIX.1\(hy2001 , | |
209 | +and | |
210 | +.BR \%POSIX.1\(hy2008 ; | |
211 | +(prior to \%mingwrt\(hy5.3, | |
212 | +and in those cases where calls may be delegated | |
213 | +to a Microsoft runtime DLL implementation, | |
214 | +this level of conformity may not be achieved). | |
215 | +. | |
216 | +. | |
217 | +.\"SH EXAMPLE | |
218 | +. | |
219 | +. | |
220 | +.SH CAVEATS AND BUGS | |
221 | +Due to a documented limitation of Microsoft\(aqs | |
222 | +.BR \%setlocale () | |
223 | +function implementation, | |
224 | +it is not possible to directly select an active locale, | |
225 | +in which the codeset is represented by any multibyte | |
226 | +character sequence with an effective | |
227 | +.B \%MB_CUR_MAX | |
228 | +of more than two bytes. | |
229 | +Prior to \%mingwrt\(hy5.3, | |
230 | +this limitation precludes the use of | |
231 | +.BR \%wcrtomb () | |
232 | +to convert to any codeset with | |
233 | +.B \%MB_CUR_MAX | |
234 | +greater than two bytes, | |
235 | +(such as | |
236 | +.BR \%UTF\(hy8 ). | |
237 | +From \%mingwrt\(hy5.3 onward, | |
238 | +the MinGW.org implementation of | |
239 | +.BR \%wcrtomb () | |
240 | +mitigates this limitation by assignment of the codeset | |
241 | +from the | |
242 | +.B \%LC_CTYPE | |
243 | +environment variable, | |
244 | +provided the system default has been previously activated | |
245 | +for the | |
246 | +.B \%LC_CTYPE | |
247 | +locale category; | |
248 | +e.g.\ execution of: | |
249 | +.PP | |
250 | +.RS 4n | |
251 | +.EX | |
252 | +#define _ISOC99_SOURCE | |
253 | + | |
254 | +#include <stdio.h> | |
255 | +#include <stdlib.h> | |
256 | +#include <locale.h> | |
257 | +#include <limits.h> | |
258 | +#include <wchar.h> | |
259 | + | |
260 | +void print_conv( const wchar_t * ); | |
261 | + | |
262 | +int main() | |
263 | +{ | |
264 | + setlocale( LC_CTYPE, "" ); | |
265 | + putenv( "LC_CTYPE=en_GB.65001" ); | |
266 | + print_conv( L"\eu6c34\eU0001d10b" ); | |
267 | + return 0; | |
268 | +} | |
269 | + | |
270 | +void print_conv( const wchar_t *wcs ) | |
271 | +{ | |
272 | + wchar_t wch; | |
273 | + while( (wch = *wcs++) != L'\e0' ) | |
274 | + { | |
275 | + char mbs[MB_LEN_MAX]; | |
276 | + mbstate_t ps = (mbstate_t)(0); | |
277 | + size_t n = wcrtomb( mbs, wch, &ps ); | |
278 | + | |
279 | + if( (int)(n) > 0 ) | |
280 | + { | |
281 | + unsigned char *p = (unsigned char *)(mbs); | |
282 | + printf( "Single wide character: 0x%04X \-\-> %u byte%s", | |
283 | + wch, n, (n == 1) ? ": " : "s: " | |
284 | + ); | |
285 | + while( n > 0 ) | |
286 | + printf( "0x%02X%c", *p++, (\-\-n == 0) ? '\en' : ':' ); | |
287 | + } | |
288 | + else if( n == (size_t)(\-1) ) perror( "wcrtomb" ); | |
289 | + } | |
290 | +} | |
291 | +.EE | |
292 | +.RE | |
293 | +.PP | |
294 | +will successfully convert the \fCL"\eu6c34"\fP wide character to its | |
295 | +.B \%UTF\(hy8 | |
296 | +equivalent, | |
297 | +resulting in the output: | |
298 | +.PP | |
299 | +.RS 4n | |
300 | +.EX | |
301 | +Single wide character: 0x6C34 \-\-> 3 bytes: 0xE6:0xB0:0xB4 | |
302 | +.EE | |
303 | +.RE | |
304 | +.PP | |
305 | +However, | |
306 | +when it then progresses to the \fCL"\eU0001d10b"\fP wide character, | |
307 | +(which | |
308 | +.I should | |
309 | +be represented by a valid | |
310 | +.B \%UTF\(hy16LE | |
311 | +.I surrogate | |
312 | +.IR pair ), | |
313 | +it fails with the diagnostic: | |
314 | +.PP | |
315 | +.RS 4n | |
316 | +.EX | |
317 | +wcrtomb: Invalid or incomplete multibyte or wide character | |
318 | +.EE | |
319 | +.RE | |
320 | +. | |
321 | +.PP | |
322 | +This (possibly unexpected) failure is an unfortunate consequence | |
323 | +of Microsoft\(aqs choice of | |
324 | +.B \%UTF\(hy16LE | |
325 | +as the underlying representation of the | |
326 | +.B \%wchar_t | |
327 | +data type; | |
328 | +this choice makes it impossible for | |
329 | +.I any | |
330 | +\%MS\(hyWindows implementation of | |
331 | +.BR \%wcrtomb () | |
332 | +to be fully | |
333 | +.B \%ISO\(hyC99 | |
334 | +compliant. | |
335 | +To mitigate this non\(hycompliance, | |
336 | +the MinGW implementation of | |
337 | +.BR \%wcrtomb () | |
338 | +incorporates the following non\(hystandard capabilities: | |
339 | +.RS 2n | |
340 | +.ll -2n | |
341 | +.IP \(bu 2n | |
342 | +When the | |
343 | +.B \%mbstate_t | |
344 | +argument refers to the | |
345 | +.I initial conversion | |
346 | +.IR state , | |
347 | +and the | |
348 | +.B \%wchar_t | |
349 | +argument represents a | |
350 | +.I high | |
351 | +.IR surrogate , | |
352 | +then nothing is stored in the conversion buffer, | |
353 | +the | |
354 | +.B \%mbstate_t | |
355 | +reference is updated to indicate pending completion of the | |
356 | +.IR surrogate , | |
357 | +and the function returns an effective conversion count of | |
358 | +.I zero | |
359 | +bytes. | |
360 | +. | |
361 | +.IP \(bu 2n | |
362 | +When the | |
363 | +.B \%mbstate_t | |
364 | +argument refers to a pending completion of a | |
365 | +.I surrogate | |
366 | +.IR pair , | |
367 | +and the | |
368 | +.B \%wchar_t | |
369 | +argument represents a | |
370 | +.I low | |
371 | +.IR surrogate , | |
372 | +then the deferred | |
373 | +.I high surrogate | |
374 | +is combined with the | |
375 | +.I low surrogate | |
376 | +argument, | |
377 | +and the two are converted as a pair; | |
378 | +the resultant conversion is stored in the conversion buffer, | |
379 | +the | |
380 | +.B \%mbstate_t | |
381 | +reference is reset to the | |
382 | +.I initial conversion | |
383 | +.IR state , | |
384 | +and the function returns the number of bytes | |
385 | +which were stored in the conversion buffer. | |
386 | +.ll +2n | |
387 | +.RE | |
388 | +. | |
389 | +.PP | |
390 | +These capabilities of MinGW\(aqs | |
391 | +.BR \%wcrtomb () | |
392 | +are certainly non\(hystandard; | |
393 | +nonetheless, | |
394 | +they are required to circumvent non\(hyconformity, | |
395 | +which is imposed by an unfortunate Microsoft design choice, | |
396 | +and it is incumbent upon the caller of | |
397 | +.BR \%wcrtomb (), | |
398 | +on the \%MS\(hyWindows platform, | |
399 | +to make use of them. | |
400 | +The preceding example clearly illustrates how strictly | |
401 | +.B \%ISO\(hyC99 | |
402 | +conforming usage will yield incorrect behaviour; | |
403 | +the following illustrates how that example may be adapted, | |
404 | +by incorporation of the above non\(hystandard features, | |
405 | +to achieve correct behaviour: | |
406 | +.PP | |
407 | +.RS 4n | |
408 | +.EX | |
409 | +#define _ISOC99_SOURCE | |
410 | + | |
411 | +#include <stdio.h> | |
412 | +#include <stdlib.h> | |
413 | +#include <locale.h> | |
414 | +#include <limits.h> | |
415 | +#include <winnls.h> | |
416 | +#include <wchar.h> | |
417 | + | |
418 | +void print_conv( const wchar_t * ); | |
419 | + | |
420 | +int main() | |
421 | +{ | |
422 | + setlocale( LC_CTYPE, "" ); | |
423 | + putenv( "LC_CTYPE=en_GB.65001" ); | |
424 | + print_conv( L"\eu6c34\eU0001d10b" ); | |
425 | + return 0; | |
426 | +} | |
427 | + | |
428 | +#define DESC(FMT) FMT "0x%1$04X \-\-> %2$u byte%3$s" | |
429 | + | |
430 | +void print_conv( const wchar_t *wcs ) | |
431 | +{ | |
432 | + while( *wcs != L'\e0' ) | |
433 | + { | |
434 | + wchar_t wch = *wcs; | |
435 | + char mbs[MB_LEN_MAX]; | |
436 | + mbstate_t ps = (mbstate_t)(0); | |
437 | + const char *fmt = DESC( "Single wide character: " ); | |
438 | + size_t n = wcrtomb( mbs, wch, &ps ); | |
439 | + | |
440 | + if( (n == (size_t)(0)) && IS_HIGH_SURROGATE( wch ) ) | |
441 | + { | |
442 | + if( (int)(n = wcrtomb( mbs, wcs[1], &ps )) > 0 ) | |
443 | + { | |
444 | + fmt = "Surrogate pair: 0x%1$04X:0x%4$04X \-\-> %2$u byte%3$s"; | |
445 | + wcs++; | |
446 | + } | |
447 | + } | |
448 | + if( (int)(n) > 0 ) | |
449 | + { | |
450 | + unsigned char *p = (unsigned char *)(mbs); | |
451 | + printf( fmt, wch, n, (n == 1) ? ": " : "s: ", *wcs ); | |
452 | + while( n > 0 ) | |
453 | + printf( "0x%02X%c", *p++, (\-\-n == 0) ? '\en' : ':' ); | |
454 | + } | |
455 | + else if( n == (size_t)(\-1) ) perror( "wcrtomb" ); | |
456 | + if( *wcs != L'\e0' ) ++wcs; | |
457 | + } | |
458 | +} | |
459 | +.EE | |
460 | +.RE | |
461 | +.PP | |
462 | +It may be observed that, | |
463 | +on execution of this modified version of the example, | |
464 | +both the \fCL"\eu6c34"\fP, | |
465 | +and the \fCL"\eU0001d10b"\fP code points are now correctly evaluated, | |
466 | +producing the expected output: | |
467 | +.PP | |
468 | +.RS 4n | |
469 | +.EX | |
470 | +Single wide character: 0x6C34 \-\-> 3 bytes: 0xE6:0xB0:0xB4 | |
471 | +Surrogate pair: 0xD834:0xDD0B \-\-> 4 bytes: 0xF0:0x9D:0x84:0x8B | |
472 | +.EE | |
473 | +.RE | |
474 | +. | |
475 | +. | |
476 | +.SH SEE ALSO | |
477 | +.BR mbsinit (3), | |
478 | +and | |
479 | +.BR wcsrtombs (3) | |
480 | +. | |
481 | +. | |
482 | +.SH AUTHOR | |
483 | +This manpage was written by \%Keith\ Marshall, | |
484 | +\%<keith@users.osdn.me>, | |
485 | +to document the | |
486 | +.BR \%wcrtomb () | |
487 | +function as it has been implemented for the MinGW.org Project. | |
488 | +It may be copied, modified and redistributed, | |
489 | +without restriction of copyright, | |
490 | +provided this acknowledgement of contribution by | |
491 | +the original author remains in place. | |
492 | +. | |
493 | +.\" EOF |
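As an illustration of the wcrtomb() behaviour documented above, the following is a minimal sketch, not taken from mingwrt, of a loop which keeps a single mbstate_t object alive across successive calls, so that a zero return (a deferred high surrogate, under the libmingwex extension) is simply carried forward to the next call. The helper name convert_stepwise(), and its buffer size convention, are assumptions made for this sketch only.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* convert_stepwise() is a hypothetical helper, not part of mingwrt.
 * It converts the NUL terminated wide string at src into dst, which
 * has room for dstlen bytes, calling wcrtomb() once per wchar_t.
 * Returns the number of bytes stored, excluding the terminating NUL,
 * or (size_t)(-1) on conversion failure, or if dst is too small.
 */
size_t convert_stepwise( char *dst, size_t dstlen, const wchar_t *src )
{
  mbstate_t ps;
  size_t used = 0;

  assert( (dst != NULL) && (src != NULL) );
  memset( &ps, 0, sizeof( ps ) );   /* initial conversion state */

  for( ;; ++src )
  {
    char buf[MB_LEN_MAX];
    size_t n = wcrtomb( buf, *src, &ps );

    /* Conversion failure; wcrtomb() has set errno to EILSEQ. */
    if( n == (size_t)(-1) ) return n;

    /* A zero return stores nothing; with libmingwex it marks a
     * deferred high surrogate, which ps carries to the next call.
     */
    if( n > (dstlen - used) ) return (size_t)(-1);
    memcpy( dst + used, buf, n );
    used += n;

    /* The NUL terminator has been converted, and counted in used. */
    if( *src == L'\0' ) return used - 1;
  }
}
```

Because the state object persists for the whole string, the caller does not need to test for surrogates explicitly; the trade-off is that a high surrogate which is never completed surfaces only as a failure on the following call.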
@@ -0,0 +1,361 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B \%wcsrtombs | |
6 | +\- convert a wide character string to a multibyte string | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < wchar.h > | |
12 | +.PP | |
13 | +.B size_t wcsrtombs( char | |
14 | +.BI * dst , | |
15 | +.B const wchar_t | |
16 | +.BI ** src , | |
17 | +.B size_t | |
18 | +.IB len , | |
19 | +.B mbstate_t | |
20 | +.BI * ps | |
21 | +.B ); | |
22 | +. | |
23 | +.IP \& -4n | |
24 | +Feature Test Macro Requirements for libmingwex: | |
25 | +.PP | |
26 | +.BR \%__MSVCRT_VERSION__ : | |
27 | +since \%mingwrt\(hy5.3, | |
28 | +if this feature test macro is | |
29 | +.IR defined , | |
30 | +with a value of | |
31 | +.I at least | |
32 | +.IR \%0x0800 , | |
33 | +(corresponding to the symbolic constant, | |
34 | +.BR \%__MSVCR80_DLL , | |
35 | +and thus declaring intent to link with \%MSVCR80.DLL, | |
36 | +or any later version of \%Microsoft\(aqs \%non\(hyfree runtime library, | |
37 | +instead of with \%MSVCRT.DLL), | |
38 | +calls to | |
39 | +.BR \%wcsrtombs () | |
40 | +will be directed to the implementation thereof, | |
41 | +within \%Microsoft\(aqs runtime DLL. | |
42 | +. | |
43 | +.PP | |
44 | +.BR \%_ISOC99_SOURCE , | |
45 | +.BR \%_ISOC11_SOURCE : | |
46 | +since \%mingwrt\(hy5.3.1, | |
47 | +when linking with \%MSVCRT.DLL, | |
48 | +or when | |
49 | +.B \%__MSVCRT_VERSION__ | |
50 | +is either | |
51 | +.IR undefined , | |
52 | +or is | |
53 | +.I defined | |
54 | +with any value which is | |
55 | +.I less than | |
56 | +.IR \%0x0800 , | |
57 | +(thus denying intent to link with \%MSVCR80.DLL, | |
58 | +or any later \%non\(hyfree version of Microsoft\(aqs runtime library), | |
59 | +.I explicitly | |
60 | +defining either of these feature test macros | |
61 | +will cause any call to | |
62 | +.BR \%wcsrtombs () | |
63 | +to be directed to the | |
64 | +.I \%libmingwex | |
65 | +implementation; | |
66 | +if neither macro is defined, | |
67 | +calls to | |
68 | +.BR \%wcsrtombs () | |
69 | +will be directed to Microsoft\(aqs runtime implementation, | |
70 | +if it is available, | |
71 | +otherwise falling back to the | |
72 | +.I \%libmingwex | |
73 | +implementation. | |
74 | +. | |
75 | +.PP | |
76 | +Prior to \%mingwrt\(hy5.3, | |
77 | +none of the above feature test macros have any effect on | |
78 | +.BR \%wcsrtombs (); | |
79 | +all calls will be directed to the | |
80 | +.I \%libmingwex | |
81 | +implementation. | |
82 | +. | |
83 | +. | |
84 | +.SH DESCRIPTION | |
85 | +The | |
86 | +.BR \%wcsrtombs () | |
87 | +function converts a sequence of wide characters from | |
88 | +the array which is indirectly pointed to by | |
89 | +.IR src , | |
90 | +to a corresponding multibyte character sequence in | |
91 | +the codeset which is associated with the | |
92 | +.B \%LC_CTYPE | |
93 | +category of the active process locale, | |
94 | +beginning in the conversion state which is represented by the | |
95 | +.B \%mbstate_t | |
96 | +object at | |
97 | +.IR *ps ; | |
98 | +each wide character is converted, | |
99 | +as if by calling the | |
100 | +.BR \%wcrtomb (3) | |
101 | +function. | |
102 | +. | |
103 | +.PP | |
104 | +Conversion continues until: | |
105 | +.RS 2n | |
106 | +.ll -2n | |
107 | +.IP \(bu 2n | |
108 | +A wide character which is invalid in its own context is encountered. | |
109 | +. | |
110 | +.IP \(bu 2n | |
111 | +A wide character which does not have a valid representation within | |
112 | +the target multibyte codeset is encountered. | |
113 | +. | |
114 | +.IP \(bu 2n | |
115 | +The NUL wide character is encountered, | |
116 | +while in the initial conversion state. | |
117 | +. | |
118 | +.IP \(bu 2n | |
119 | +The | |
120 | +.I dst | |
121 | +argument is not a NULL pointer, | |
122 | +and a wide character is encountered for which | |
123 | +the converted length would cause the aggregate length | |
124 | +of the converted multibyte character string to exceed | |
125 | +the limit specified by the | |
126 | +.I len | |
127 | +argument. | |
128 | +.ll +2n | |
129 | +.RE | |
130 | +. | |
131 | +.PP | |
132 | +If | |
133 | +.I dst | |
134 | +is | |
135 | +.I not | |
136 | +a NULL pointer, | |
137 | +the multibyte character string resulting from successful conversion, | |
138 | +up to a maximum of | |
139 | +.I len | |
140 | +bytes, | |
141 | +is stored in the multibyte array starting at | |
142 | +.IR dst . | |
143 | +If the conversion is NUL terminated, | |
144 | +the wide character string reference pointed to by | |
145 | +.I src | |
146 | +is replaced by a NULL pointer; | |
147 | +otherwise it is updated to point to the address immediately | |
148 | +following that of the last wide character converted. | |
149 | +. | |
150 | +.PP | |
151 | +If | |
152 | +.I dst | |
153 | +is a NULL pointer, | |
154 | +the aggregate count of bytes required | |
155 | +to represent the conversion is accumulated, | |
156 | +until any one of the preceding termination conditions is encountered; | |
157 | +the | |
158 | +.I len | |
159 | +argument, | |
160 | +and the termination condition which is dependent upon it, | |
161 | +is ignored, | |
162 | +and the conversion is not stored. | |
163 | +. | |
164 | +.PP | |
165 | +If | |
166 | +.I ps | |
167 | +is a NULL pointer, | |
168 | +the | |
169 | +.BR \%wcsrtombs () | |
170 | +function uses a static internal | |
171 | +.B \%mbstate_t | |
172 | +object, | |
173 | +which is known only to, | |
174 | +and visible only within the scope of execution of, | |
175 | +the | |
176 | +.BR \%wcsrtombs () | |
177 | +function itself. | |
178 | +. | |
179 | +.PP | |
180 | +Following a successful conversion, | |
181 | +the | |
182 | +.B \%mbstate_t | |
183 | +object at | |
184 | +.IR *ps , | |
185 | +or the internal | |
186 | +.B \%mbstate_t | |
187 | +object if appropriate, | |
188 | +is reset to the initial conversion state. | |
189 | +. | |
190 | +. | |
191 | +.SH RETURN VALUE | |
192 | +When conversion is successful, | |
193 | +and | |
194 | +.I dst | |
195 | +is | |
196 | +.I not | |
197 | +a NULL pointer, | |
198 | +the | |
199 | +.BR \%wcsrtombs () | |
200 | +function returns the number of bytes stored at | |
201 | +.IR dst , | |
202 | +to represent the resulting multibyte character sequence, | |
203 | +.I excluding | |
204 | +the terminating NUL, | |
205 | +(if any). | |
206 | +. | |
207 | +.PP | |
208 | +Conversely, | |
209 | +when conversion is successful, | |
210 | +but | |
211 | +.I dst | |
212 | +is a NULL pointer, | |
213 | +the | |
214 | +.BR \%wcsrtombs () | |
215 | +function returns the number of bytes which would be required | |
216 | +to store the entire multibyte character string resulting from | |
217 | +the successful conversion, | |
218 | +.I excluding | |
219 | +the terminating NUL. | |
220 | +. | |
221 | +. | |
222 | +.SH ERROR CONDITIONS | |
223 | +If conversion is unsuccessful, | |
224 | +.I \%errno | |
225 | +is set to | |
226 | +.BR \%EILSEQ , | |
227 | +the | |
228 | +.BR wcsrtombs () | |
229 | +function returns | |
230 | +.IR (size_t)(\-1) , | |
231 | +and the conversion state is unspecified. | |
232 | +. | |
233 | +. | |
234 | +.SH STANDARDS CONFORMANCE | |
235 | +Except in respect of its extended provision for handling of | |
236 | +.IR surrogate\ pairs , | |
237 | +and to the extent that it may be affected by limitations | |
238 | +of the underlying \%MS\(hyWindows API, | |
239 | +the | |
240 | +.I \%libmingwex | |
241 | +implementation of | |
242 | +.BR \%wcsrtombs () | |
243 | +conforms generally to | |
244 | +.BR \%ISO\(hyC99 , | |
245 | +.BR \%POSIX.1\(hy2001 , | |
246 | +and | |
247 | +.BR \%POSIX.1\(hy2008 ; | |
248 | +(prior to \%mingwrt\(hy5.3, | |
249 | +and in those cases where calls may be delegated | |
250 | +to a Microsoft runtime DLL implementation, | |
251 | +this level of conformity may not be achieved). | |
252 | +. | |
253 | +. | |
254 | +.\"SH EXAMPLE | |
255 | +. | |
256 | +. | |
257 | +.SH CAVEATS AND BUGS | |
258 | +Due to a documented limitation of Microsoft\(aqs | |
259 | +.BR \%setlocale () | |
260 | +function implementation, | |
261 | +it is not possible to directly select an active locale, | |
262 | +in which the codeset is represented by any multibyte | |
263 | +character sequence with an effective | |
264 | +.B \%MB_CUR_MAX | |
265 | +of more than two bytes. | |
266 | +Prior to \%mingwrt\(hy5.3, | |
267 | +this limitation precludes the use of | |
268 | +.BR \%wcsrtombs () | |
269 | +to convert to any codeset with | |
270 | +.B \%MB_CUR_MAX | |
271 | +greater than two bytes, | |
272 | +(such as | |
273 | +.BR \%UTF\(hy8 ). | |
274 | +From \%mingwrt\(hy5.3 onward, | |
275 | +the MinGW.org implementation of | |
276 | +.BR \%wcsrtombs () | |
277 | +mitigates this limitation by assignment of the codeset | |
278 | +from the | |
279 | +.B \%LC_CTYPE | |
280 | +environment variable, | |
281 | +provided the system default has been previously activated | |
282 | +for the | |
283 | +.B \%LC_CTYPE | |
284 | +locale category; | |
285 | +e.g.\ execution of: | |
286 | +.PP | |
287 | +.RS 4n | |
288 | +.EX | |
289 | +#define _ISOC99_SOURCE | |
290 | + | |
291 | +#include <stdio.h> | |
292 | +#include <stdlib.h> | |
293 | +#include <locale.h> | |
294 | +#include <wchar.h> | |
295 | + | |
296 | +void print_conv( const wchar_t * ); | |
297 | + | |
298 | +int main() | |
299 | +{ | |
300 | + setlocale( LC_CTYPE, "" ); | |
301 | + putenv( "LC_CTYPE=en_GB.65001" ); | |
302 | + print_conv( L"\eu6c34\eU0001d10b" ); | |
303 | + return 0; | |
304 | +} | |
305 | + | |
306 | +void print_conv( const wchar_t *wcs ) | |
307 | +{ | |
308 | + size_t len; | |
309 | + if( (len = 1 + wcsrtombs( NULL, &wcs, 0, NULL )) > 0 ) | |
310 | + { | |
311 | + const wchar_t *wc = wcs; | |
312 | + size_t n = 1 + wcslen( wcs ); | |
313 | + unsigned char mbs[len], *mb = mbs; | |
314 | + printf( "UTF-16: %u value%s: ", n, (n == 1) ? "" : "s" ); | |
315 | + do { printf( "0x%04X%c", *wc, (*wc == L'\e0') ? '\en' : ':' ); | |
316 | + } while( *wc++ != L'\e0' ); | |
317 | + printf( "UTF-8: %u byte%s: ", | |
318 | + 1 + wcsrtombs( (char *)(mbs), &wcs, len, NULL ), | |
319 | + (len == 1) ? "" : "s" | |
320 | + ); | |
321 | + do { printf( "0x%02X%c", *mb, (*mb == '\e0') ? '\en' : ':' ); | |
322 | + } while( *mb++ != '\e0' ); | |
323 | + } | |
324 | + else perror( "wcsrtombs" ); | |
325 | +} | |
326 | +.EE | |
327 | +.RE | |
328 | +.PP | |
329 | +will select | |
330 | +.B \%UTF\(hy8 | |
331 | +as the target codeset, | |
332 | +then convert the \fC\%L"\eu6c34\eU0001d10b"\fP | |
333 | +wide character string, | |
334 | +resulting in the output: | |
335 | +.PP | |
336 | +.RS 4n | |
337 | +.EX | |
338 | +UTF-16: 4 values: 0x6C34:0xD834:0xDD0B:0x0000 | |
339 | +UTF-8: 8 bytes: 0xE6:0xB0:0xB4:0xF0:0x9D:0x84:0x8B:0x00 | |
340 | +.EE | |
341 | +.RE | |
342 | +. | |
343 | +. | |
344 | +.SH SEE ALSO | |
345 | +.BR mbsinit (3), | |
346 | +and | |
347 | +.BR wcrtomb (3) | |
348 | +. | |
349 | +. | |
350 | +.SH AUTHOR | |
351 | +This manpage was written by \%Keith\ Marshall, | |
352 | +\%<keith@users.osdn.me>, | |
353 | +to document the | |
354 | +.BR \%wcsrtombs () | |
355 | +function as it has been implemented for the MinGW.org Project. | |
356 | +It may be copied, modified and redistributed, | |
357 | +without restriction of copyright, | |
358 | +provided this acknowledgement of contribution by | |
359 | +the original author remains in place. | |
360 | +. | |
361 | +.\" EOF |
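The wcsrtombs() page above describes a counting mode, selected by passing a NULL dst, followed by a storing mode. A minimal sketch of that two-pass pattern, using heap storage in place of the variable length array of the man page example, might look as follows; the helper name wcs_to_mbs_alloc() is hypothetical, and an explicit mbstate_t is passed on the assumption that the caller prefers not to share the function's internal static state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* wcs_to_mbs_alloc() is a hypothetical helper, not part of mingwrt.
 * It returns a malloc()ed, NUL terminated multibyte copy of ws, in
 * the codeset of the active LC_CTYPE locale, or NULL on failure;
 * the caller is responsible for passing the result to free().
 */
char *wcs_to_mbs_alloc( const wchar_t *ws )
{
  mbstate_t ps;
  const wchar_t *src = ws;
  size_t len;
  char *mbs;

  assert( ws != NULL );

  /* Pass 1: dst is NULL, so only the required length is computed;
   * the len argument is ignored, and nothing is stored.
   */
  memset( &ps, 0, sizeof( ps ) );
  len = wcsrtombs( NULL, &src, 0, &ps );
  if( len == (size_t)(-1) ) return NULL;    /* errno is EILSEQ */

  /* The returned length excludes the terminating NUL. */
  if( (mbs = malloc( len + 1 )) == NULL ) return NULL;

  /* Pass 2: restart from the initial state, and store the result. */
  src = ws;
  memset( &ps, 0, sizeof( ps ) );
  if( wcsrtombs( mbs, &src, len + 1, &ps ) == (size_t)(-1) )
  {
    free( mbs );
    return NULL;
  }
  return mbs;
}
```

Restarting the second pass from the initial conversion state keeps both passes consistent with each other, at the cost of converting the string twice.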
@@ -0,0 +1,174 @@ | ||
1 | +.\" vim: ft=nroff | |
2 | +.TH %PAGEREF% MinGW "MinGW Programmer's Reference Manual" | |
3 | +. | |
4 | +.SH NAME | |
5 | +.B \%wctob | |
6 | +\- convert a wide character to a single byte | |
7 | +. | |
8 | +. | |
9 | +.SH SYNOPSIS | |
10 | +.B #include | |
11 | +.RB < stdio.h > | |
12 | +.br | |
13 | +.B #include | |
14 | +.RB < wchar.h > | |
15 | +.PP | |
16 | +.B int wctob( wint_t | |
17 | +.I c | |
18 | +.B ); | |
19 | +. | |
20 | +.IP \& -4n | |
21 | +Feature Test Macro Requirements for libmingwex: | |
22 | +.PP | |
23 | +.BR \%__MSVCRT_VERSION__ : | |
24 | +since \%mingwrt\(hy5.3, | |
25 | +if this feature test macro is | |
26 | +.IR defined , | |
27 | +with a value of | |
28 | +.I at least | |
29 | +.IR \%0x0800 , | |
30 | +(corresponding to the symbolic constant, | |
31 | +.BR \%__MSVCR80_DLL , | |
32 | +and thus declaring intent to link with \%MSVCR80.DLL, | |
33 | +or any later version of \%Microsoft\(aqs \%non\(hyfree runtime library, | |
34 | +instead of with \%MSVCRT.DLL), | |
35 | +calls to | |
36 | +.BR \%wctob () | |
37 | +will be directed to the implementation thereof, | |
38 | +within \%Microsoft\(aqs runtime DLL. | |
39 | +. | |
40 | +.PP | |
41 | +.BR \%_ISOC99_SOURCE , | |
42 | +.BR \%_ISOC11_SOURCE : | |
43 | +since \%mingwrt\(hy5.3.1, | |
44 | +when linking with \%MSVCRT.DLL, | |
45 | +or when | |
46 | +.B \%__MSVCRT_VERSION__ | |
47 | +is either | |
48 | +.IR undefined , | |
49 | +or is | |
50 | +.I defined | |
51 | +with any value which is | |
52 | +.I less than | |
53 | +.IR \%0x0800 , | |
54 | +(thus denying intent to link with \%MSVCR80.DLL, | |
55 | +or any later \%non\(hyfree version of Microsoft\(aqs runtime library), | |
56 | +.I explicitly | |
57 | +defining either of these feature test macros | |
58 | +will cause any call to | |
59 | +.BR \%wctob () | |
60 | +to be directed to the | |
61 | +.I \%libmingwex | |
62 | +implementation; | |
63 | +if neither macro is defined, | |
64 | +calls to | |
65 | +.BR \%wctob () | |
66 | +will be directed to Microsoft\(aqs runtime implementation, | |
67 | +if it is available, | |
68 | +otherwise falling back to the | |
69 | +.I \%libmingwex | |
70 | +implementation. | |
71 | +. | |
72 | +.PP | |
73 | +Prior to \%mingwrt\(hy5.3, | |
74 | +none of the above feature test macros have any effect on | |
75 | +.BR \%wctob (); | |
76 | +all calls will be directed to the | |
77 | +.I \%libmingwex | |
78 | +implementation. | |
79 | +. | |
80 | +. | |
81 | +.SH DESCRIPTION | |
82 | +The | |
83 | +.BR \%wctob () | |
84 | +function converts the wide character, | |
85 | +represented by | |
86 | +.IR c , | |
87 | +to a multibyte character sequence | |
88 | +in the codeset which is associated with the | |
89 | +.B \%LC_CTYPE | |
90 | +category of the active process locale. | |
91 | +Provided the entire conversion can be accommodated | |
92 | +within a single byte, | |
93 | +the value of that byte, | |
94 | +interpreted as an | |
95 | +.IR unsigned\ char , | |
96 | +and cast to an | |
97 | +.IR int , | |
98 | +is returned; | |
99 | +otherwise, | |
100 | +.B EOF | |
101 | +is returned. | |
102 | +. | |
103 | +. | |
104 | +.SH RETURN VALUE | |
105 | +If the conversion of | |
106 | +.IR c , | |
107 | +to a multibyte character sequence, | |
108 | +in its entirety, | |
109 | +occupies exactly | |
110 | +.I one | |
111 | +byte, | |
112 | +the value of that byte, | |
113 | +interpreted as an | |
114 | +.IR unsigned\ char , | |
115 | +and cast to an | |
116 | +.IR int , | |
117 | +is returned; | |
118 | +otherwise, | |
119 | +.B EOF | |
120 | +is returned. | |
121 | +. | |
122 | +. | |
123 | +.SH ERROR CONDITIONS | |
124 | +No error conditions are defined. | |
125 | +. | |
126 | +. | |
127 | +.SH STANDARDS CONFORMANCE | |
128 | +Except to the extent that it may be affected by limitations | |
129 | +of the underlying \%MS\(hyWindows API, | |
130 | +the | |
131 | +.I \%libmingwex | |
132 | +implementation of | |
133 | +.BR \%wctob () | |
134 | +conforms generally to | |
135 | +.BR \%ISO\(hyC99 , | |
136 | +.BR \%POSIX.1\(hy2001 , | |
137 | +and | |
138 | +.BR \%POSIX.1\(hy2008 ; | |
139 | +(prior to \%mingwrt\(hy5.3, | |
140 | +and in those cases where calls may be delegated | |
141 | +to a Microsoft runtime DLL implementation, | |
142 | +this level of conformity may not be achieved). | |
143 | +. | |
144 | +. | |
145 | +.\"SH EXAMPLE | |
146 | +. | |
147 | +. | |
148 | +.SH CAVEATS AND BUGS | |
149 | +Use of the | |
150 | +.BR \%wctob () | |
151 | +function is | |
152 | +.IR discouraged ; | |
153 | +it serves no purpose which may not be better served by the | |
154 | +.BR \%wcrtomb (3) | |
155 | +function, | |
156 | +which should be considered as a preferred alternative. | |
157 | +. | |
158 | +. | |
159 | +.SH SEE ALSO | |
160 | +.BR wcrtomb (3) | |
161 | +. | |
162 | +. | |
163 | +.SH AUTHOR | |
164 | +This manpage was written by \%Keith\ Marshall, | |
165 | +\%<keith@users.osdn.me>, | |
166 | +to document the | |
167 | +.BR \%wctob () | |
168 | +function as it has been implemented for the MinGW.org Project. | |
169 | +It may be copied, modified and redistributed, | |
170 | +without restriction of copyright, | |
171 | +provided this acknowledgement of contribution by | |
172 | +the original author remains in place. | |
173 | +. | |
174 | +.\" EOF |
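The wctob() page above recommends wcrtomb() as the preferred alternative. As a hedged sketch of what that substitution could look like, the hypothetical helper single_byte_of() below reproduces the documented wctob() result (the byte value, or EOF) by way of wcrtomb(); check_ascii_agreement() is likewise an assumption of this sketch, comparing the two only for printable ASCII, where agreement is expected in any ASCII compatible codeset.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* single_byte_of() is a hypothetical helper, not part of mingwrt.
 * It returns the single byte representation of wc, as an unsigned
 * char value converted to int, or EOF when the conversion fails, or
 * does not occupy exactly one byte.
 */
int single_byte_of( wchar_t wc )
{
  char buf[MB_LEN_MAX];
  mbstate_t ps;

  memset( &ps, 0, sizeof( ps ) );   /* initial conversion state */
  return (wcrtomb( buf, wc, &ps ) == 1)
    ? (int)((unsigned char)(buf[0]))
    : EOF;
}

/* Cross-check against wctob(), for printable ASCII only. */
void check_ascii_agreement( void )
{
  wchar_t wc;
  for( wc = L' '; wc <= L'~'; wc++ )
    assert( single_byte_of( wc ) == wctob( (wint_t)(wc) ) );
}
```

Unlike wctob(), the wcrtomb() call also makes the actual byte count available, so a caller which needs more than a yes or no answer can keep the buffer and the length instead of discarding them.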