The first iconv library design issue arises when considering the following two design approaches:

1. convert directly between each pair of supported encodings, with a dedicated conversion module for each pair;
2. convert through a common intermediate representation, so that each encoding needs only a converter to and from that representation.

Obviously, there is a trade-off between generality/flexibility and efficiency: the first method is more efficient since it converts directly; however, it is less flexible since a distinct module is needed for each encoding pair.
The Newlib iconv model uses the second method and always converts through 32-bit UCS, but its design also allows one to write specialized conversion modules for cases where conversion speed is critical.
The second design issue is how to break down (decompose) encodings. The Newlib iconv library uses the fact that any encoding may be considered as one or more CCS-es plus a CES, and decomposes each of its conversion modules into a CES converter plus one or more CCS tables. The CCS tables map a CCS to UCS and vice versa; the CES converters map a CCS to the encoding and vice versa.
As an example, let's consider the conversion from the big5 encoding to the EUC-TW encoding. The big5 encoding may be decomposed into the ASCII and BIG5 CCS-es plus the BIG5 CES. EUC-TW may be decomposed into the CNS11643_PLANE1, CNS11643_PLANE2, and CNS11643_PLANE14 CCS-es plus the EUC CES.
The big5 -> EUC-TW conversion is performed as follows:

1. the BIG5 CES converter decodes the input bytes into a code point of either the ASCII or the BIG5 CCS;
2. the corresponding CCS table maps that code point to UCS;
3. the CNS11643_PLANE1, CNS11643_PLANE2, or CNS11643_PLANE14 CCS table maps the UCS code point to a code point of that CCS;
4. the EUC CES converter encodes the resulting code point into the output bytes.
Analogously, the backward conversion performs the same steps in reverse: the EUC CES converter decodes the input, the CNS11643_PLANE* CCS tables map it to UCS, the ASCII or BIG5 CCS table maps UCS back to a CCS code point, and the BIG5 CES converter encodes the result.
Note that the above is just an example; the real names of the CES converters and the CCS tables implemented in the Newlib iconv library are slightly different.
The third design issue also relates to flexibility. Obviously, it isn't desirable to always link all the CES converters and CCS tables into the library; instead, we want to be able to load the needed converters and tables dynamically, on demand. This isn't a problem on "big" machines such as a PC, but it may be very problematic on "small" embedded systems.
Since the CCS tables are just data, it is possible to load them dynamically from external files. The CES converters, on the other hand, are algorithms with code, so a dynamic library loading capability is required.
Apart from the possible restrictions imposed by embedded systems (small RAM, for example), Newlib itself has no dynamic library support; therefore, all the CES converters that will ever be used must be linked into the library. Loading dynamic CCS tables, however, is possible and is implemented in the Newlib iconv library. It may be enabled via the Newlib configure script options.
The next design issue is fine-tuning the iconv library configuration. One important ability is for iconv not to link all of its converters and tables (if dynamic loading is not enabled), but instead to enable only those encodings which are specified at configuration time (see the section about the configure script options).
In addition, the Newlib iconv library configure options distinguish between conversion directions. This means that not only are the supported encodings selectable, but the conversion direction is as well. For example, if a user wants a configuration which allows conversions from UTF-8 to UTF-16 and doesn't plan to use "UTF-16 to UTF-8" conversions, he or she can enable only that conversion direction (i.e., no "UTF-16 -> UTF-8"-related code will be included), thus saving some memory. Note that this technique allows one half of a CCS table, which may be quite large, to be excluded from linking.
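A direction-restricted build might be configured along these lines. The option spellings below are taken from Newlib's top-level README as a sketch; verify both the option names and the encoding-name spellings against the documentation of your Newlib version before relying on them.

```shell
# Enable iconv, but only the UTF-8 -> UTF-16 direction
# (option and encoding names: check your newlib tree's README).
./configure \
    --enable-newlib-iconv \
    --enable-newlib-iconv-from-encodings=utf_8 \
    --enable-newlib-iconv-to-encodings=utf_16
```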
One more design aspect is the choice between speed- and size-optimized tables; users can select between them using configure script options. The speed-optimized CCS tables are the same as the size-optimized ones in the case of an 8-bit CCS (e.g., KOI8-R), but for 16-bit CCS-es the size-optimized tables may be 1.5 to 2 times smaller than the speed-optimized ones. On the other hand, conversion with the speed-optimized tables is several times faster.
It's worth stressing that support for a new encoding can't be added dynamically to an already compiled Newlib library, even if it needs only an additional CCS table and iconv is configured to use external CCS table files. (This isn't a fundamental restriction: the ability to add support for a new table-based encoding dynamically, by just adding a new .cct file, could easily be implemented.)
Theoretically, compiled-in CCS tables should be more appropriate for embedded systems than dynamically loaded ones, because compiled-in tables are read-only and can be placed in ROM, whereas dynamic loading requires RAM. Moreover, in the current iconv implementation, a distinct copy of the dynamic CCS file is loaded for each opened iconv descriptor, even when the encoding is the same. This means, for example, that if two iconv descriptors for "KOI8-R -> UCS-4BE" and "KOI8-R -> UTF-16BE" are opened, two copies of the koi8-r .cct file will be loaded (actually, iconv loads only the needed part of these files). With compiled-in CCS tables, on the other hand, there is always only one copy.