View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0031113 | Open CASCADE | OCCT:Foundation Classes | public | 2019-10-29 18:40 | 2023-03-06 19:23 |
Reporter | Assigned To | dpasukhi | |||
Priority | normal | Severity | minor | ||
Status | new | Resolution | open | ||
Target Version | Unscheduled | ||||
Summary | 0031113: Foundation Classes - TCollection_ExtendedString constructor from CString is confusing | ||||
Description | Constructor of TCollection_ExtendedString constructor from CString is confusing: its second option isMultiByte is false by default, which results in corruption of string if argument is in UTF-8 encoding (which is default in most modern applications, and in particular in OCCT when using TCollection_AsciiString). To protect against this issue, we can: - make constructor explicit - make second argument non-default | ||||
Tags | No tags attached. | ||||
Test case number | |||||
related to | 0026749 | closed | Community | TObj_Assistant::FindModel fails to find model with unicode name. |
|
Another solution, not requiring porting of existing code (but potentially breaking existing code if it relies on old behavior), is to change default to True. We shall see first how much this constructor is used. |
|
I have tried building OCCT with parameter of constructor made non-default. This results in over 1000 errors, most of which have nothing to do with non-Ascii symbols and indicate inappropriate use of ExtendedString: - In OCAF persistence, for generation of error messages combining string literals and parameters (like "Cannot retrieve the last index for the attribute..."). Use of TCollection_AsciiString here would be more appropriate. - In TKOpenGl, for sending messages via OpenGl_Context::PushMessage(). As all the messages I have seen are actually plain Ascii strings, I deem TCollection_AsciiString would be more appropriate as well. - In constructor of Graphic3d_AxisAspect (name argument). TCollection_AsciiString can be also used here I suppose. - In AIS_InteractiveContext::Status(), OpenGl_View::safeFailBack() Besides, I wonder why Graphic3d_GraduatedTrihedron.hxx is referenced from almost any part of OCCT code... There are also some places where this constructor is used to put Ascii string literal to ExtendedString where it cannot be replaced easily by AsciiString. Most notable case is XCAF classes setting default names to objects. Another places is AIS_ColorScale::TextHeight() and TextSize() called with argument "". The same applies to related DRAW commands. Places where this constructor accepts potentially non-Ascii string are suffering from the same problem described in the issue description (Unicode text will be broken on conversion): - DRAW command vcolorscale (title argument) - DRAW commands working with strings in DDataStd_BasicCommands.cxx and other OCAF-related commands; note that commands SetComment and SetName use Standard_True argument to construct TCollection_ExtendedString, but others do not |
|
Alas, removing default value for isMultiByte argument does not allow finding all uses of that constructor -- direct initialisations of TCollection_ExtendedString from Standard_CSTring (including string literals) are then resolved to conversion vis TCollection_AsciiString, which has the same effect as isMultiByte set to True. |
|
In order to identify places where TCollection_ExtendedString is initialized by char* which may probably be Unicode I have modified its constructor as follows:Standard_DEPRECATED("Extended string from char*") explicit Standard_EXPORT TCollection_ExtendedString(const Standard_CString theString, const Standard_Boolean isMultiByte = Standard_True); template<size_t N> TCollection_ExtendedString (const char (&theString)[N]) : TCollection_ExtendedString (theString, Standard_True) { } Note that the last constructor handles initializations from C string literals which we assume never contain Unicode symbols across OCCT. The resulting places are: * In XmlM* packages, error messages generated if conversion fails can contain string converted from LDOMString, it is not clear if it can contain Unicode or not * In OpenGl_Context::init(), strings returned by glGetString() are used * In some places, e.g. BinDrivers_DocumentStorageDriver::WriteShapeSection(), exception message is used * DRAW commands: vcolorscale for title, commands in DDocStd, DNaming, DDataStd, TObjDRAW packages (document name, format, file path, etc.), XDEDRAW package (XNote* commands, XCAF document open / save commands), some commands in TKQADraw * In Message_MsgFile.cxx, static getString() applied e.g. to file names * In LDOMBasicString.cxx, when converting plain strings (without #feff prefix) to ExtendedString It looks like in all those places use of Standard_True is at least safe and most likely more correct, so this change can enable possibility to use Unicode where it was not possible before. |
|
Branch CR31113 has been created by abv. SHA-1: 1e63a61a4ee9c619ab416db901f153da8e08d10e Detailed log of new commits: Author: abv Date: Sat Oct 24 22:46:35 2020 +0300 0031113: Foundation Classes - TCollection_ExtendedString constructor from CString is confusing In constructor of TCollection_ExtendedString from Standard_CSTring (const char*), default value for the second argument (isMultiByte) is changed from False to True, for consistency with constructor from TCollection_AsciiString and adopted approach of using UTF-8 encoding for Unicode strings across OCCT. |
Date Modified | Username | Field | Change |
---|---|---|---|
2019-10-29 18:40 |
|
New Issue | |
2019-10-29 18:40 |
|
Assigned To | => abv |
2020-09-11 16:13 |
|
Target Version | 7.5.0 => 7.6.0 |
2020-10-22 10:36 |
|
Note Added: 0096128 | |
2020-10-22 10:36 |
|
Target Version | 7.6.0 => 7.5.0 |
2020-10-24 11:17 |
|
Note Added: 0096187 | |
2020-10-24 20:25 |
|
Note Added: 0096243 | |
2020-10-24 22:38 |
|
Note Added: 0096246 | |
2020-10-24 22:43 | git | Note Added: 0096247 | |
2020-10-25 10:02 |
|
Target Version | 7.5.0 => 7.6.0 |
2020-10-25 20:30 |
|
Relationship added | related to 0026749 |
2021-09-20 10:27 | kgv | Target Version | 7.6.0 => 7.7.0 |
2022-10-24 10:40 |
|
Target Version | 7.7.0 => 7.8.0 |
2023-01-17 20:02 | ebelouso | Assigned To | abv => akaftasev |
2023-01-18 12:13 | dpasukhi | Assigned To | akaftasev => dpasukhi |
2023-03-06 19:23 | ebelouso | Target Version | 7.8.0 => Unscheduled |