MantisBT
Mantis Bug Tracker Workflow

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0031113Open CASCADE[OCCT] OCCT:Foundation Classespublic2019-10-29 18:402020-10-29 16:32
Reporterabv 
Assigned Toabv 
PrioritynormalSeverityminor 
StatusnewResolutionopen 
PlatformOSOS Version
Product Version 
Target Version[OCCT] 7.6.0*Fixed in Version 
Summary0031113: Foundation Classes - TCollection_ExtendedString constructor from CString is confusing
DescriptionConstructor of TCollection_ExtendedString constructor from CString is confusing: its second option isMultiByte is false by default, which results in corruption of string if argument is in UTF-8 encoding (which is default in most modern applications, and in particular in OCCT when using TCollection_AsciiString).

To protect against this issue, we can:
- make constructor explicit
- make second argument non-default
TagsNo tags attached.
Test case number
Attached Files

- Relationships
related to 0026749verifiedabv Community TObj_Assistant::FindModel fails to find model with unicode name. 

-  Notes
(0096128)
abv (manager)
2020-10-22 10:36

Another solution, not requiring porting of existing code (but potentially breaking existing code if it relies on old behavior), is to change default to True. We shall see first how much this constructor is used.
(0096187)
abv (manager)
2020-10-24 11:17

I have tried building OCCT with parameter of constructor made non-default.
This results in over 1000 errors, most of which have nothing to do with non-Ascii symbols and indicate inappropriate use of ExtendedString:

- In OCAF persistence, for generation of error messages combining string literals and parameters (like "Cannot retrieve the last index for the attribute..."). Use of TCollection_AsciiString here would be more appropriate.

- In TKOpenGl, for sending messages via OpenGl_Context::PushMessage(). As all the messages I have seen are actually plain Ascii strings, I deem TCollection_AsciiString would be more appropriate as well.

- In constructor of Graphic3d_AxisAspect (name argument). TCollection_AsciiString can be also used here I suppose.

- In AIS_InteractiveContext::Status(), OpenGl_View::safeFailBack()

Besides, I wonder why Graphic3d_GraduatedTrihedron.hxx is referenced from almost any part of OCCT code...

There are also some places where this constructor is used to put Ascii string literal to ExtendedString where it cannot be replaced easily by AsciiString. Most notable case is XCAF classes setting default names to objects. Another places is AIS_ColorScale::TextHeight() and TextSize() called with argument "". The same applies to related DRAW commands.

Places where this constructor accepts potentially non-Ascii string are suffering from the same problem described in the issue description (Unicode text will be broken on conversion):

- DRAW command vcolorscale (title argument)

- DRAW commands working with strings in DDataStd_BasicCommands.cxx and other OCAF-related commands; note that commands SetComment and SetName use Standard_True argument to construct TCollection_ExtendedString, but others do not
(0096243)
abv (manager)
2020-10-24 20:25

Alas, removing default value for isMultiByte argument does not allow finding all uses of that constructor -- direct initialisations of TCollection_ExtendedString from Standard_CSTring (including string literals) are then resolved to conversion vis TCollection_AsciiString, which has the same effect as isMultiByte set to True.
(0096246)
abv (manager)
2020-10-24 22:38

In order to identify places where TCollection_ExtendedString is initialized by char* which may probably be Unicode I have modified its constructor as follows:

  Standard_DEPRECATED("Extended string from char*")
  explicit Standard_EXPORT TCollection_ExtendedString(const Standard_CString theString, 
                                             const Standard_Boolean isMultiByte = Standard_True);

  template<size_t N>
  TCollection_ExtendedString (const char (&theString)[N]) 
    : TCollection_ExtendedString (theString, Standard_True)
  { 
  }


Note that the last constructor handles initializations from C string literals which we assume never contain Unicode symbols across OCCT.

The resulting places are:

* In XmlM* packages, error messages generated if conversion fails can contain string converted from LDOMString, it is not clear if it can contain Unicode or not

* In OpenGl_Context::init(), strings returned by glGetString() are used

* In some places, e.g. BinDrivers_DocumentStorageDriver::WriteShapeSection(), exception message is used

* DRAW commands: vcolorscale for title, commands in DDocStd, DNaming, DDataStd, TObjDRAW packages (document name, format, file path, etc.), XDEDRAW package (XNote* commands, XCAF document open / save commands), some commands in TKQADraw

* In Message_MsgFile.cxx, static getString() applied e.g. to file names

* In LDOMBasicString.cxx, when converting plain strings (without #feff prefix) to ExtendedString

It looks like in all those places use of Standard_True is at least safe and most likely more correct, so this change can enable possibility to use Unicode where it was not possible before.
(0096247)
git (administrator)
2020-10-24 22:43

Branch CR31113 has been created by abv.

SHA-1: 1e63a61a4ee9c619ab416db901f153da8e08d10e


Detailed log of new commits:

Author: abv
Date: Sat Oct 24 22:46:35 2020 +0300

    0031113: Foundation Classes - TCollection_ExtendedString constructor from CString is confusing
    
    In constructor of TCollection_ExtendedString from Standard_CSTring (const char*), default value for the second argument (isMultiByte) is changed from False to True, for consistency with constructor from TCollection_AsciiString and adopted approach of using UTF-8 encoding for Unicode strings across OCCT.

- Issue History
Date Modified Username Field Change
2019-10-29 18:40 abv New Issue
2019-10-29 18:40 abv Assigned To => abv
2020-09-11 16:13 utverdov Target Version 7.5.0 => 7.6.0*
2020-10-22 10:36 abv Note Added: 0096128
2020-10-22 10:36 abv Target Version 7.6.0* => 7.5.0
2020-10-24 11:17 abv Note Added: 0096187
2020-10-24 20:25 abv Note Added: 0096243
2020-10-24 22:38 abv Note Added: 0096246
2020-10-24 22:43 git Note Added: 0096247
2020-10-25 10:02 abv Target Version 7.5.0 => 7.6.0*
2020-10-25 20:30 abv Relationship added related to 0026749
2020-10-29 16:32 abv Relationship added related to 0031878


Copyright © 2000 - 2020 MantisBT Team
Powered by Mantis Bugtracker