View Issue Details

IDProjectCategoryView StatusLast Update
0031113Open CASCADEOCCT:Foundation Classespublic2023-03-06 19:23
ReporterabvAssigned Todpasukhi  
PrioritynormalSeverityminor 
Status newResolutionopen 
Target VersionUnscheduled 
Summary0031113: Foundation Classes - TCollection_ExtendedString constructor from CString is confusing
DescriptionConstructor of TCollection_ExtendedString constructor from CString is confusing: its second option isMultiByte is false by default, which results in corruption of string if argument is in UTF-8 encoding (which is default in most modern applications, and in particular in OCCT when using TCollection_AsciiString).

To protect against this issue, we can:
- make constructor explicit
- make second argument non-default
TagsNo tags attached.
Test case number

Relationships

related to 0026749 closedabv Community TObj_Assistant::FindModel fails to find model with unicode name. 

Activities

abv

2020-10-22 10:36

manager   ~0096128

Another solution, not requiring porting of existing code (but potentially breaking existing code if it relies on old behavior), is to change default to True. We shall see first how much this constructor is used.

abv

2020-10-24 11:17

manager   ~0096187

I have tried building OCCT with parameter of constructor made non-default.
This results in over 1000 errors, most of which have nothing to do with non-Ascii symbols and indicate inappropriate use of ExtendedString:

- In OCAF persistence, for generation of error messages combining string literals and parameters (like "Cannot retrieve the last index for the attribute..."). Use of TCollection_AsciiString here would be more appropriate.

- In TKOpenGl, for sending messages via OpenGl_Context::PushMessage(). As all the messages I have seen are actually plain Ascii strings, I deem TCollection_AsciiString would be more appropriate as well.

- In constructor of Graphic3d_AxisAspect (name argument). TCollection_AsciiString can be also used here I suppose.

- In AIS_InteractiveContext::Status(), OpenGl_View::safeFailBack()

Besides, I wonder why Graphic3d_GraduatedTrihedron.hxx is referenced from almost any part of OCCT code...

There are also some places where this constructor is used to put Ascii string literal to ExtendedString where it cannot be replaced easily by AsciiString. Most notable case is XCAF classes setting default names to objects. Another places is AIS_ColorScale::TextHeight() and TextSize() called with argument "". The same applies to related DRAW commands.

Places where this constructor accepts potentially non-Ascii string are suffering from the same problem described in the issue description (Unicode text will be broken on conversion):

- DRAW command vcolorscale (title argument)

- DRAW commands working with strings in DDataStd_BasicCommands.cxx and other OCAF-related commands; note that commands SetComment and SetName use Standard_True argument to construct TCollection_ExtendedString, but others do not

abv

2020-10-24 20:25

manager   ~0096243

Alas, removing default value for isMultiByte argument does not allow finding all uses of that constructor -- direct initialisations of TCollection_ExtendedString from Standard_CSTring (including string literals) are then resolved to conversion vis TCollection_AsciiString, which has the same effect as isMultiByte set to True.

abv

2020-10-24 22:38

manager   ~0096246

In order to identify places where TCollection_ExtendedString is initialized by char* which may probably be Unicode I have modified its constructor as follows:

  Standard_DEPRECATED("Extended string from char*")
  explicit Standard_EXPORT TCollection_ExtendedString(const Standard_CString theString, 
                                             const Standard_Boolean isMultiByte = Standard_True);

  template<size_t N>
  TCollection_ExtendedString (const char (&theString)[N]) 
    : TCollection_ExtendedString (theString, Standard_True)
  { 
  }


Note that the last constructor handles initializations from C string literals which we assume never contain Unicode symbols across OCCT.

The resulting places are:

* In XmlM* packages, error messages generated if conversion fails can contain string converted from LDOMString, it is not clear if it can contain Unicode or not

* In OpenGl_Context::init(), strings returned by glGetString() are used

* In some places, e.g. BinDrivers_DocumentStorageDriver::WriteShapeSection(), exception message is used

* DRAW commands: vcolorscale for title, commands in DDocStd, DNaming, DDataStd, TObjDRAW packages (document name, format, file path, etc.), XDEDRAW package (XNote* commands, XCAF document open / save commands), some commands in TKQADraw

* In Message_MsgFile.cxx, static getString() applied e.g. to file names

* In LDOMBasicString.cxx, when converting plain strings (without #feff prefix) to ExtendedString

It looks like in all those places use of Standard_True is at least safe and most likely more correct, so this change can enable possibility to use Unicode where it was not possible before.

git

2020-10-24 22:43

administrator   ~0096247

Branch CR31113 has been created by abv.

SHA-1: 1e63a61a4ee9c619ab416db901f153da8e08d10e


Detailed log of new commits:

Author: abv
Date: Sat Oct 24 22:46:35 2020 +0300

    0031113: Foundation Classes - TCollection_ExtendedString constructor from CString is confusing
    
    In constructor of TCollection_ExtendedString from Standard_CSTring (const char*), default value for the second argument (isMultiByte) is changed from False to True, for consistency with constructor from TCollection_AsciiString and adopted approach of using UTF-8 encoding for Unicode strings across OCCT.

Issue History

Date Modified Username Field Change
2019-10-29 18:40 abv New Issue
2019-10-29 18:40 abv Assigned To => abv
2020-09-11 16:13 utverdov Target Version 7.5.0 => 7.6.0
2020-10-22 10:36 abv Note Added: 0096128
2020-10-22 10:36 abv Target Version 7.6.0 => 7.5.0
2020-10-24 11:17 abv Note Added: 0096187
2020-10-24 20:25 abv Note Added: 0096243
2020-10-24 22:38 abv Note Added: 0096246
2020-10-24 22:43 git Note Added: 0096247
2020-10-25 10:02 abv Target Version 7.5.0 => 7.6.0
2020-10-25 20:30 abv Relationship added related to 0026749
2021-09-20 10:27 kgv Target Version 7.6.0 => 7.7.0
2022-10-24 10:40 szy Target Version 7.7.0 => 7.8.0
2023-01-17 20:02 ebelouso Assigned To abv => akaftasev
2023-01-18 12:13 dpasukhi Assigned To akaftasev => dpasukhi
2023-03-06 19:23 ebelouso Target Version 7.8.0 => Unscheduled