MantisBT - Open CASCADE
View Issue Details
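ID: 0014673
Project: Open CASCADE
Category: [OCCT] OCCT:Foundation Classes
View Status: public
Date Submitted: 2007-01-29 14:27
Last Update: 2020-12-02 17:11
Target Version: [OCCT] 7.5.0
Fixed in Version: [OCCT] 7.5.0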
0014673: Provide true support for Unicode symbols
This improvement is inspired by OCC14672: as it turns out, although OCCT
provides the class TCollection_ExtendedString and uses it in many places
(e.g. OCAF), the ability to store Unicode (or any other non-ASCII) symbols
in that class is de facto almost never used (at least, not in OCC).

That is obviously bad: if we provide a class capable of storing Unicode
strings and use it in many places, non-ASCII symbols in it should actually
be supported.

To move in that direction, the following steps are proposed:

1. Provide methods to convert a Unicode string held in a
TCollection_ExtendedString to other encodings; at least the most widespread
ASCII-based encodings, such as HTML and UTF-8, are necessary.

2. Provide a method to dump an ExtendedString (interpreted as Unicode) to the DRAW
Tcl interpreter (which has complete support for encodings and uses internally

3. Revise the OCCT code where an ExtendedString is converted to an ASCII one,
with the goal of providing a more adequate conversion. Such places can be
found by reviewing the modifications made in OCC14672, or by searching the
OCCT code directly for the "'?'" symbol used in safe conversions or for the
Is(An)Ascii() method.

a) At least the output to DRAW can use Unicode encoding directly.

b) Another good candidate for such improvement is XML persistence (LDOM* and
other dependent packages) -- see also OCC983 and OCC5032

c) It seems that the visualisation already contains some code handling Unicode,
though in many cases (e.g. when computing the size of text) it converts it to
ASCII.

d) The Resource package can also be considered. Note that it already contains
some code for converting Unicode strings to and from some Far Eastern encodings
(EUC, GB and ShiftJIS) -- see the Resource_Unicode class.
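
Step 1 above asks for conversion of a 16-bit Unicode string to UTF-8, and step 3 refers to the lossy "'?'" conversion it should replace. The sketch below illustrates both using only the C++ standard library. It is not OCCT code: std::u16string stands in for TCollection_ExtendedString, and the function names utf16ToUtf8, utf16ToAsciiLossy and appendUtf8 are hypothetical names chosen for this illustration.

```cpp
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-8 and append it to theOut.
static void appendUtf8(std::string& theOut, char32_t theCp)
{
  if (theCp < 0x80) {
    theOut += static_cast<char>(theCp);
  } else if (theCp < 0x800) {
    theOut += static_cast<char>(0xC0 | (theCp >> 6));
    theOut += static_cast<char>(0x80 | (theCp & 0x3F));
  } else if (theCp < 0x10000) {
    theOut += static_cast<char>(0xE0 | (theCp >> 12));
    theOut += static_cast<char>(0x80 | ((theCp >> 6) & 0x3F));
    theOut += static_cast<char>(0x80 | (theCp & 0x3F));
  } else {
    theOut += static_cast<char>(0xF0 | (theCp >> 18));
    theOut += static_cast<char>(0x80 | ((theCp >> 12) & 0x3F));
    theOut += static_cast<char>(0x80 | ((theCp >> 6) & 0x3F));
    theOut += static_cast<char>(0x80 | (theCp & 0x3F));
  }
}

// Convert UTF-16 to UTF-8 (hypothetical helper, not an OCCT method).
// Surrogate pairs are combined; unpaired surrogates become U+FFFD.
std::string utf16ToUtf8(const std::u16string& theStr)
{
  std::string aResult;
  for (std::size_t i = 0; i < theStr.size(); ++i) {
    char32_t aCp = theStr[i];
    if (aCp >= 0xD800 && aCp <= 0xDBFF) {
      // High surrogate: valid only if a low surrogate follows.
      if (i + 1 < theStr.size()
       && theStr[i + 1] >= 0xDC00 && theStr[i + 1] <= 0xDFFF) {
        const char32_t aLow = theStr[i + 1];
        aCp = 0x10000 + ((aCp - 0xD800) << 10) + (aLow - 0xDC00);
        ++i;
      } else {
        aCp = 0xFFFD;
      }
    } else if (aCp >= 0xDC00 && aCp <= 0xDFFF) {
      aCp = 0xFFFD; // low surrogate without a preceding high surrogate
    }
    appendUtf8(aResult, aCp);
  }
  return aResult;
}

// Lossy conversion shown for contrast: every non-ASCII code unit becomes '?'.
std::string utf16ToAsciiLossy(const std::u16string& theStr)
{
  std::string aResult;
  for (char16_t aUnit : theStr) {
    aResult += (aUnit < 0x80) ? static_cast<char>(aUnit) : '?';
  }
  return aResult;
}
```

With these helpers, a string containing U+00E9 keeps the character as the two bytes C3 A9 under utf16ToUtf8, whereas utf16ToAsciiLossy reduces it to a single '?' that cannot be converted back.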
No tags attached.
related to 0031670closed bugmaster Community Data Exchange - cp1251 Cyrillic characters in STEP file 
related to 0025685assigned Vico Liang Community Application Framework - TCollection_ExtendedString unicode storage in xml document is unreadable 
parent of 0022484closed bugmaster Open CASCADE UNICODE characters support. 
parent of 0022125closed bugmaster Open CASCADE TCollection_ExtendedString: conversion from UTF-8 to unicode 
related to 0031171closed abv Open CASCADE Draw - support Unicode input / output in console on Windows 
related to 0026749closed abv Community TObj_Assistant::FindModel fails to find model with unicode name. 
Attached Files: OSD_Path.cxx (41,193) 2011-04-01 12:50
Issue History
Date Modified  Username  Field  Change
2011-08-02 11:23  bugmaster  Category  OCCT:FDC => OCCT:Foundation Classes
2011-12-05 10:42  abv  Fixed in Version  EMPTY =>
2011-12-05 10:42  abv  Target Version  => 6.5.3
2011-12-05 10:42  abv  Description Updated  bug_revision_view_page.php?rev_id=1208#r1208
2011-12-05 10:43  abv  Relationship added  parent of 0022484
2011-12-05 10:45  abv  Relationship added  parent of 0022125
2012-01-12 15:16  ysn  Note Edited: 0012403
2012-01-12 17:27  ysn  Note Revision Dropped: 12403: 0002210
2012-01-12 17:28  ysn  Note Edited: 0012403  bug_revision_view_page.php?rev_id=2239
2012-01-12 17:29  ysn  Project  Open CASCADE => Internal
2012-03-12 07:45  abv  Target Version  6.5.3 => 6.5.4
2012-10-20 08:02  abv  Relationship added  parent of 0023479
2012-10-20 08:06  abv  Note Added: 0021865
2012-10-20 08:06  abv  Status  new => feedback
2012-10-20 08:11  abv  Target Version  6.5.4 => Unscheduled
2012-10-26 10:55  bugmaster  Note Deleted: 0012404
2012-10-26 10:55  bugmaster  Note Deleted: 0021865
2012-10-26 10:56  bugmaster  Project  Internal => Open CASCADE
2012-10-30 09:19  aiv  Note Deleted: 0012403
2012-10-30 09:29  abv  Assigned To  bugmaster => abv
2012-10-30 09:29  abv  Status  feedback => assigned
2014-10-01 10:04  abv  Target Version  Unscheduled => 7.1.0
2016-06-09 20:48  Vico Liang  Note Added: 0054868
2016-11-09 11:17  abv  Target Version  7.1.0 => 7.2.0
2017-07-27 09:43  abv  Target Version  7.2.0 => 7.4.0
2019-07-10 22:29  abv  Target Version  7.4.0 => 7.5.0
2019-11-16 15:37  abv  Relationship added  related to 0031171
2020-09-11 16:13  utverdov  Target Version  7.5.0 => 7.6.0*
2020-10-06 22:49  abv  Relationship added  related to 0031670
2020-10-14 12:25  abv  Relationship added  related to 0025685
2020-10-14 12:27  abv  Relationship added  related to 0000983
2020-10-14 12:27  abv  Relationship added  related to 0005032
2020-10-26 09:34  git  Note Added: 0096262
2020-10-26 21:36  git  Note Added: 0096283
2020-10-27 09:17  git  Note Added: 0096287
2020-10-27 20:39  git  Note Added: 0096316
2020-10-28 07:19  abv  Note Added: 0096319
2020-10-28 07:19  abv  Status  assigned => resolved
2020-10-28 07:19  abv  Target Version  7.6.0* => 7.5.0
2020-10-28 07:19  abv  Assigned To  abv => kgv
2020-10-28 07:20  abv  Note Edited: 0096319  bug_revision_view_page.php?bugnote_id=96319#r23898
2020-10-28 07:22  abv  Relationship added  related to 0026749
2020-10-28 08:41  git  Note Added: 0096324
2020-10-28 09:50  kgv  Assigned To  kgv => bugmaster
2020-10-28 09:50  kgv  Status  resolved => reviewed
2020-10-31 12:48  bugmaster  Note Added: 0096422
2020-10-31 12:48  bugmaster  Status  reviewed => tested
2020-10-31 12:51  bugmaster  Test case number  => bugs/demo/bug14673_1,bug14673_2,bug14673_3,bug14673_4
2020-10-31 12:54  bugmaster  Changeset attached  => occt master 94f16a89
2020-10-31 12:54  bugmaster  Status  tested => verified
2020-10-31 12:54  bugmaster  Resolution  open => fixed
2020-11-05 15:59  git  Note Added: 0096551
2020-12-02 16:22  emo  Fixed in Version  => 7.5.0
2020-12-02 17:11  emo  Status  verified => closed

Vico Liang   
2016-06-09 20:48   
There is a post "Better support XML Unicode storage in OCAF".
2020-10-26 09:34   
Branch CR14673 has been created by abv.

SHA-1: 44e765c1c346fa6472fc77639db8214b9b019ab8

Detailed log of new commits:

Author: abv
Date: Sun Oct 25 22:10:27 2020 +0300

    0014673: Provide true support for Unicode symbols
    Method converting LDOMBasicString to TCollection_ExtendedString is corrected to consider the original string to be in UTF-8 encoding.
    Construction of TCollection_ExtendedString from plain C string is fixed to consider input string as UTF-8 in multiple other places (identified as described in notes to 0031113).
    Added tests for use of Unicode in some DRAW commands (bugs demo bug14673_*)
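
The commit message above describes treating plain C strings as UTF-8 when building a Unicode string. The sketch below shows why that matters, using only the C++ standard library. It is not the code from branch CR14673: std::u16string stands in for TCollection_ExtendedString, and utf8ToUtf16 and widenBytes are hypothetical names. The decoder is deliberately minimal and does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 (hypothetical helper, not an OCCT method).
// Malformed sequences are replaced with U+FFFD; overlong forms are not checked.
std::u16string utf8ToUtf16(const std::string& theStr)
{
  std::u16string aResult;
  std::size_t i = 0;
  while (i < theStr.size()) {
    const unsigned char aLead = static_cast<unsigned char>(theStr[i]);
    std::size_t aNbTrail = 0;
    char32_t aCp = 0;
    if (aLead < 0x80)                { aCp = aLead;        aNbTrail = 0; }
    else if ((aLead & 0xE0) == 0xC0) { aCp = aLead & 0x1F; aNbTrail = 1; }
    else if ((aLead & 0xF0) == 0xE0) { aCp = aLead & 0x0F; aNbTrail = 2; }
    else if ((aLead & 0xF8) == 0xF0) { aCp = aLead & 0x07; aNbTrail = 3; }
    else {
      // Stray continuation byte or invalid lead byte.
      aResult += static_cast<char16_t>(0xFFFD);
      ++i;
      continue;
    }
    ++i;
    bool isOk = true;
    for (std::size_t k = 0; k < aNbTrail; ++k) {
      if (i >= theStr.size()
       || (static_cast<unsigned char>(theStr[i]) & 0xC0) != 0x80) {
        isOk = false; // truncated sequence or missing continuation byte
        break;
      }
      aCp = (aCp << 6) | (static_cast<unsigned char>(theStr[i]) & 0x3F);
      ++i;
    }
    if (!isOk || aCp > 0x10FFFF || (aCp >= 0xD800 && aCp <= 0xDFFF)) {
      aCp = 0xFFFD;
    }
    if (aCp < 0x10000) {
      aResult += static_cast<char16_t>(aCp);
    } else {
      // Code points above U+FFFF are stored as a surrogate pair.
      aCp -= 0x10000;
      aResult += static_cast<char16_t>(0xD800 + (aCp >> 10));
      aResult += static_cast<char16_t>(0xDC00 + (aCp & 0x3FF));
    }
  }
  return aResult;
}

// Naive construction shown for contrast: each byte becomes one code unit,
// so multi-byte UTF-8 characters turn into several unrelated characters.
std::u16string widenBytes(const std::string& theStr)
{
  std::u16string aResult;
  for (char aByte : theStr) {
    aResult += static_cast<char16_t>(static_cast<unsigned char>(aByte));
  }
  return aResult;
}
```

For the two-byte UTF-8 input C3 A9, utf8ToUtf16 yields the single code unit U+00E9, while widenBytes yields two code units (U+00C3 and U+00A9), which is the kind of garbling a byte-wise construction produces.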
2020-10-26 21:36   
Branch CR14673 has been updated forcibly by abv.

SHA-1: 56721fa2df5bb0156b1b6403b807ac00a2858b87
2020-10-27 09:17   
Branch CR14673 has been updated forcibly by abv.

SHA-1: 1394bd7d74d461b859af9f6b0e41b6042e3b1003
2020-10-27 20:39   
Branch CR14673 has been updated forcibly by abv.

SHA-1: 03626a8b85ed9f510c7435465d9753ebf1178013
2020-10-28 07:19   
(edited on: 2020-10-28 07:20)
Support for Unicode in different components of OCCT has been added over the years and now seems reasonably complete.

The remaining known issues are with the XML format, where Unicode strings are supported but in a non-standard way (using UTF-16 encoded in its own variant of hexadecimal / base16 code). These issues should be resolved later, taking compatibility into account so that OCAF and other documents remain readable by applications based on previous versions of OCCT.

A few places where Unicode was not duly supported (in DRAW, message files, and TObj persistence as described in 0026749) are corrected in branch CR14673; please review. Jenkins tests are OK, see job CR14673-abv.
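
To illustrate what "UTF-16 encoded in hexadecimal" means for readability of an XML file, the sketch below writes each UTF-16 code unit as four hex digits. The exact variant used by OCCT is not described in this note, so the plain uppercase four-digit form assumed here is purely illustrative, and utf16ToHex is a hypothetical name.

```cpp
#include <string>

// Write every UTF-16 code unit as four uppercase hex digits.
// Assumption: plain base16, most significant digit first; OCCT's own
// variant mentioned in the note may differ.
std::string utf16ToHex(const std::u16string& theStr)
{
  static const char THE_DIGITS[] = "0123456789ABCDEF";
  std::string aResult;
  aResult.reserve(theStr.size() * 4);
  for (char16_t aUnit : theStr) {
    aResult += THE_DIGITS[(aUnit >> 12) & 0xF];
    aResult += THE_DIGITS[(aUnit >> 8) & 0xF];
    aResult += THE_DIGITS[(aUnit >> 4) & 0xF];
    aResult += THE_DIGITS[aUnit & 0xF];
  }
  return aResult;
}
```

Under this assumed scheme the two-character string "A" followed by U+00E9 is stored as "004100E9", which a generic XML tool cannot display as text, whereas a standard UTF-8 document would carry the characters directly.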

2020-10-28 08:41   
Branch CR14673 has been updated forcibly by abv.

SHA-1: 131f55f427d6c2e8d2dfd30c4274758d356da111
2020-10-31 12:48   
Combination -
OCCT branch : OCCT-750
master SHA - a8b9d7eb277d4ce8949427b7c6ab6af92422ae83
Products branch : OCCT-750 SHA - d1791aa18ab401708974b4c974aba57dc55acaa7
was compiled on Linux, MacOS and Windows platforms and tested in optimize mode.

Number of compiler warnings:
No new/fixed warnings

No regressions/differences

CPU differences:
Total CPU difference: 17977.46000000013 / 17994.53000000008 [-0.09%]
Total CPU difference: 12143.270000000102 / 12171.670000000115 [-0.23%]
Total CPU difference: 19728.796875 / 19728.265625 [+0.00%]
Total CPU difference: 13560.6875 / 13538.390625 [+0.16%]

Image differences :
No differences that require special attention

Memory differences :
No differences that require special attention
2020-11-05 15:59   
Branch CR14673 has been deleted by inv.

SHA-1: 131f55f427d6c2e8d2dfd30c4274758d356da111