View Issue Details

ID: 0014673
Project: Open CASCADE
Category: OCCT:Foundation Classes
View Status: public
Last Update: 2020-12-02 17:11
Reporter: abv
Assigned To: bugmaster
Priority: normal
Severity: feature
Status: closed
Resolution: fixed
OS: All
Target Version: 7.5.0
Fixed in Version: 7.5.0
Summary: 0014673: Provide true support for Unicode symbols
Description: This improvement is inspired by OCC14672. As it turns out, although
OCCT provides the class TCollection_ExtendedString and uses it in many places
(e.g. OCAF), the ability to store Unicode (or any other non-ASCII) symbols in
that class is de facto almost never used (at least, not in OCC).

That is obviously bad: if we provide a class capable of storing Unicode
strings, and use it in many places, then storing non-ASCII symbols in it
should actually be supported.

To move in that direction, the following steps are proposed:

1. Provide methods to convert a Unicode string stored in
TCollection_ExtendedString to other encodings; at least the most widespread
ASCII-based encodings, such as HTML and UTF-8, are necessary.

2. Provide a method to dump an ExtendedString (interpreted as Unicode) to the
DRAW Tcl interpreter (which has complete support for encodings and uses UTF-8
internally).

3. Revise the OCCT code where an ExtendedString is converted to an ASCII one,
with the goal of providing a more adequate conversion. Such places can be found
by reviewing the modifications made in OCC14672, by searching the OCCT code for
the "'?'" symbol used in safe conversions, or by searching for calls to the
Is(An)Ascii() method.

a) At the very least, the output to DRAW can use Unicode encoding directly.

b) Another good candidate for such improvement is XML persistence (LDOM* and
other dependent packages) -- see also OCC983 and OCC5032

c) It seems that the visualisation already contains some code handling Unicode,
though in many cases (e.g. when computing the size of text) it converts it to
ASCII

d) Package Resource can also be considered. Note that it already contains some
code for converting Unicode strings to and from some Far Eastern encodings
(EUC, GB and ShiftJIS) -- see Resource_Unicode class
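
As an illustration of steps 1 and 3, the following is a minimal sketch of a UTF-16 to UTF-8 conversion next to the lossy "'?'" fallback. It does not use the OCCT API: std::u16string is only an assumed stand-in for the 16-bit storage of TCollection_ExtendedString, and the function names utf16ToUtf8 and toAsciiLossy are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helpers, not OCCT API. std::u16string stands in for the
// 16-bit character storage of TCollection_ExtendedString.

// Convert UTF-16 to UTF-8. Surrogate pairs are combined into one code
// point; an unpaired surrogate is replaced by U+FFFD.
std::string utf16ToUtf8(const std::u16string& src)
{
  std::string out;
  for (std::size_t i = 0; i < src.size(); ++i)
  {
    std::uint32_t cp = src[i];
    const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
    const bool isLow  = cp >= 0xDC00 && cp <= 0xDFFF;
    if (isHigh && i + 1 < src.size()
        && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF)
    {
      cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
      ++i; // the low surrogate has been consumed
    }
    else if (isHigh || isLow)
    {
      cp = 0xFFFD;
    }
    assert(cp <= 0x10FFFF);

    if (cp < 0x80)
    {
      out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// Lossy "safe" conversion: every non-ASCII 16-bit unit becomes '?'.
std::string toAsciiLossy(const std::u16string& src)
{
  std::string out;
  for (char16_t c : src)
  {
    out += (c < 0x80) ? static_cast<char>(c) : '?';
  }
  return out;
}
```

With these helpers, a string such as u"caf\u00E9" keeps its last letter as the two bytes C3 A9 in UTF-8, whereas the lossy conversion yields "caf?" and the original symbol cannot be recovered.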
Tags: No tags attached.
Test case number: bugs/demo/bug14673_1,bug14673_2,bug14673_3,bug14673_4

Attached Files

  • OSD_Path.cxx (41,193 bytes)

Relationships

related to 0031670 closed bugmaster Community Data Exchange - cp1251 Cyrillic characters in STEP file
related to 0025685 feedback Vico Liang Community Application Framework - TCollection_ExtendedString unicode storage in xml document is unreadable
parent of 0022484 closed bugmaster Open CASCADE UNICODE characters support.
parent of 0022125 closed bugmaster Open CASCADE TCollection_ExtendedString: conversion from UTF-8 to unicode
related to 0031171 closed abv Open CASCADE Draw - support Unicode input / output in console on Windows
related to 0026749 closed abv Community TObj_Assistant::FindModel fails to find model with unicode name.

Activities

2011-04-01 12:50

 

OSD_Path.cxx (41,193 bytes)

Vico Liang

2016-06-09 20:48

updater   ~0054868

There is a post "Better support XML Unicode storage in OCAF"

http://dev.opencascade.org/index.php?q=node/1157

git

2020-10-26 09:34

administrator   ~0096262

Branch CR14673 has been created by abv.

SHA-1: 44e765c1c346fa6472fc77639db8214b9b019ab8


Detailed log of new commits:

Author: abv
Date: Sun Oct 25 22:10:27 2020 +0300

    0014673: Provide true support for Unicode symbols
    
    Method converting LDOMBasicString to TCollection_ExtendedString is corrected to consider the original string to be in UTF-8 encoding.
    
    Construction of TCollection_ExtendedString from plain C string is fixed to consider input string as UTF-8 in multiple other places (identified as described in notes to 0031113).
    
    Added tests for use of Unicode in some DRAW commands (bugs demo bug14673_*)
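
To illustrate why it matters whether a plain C string is treated as UTF-8, here is a small sketch contrasting byte-by-byte widening with actual UTF-8 decoding. It is not the OCCT implementation: std::u16string is an assumed stand-in for TCollection_ExtendedString, the names widenBytes and utf8ToUtf16 are hypothetical, and the decoder is deliberately simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helpers, not OCCT API.

// Naive widening: every byte becomes one 16-bit character, so a
// multi-byte UTF-8 sequence turns into several unrelated symbols.
std::u16string widenBytes(const std::string& src)
{
  std::u16string out;
  for (char c : src)
  {
    out += static_cast<char16_t>(static_cast<unsigned char>(c));
  }
  return out;
}

// Decode UTF-8 to UTF-16. Malformed or truncated input yields U+FFFD.
// Simplified: overlong forms and encoded surrogates are not rejected.
std::u16string utf8ToUtf16(const std::string& src)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < src.size())
  {
    const unsigned char lead = static_cast<unsigned char>(src[i]);
    std::uint32_t cp = 0xFFFD; // default for an invalid lead byte
    std::size_t len = 1;
    if (lead < 0x80)                { cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }

    if (i + len > src.size())
    {
      cp = 0xFFFD; // sequence is cut off at the end of the input
      len = 1;
    }
    for (std::size_t k = 1; k < len; ++k)
    {
      const unsigned char c = static_cast<unsigned char>(src[i + k]);
      if ((c & 0xC0) != 0x80)
      {
        cp = 0xFFFD; // not a continuation byte: skip only the lead byte
        len = 1;
        break;
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp > 0x10FFFF)
    {
      cp = 0xFFFD;
    }
    assert(cp <= 0x10FFFF);

    if (cp < 0x10000)
    {
      out += static_cast<char16_t>(cp);
    }
    else
    {
      cp -= 0x10000; // encode as a surrogate pair
      out += static_cast<char16_t>(0xD800 + (cp >> 10));
      out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    i += len;
  }
  return out;
}
```

For the two bytes C3 A9, widenBytes produces the two characters U+00C3 and U+00A9, while utf8ToUtf16 produces the single character U+00E9.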

git

2020-10-26 21:36

administrator   ~0096283

Branch CR14673 has been updated forcibly by abv.

SHA-1: 56721fa2df5bb0156b1b6403b807ac00a2858b87

git

2020-10-27 09:17

administrator   ~0096287

Branch CR14673 has been updated forcibly by abv.

SHA-1: 1394bd7d74d461b859af9f6b0e41b6042e3b1003

git

2020-10-27 20:39

administrator   ~0096316

Branch CR14673 has been updated forcibly by abv.

SHA-1: 03626a8b85ed9f510c7435465d9753ebf1178013

abv

2020-10-28 07:19

manager   ~0096319

Last edited: 2020-10-28 07:20

The support of Unicode in different components of OCCT has been added over the years and now seems reasonably complete.

The remaining known issues are with the XML format, where Unicode strings are supported, but in a non-standard way (using UTF-16 encoded in its own variant of hexadecimal / base16 code). These issues should be resolved later, taking compatibility into account so that OCAF and other documents remain readable by applications based on previous versions of OCCT.

A few places where Unicode was not duly supported (in DRAW, message files, and TObj persistence as described in 0026749) are corrected in branch CR14673, please review. Jenkins tests are OK, see job CR14673-abv.

git

2020-10-28 08:41

administrator   ~0096324

Branch CR14673 has been updated forcibly by abv.

SHA-1: 131f55f427d6c2e8d2dfd30c4274758d356da111

bugmaster

2020-10-31 12:48

administrator   ~0096422

Combination -
OCCT branch : OCCT-750
master SHA - a8b9d7eb277d4ce8949427b7c6ab6af92422ae83
a206de37fbfa0bf71bd534ae47192bbec23b8522
Products branch : OCCT-750 SHA - d1791aa18ab401708974b4c974aba57dc55acaa7
was compiled on Linux, MacOS and Windows platforms and tested in optimize mode.

Number of compiler warnings:
No new/fixed warnings

Regressions/Differences/Improvements:
No regressions/differences

CPU differences:
Debian80-64:
OCCT
Total CPU difference: 17977.46000000013 / 17994.53000000008 [-0.09%]
Products
Total CPU difference: 12143.270000000102 / 12171.670000000115 [-0.23%]
Windows-64-VC14:
OCCT
Total CPU difference: 19728.796875 / 19728.265625 [+0.00%]
Products
Total CPU difference: 13560.6875 / 13538.390625 [+0.16%]


Image differences :
No differences that require special attention

Memory differences :
No differences that require special attention

git

2020-11-05 15:59

administrator   ~0096551

Branch CR14673 has been deleted by inv.

SHA-1: 131f55f427d6c2e8d2dfd30c4274758d356da111

Related Changesets

occt: master 94f16a89

2020-10-25 19:10:27

abv


Committer: bugmaster
0014673: Provide true support for Unicode symbols

Construction of TCollection_ExtendedString from plain C string is fixed to consider input string as UTF-8 in several places (identified as described in notes to 0031113).

Message_MsgFile is corrected to load resource file as UTF-8 (unless it has BOM indicating use of UTF-16).

Added tests for use of Unicode in some DRAW commands (bugs demo bug14673_*)
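
The rule stated above for Message_MsgFile (UTF-8 unless a BOM indicates UTF-16) can be sketched as follows. This is not the code from Message_MsgFile.cxx: the names TextEncoding, BomInfo, detectBom and stripBom are hypothetical, and skipping a UTF-8 BOM is an assumption added for completeness.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical types and helpers, not OCCT API.
enum class TextEncoding { Utf8, Utf16LE, Utf16BE };

struct BomInfo
{
  TextEncoding encoding;  // encoding to use for the file contents
  std::size_t  bomLength; // number of leading bytes occupied by the BOM
};

// Inspect the first bytes of a file. Without a BOM the contents are
// assumed to be UTF-8. A UTF-32 BOM is not recognised by this sketch.
BomInfo detectBom(const std::string& bytes)
{
  auto at = [&bytes](std::size_t i) {
    return static_cast<unsigned char>(bytes[i]);
  };
  if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
  {
    return {TextEncoding::Utf8, 3}; // assumption: UTF-8 BOM is skipped
  }
  if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
  {
    return {TextEncoding::Utf16LE, 2};
  }
  if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
  {
    return {TextEncoding::Utf16BE, 2};
  }
  return {TextEncoding::Utf8, 0};
}

// Return the contents with any recognised BOM removed.
std::string stripBom(const std::string& bytes)
{
  const BomInfo info = detectBom(bytes);
  assert(info.bomLength <= bytes.size());
  return bytes.substr(info.bomLength);
}
```

A caller would read the file as raw bytes, call detectBom once, and then decode the remainder according to the reported encoding.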
Affected Issues
0014673
mod - dox/upgrade/upgrade.md
mod - src/DDataStd/DDataStd_BasicCommands.cxx
mod - src/DDocStd/DDocStd_ApplicationCommands.cxx
mod - src/DDocStd/DDocStd_MTMCommands.cxx
mod - src/DNaming/DNaming_BasicCommands.cxx
mod - src/DNaming/DNaming_ModelingCommands.cxx
mod - src/Message/Message_MsgFile.cxx
mod - src/TObj/TObj_Assistant.cxx
mod - src/TObjDRAW/TObjDRAW.cxx
mod - src/ViewerTest/ViewerTest_ViewerCommands.cxx
mod - src/XDEDRAW/XDEDRAW.cxx
mod - src/XDEDRAW/XDEDRAW_Notes.cxx
mod - src/XSDRAWSTLVRML/XSDRAWSTLVRML.cxx
add - tests/bugs/demo/bug14673_1
add - tests/bugs/demo/bug14673_2
add - tests/bugs/demo/bug14673_3
add - tests/bugs/demo/bug14673_4

Issue History

Date Modified Username Field Change
2011-08-02 11:23 bugmaster Category OCCT:FDC => OCCT:Foundation Classes
2011-12-05 10:42 abv Fixed in Version EMPTY =>
2011-12-05 10:42 abv Target Version => 6.5.3
2011-12-05 10:42 abv Description Updated
2011-12-05 10:43 abv Relationship added parent of 0022484
2011-12-05 10:45 abv Relationship added parent of 0022125
2012-01-12 17:29 ysn Project Open CASCADE => Internal
2012-03-12 07:45 abv Target Version 6.5.3 => 6.5.4
2012-10-20 08:06 abv Status new => feedback
2012-10-20 08:11 abv Target Version 6.5.4 => Unscheduled
2012-10-26 10:56 bugmaster Project Internal => Open CASCADE
2012-10-30 09:29 abv Assigned To bugmaster => abv
2012-10-30 09:29 abv Status feedback => assigned
2014-10-01 10:04 abv Target Version Unscheduled => 7.1.0
2016-06-09 20:48 Vico Liang Note Added: 0054868
2016-11-09 11:17 abv Target Version 7.1.0 => 7.2.0
2017-07-27 09:43 abv Target Version 7.2.0 => 7.4.0
2019-07-10 22:29 abv Target Version 7.4.0 => 7.5.0
2019-11-16 15:37 abv Relationship added related to 0031171
2020-09-11 16:13 utverdov Target Version 7.5.0 => 7.6.0
2020-10-06 22:49 abv Relationship added related to 0031670
2020-10-14 12:25 abv Relationship added related to 0025685
2020-10-26 09:34 git Note Added: 0096262
2020-10-26 21:36 git Note Added: 0096283
2020-10-27 09:17 git Note Added: 0096287
2020-10-27 20:39 git Note Added: 0096316
2020-10-28 07:19 abv Note Added: 0096319
2020-10-28 07:19 abv Status assigned => resolved
2020-10-28 07:19 abv Target Version 7.6.0 => 7.5.0
2020-10-28 07:19 abv Assigned To abv => kgv
2020-10-28 07:20 abv Note Edited: 0096319
2020-10-28 07:22 abv Relationship added related to 0026749
2020-10-28 08:41 git Note Added: 0096324
2020-10-28 09:50 kgv Assigned To kgv => bugmaster
2020-10-28 09:50 kgv Status resolved => reviewed
2020-10-31 12:48 bugmaster Note Added: 0096422
2020-10-31 12:48 bugmaster Status reviewed => tested
2020-10-31 12:51 bugmaster Test case number => bugs/demo/bug14673_1,bug14673_2,bug14673_3,bug14673_4
2020-10-31 12:54 bugmaster Changeset attached => occt master 94f16a89
2020-10-31 12:54 bugmaster Status tested => verified
2020-10-31 12:54 bugmaster Resolution open => fixed
2020-11-05 15:59 git Note Added: 0096551
2020-12-02 16:22 emo Fixed in Version => 7.5.0
2020-12-02 17:11 emo Status verified => closed