0022484 2011-05-10 12:40 2019-03-02 22:55
[OCCT] 6.8.0[OCCT] 6.8.0 
Will be created
0022484: UNICODE characters support.
OpenCascade primarily uses ASCII encoding and does not fully support Unicode
characters, especially when working with files and directories. This is a real
restriction on the development of modern applications, which must also work
with non-ASCII file names. Unicode encoding support should therefore be added.
UNICODE characters support in OCCT

The main idea of the implemented improvement is that the API is kept untouched. Instead, the behavior of all functions that accept a Standard_CString on input is changed, so that the strings are now assumed to be in UTF-8 encoding.

The constructor of TCollection_ExtendedString is used to convert UTF-8 strings to wide characters, which are then cast directly to wchar_t* and passed to appropriate system functions that take wchar_t* on input; for example, _wopen instead of open.
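
The conversion step described above can be sketched in plain C++ without OCCT. This is only an illustration under stated assumptions: utf8ToUtf16 and openUtf8 are hypothetical helper names, not OCCT functions, and _wfopen is used here as a stand-in from the same family of wide-character calls as the _wopen mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16 code units.
// Malformed or truncated sequences become U+FFFD; overlong forms are not rejected.
std::u16string utf8ToUtf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i++]);
        std::uint32_t cp = 0xFFFD;  // default for an invalid lead byte
        int extra = 0;              // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        for (; extra > 0; --extra) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
        }
        if (cp > 0x10FFFF) cp = 0xFFFD;  // outside the Unicode range
        if (cp >= 0x10000) {             // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: open a file whose name is given in UTF-8.
std::FILE* openUtf8(const char* utf8Path, const char* mode)
{
    assert(utf8Path != nullptr && mode != nullptr);
#ifdef _WIN32
    const std::u16string p = utf8ToUtf16(utf8Path);
    const std::u16string m = utf8ToUtf16(mode);
    // wchar_t is 16 bits wide on Windows, so the code units copy over directly.
    const std::wstring wpath(p.begin(), p.end());
    const std::wstring wmode(m.begin(), m.end());
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // Assumption: on other platforms the narrow API receives the bytes unchanged.
    return std::fopen(utf8Path, mode);
#endif
}
```

Copying the code units into a std::wstring avoids casting the buffer pointer to wchar_t*, at the cost of one extra copy per call.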

Note that this change will break backward compatibility with applications which currently use filenames in extended ASCII encoding bound to the current locale. Such applications should be updated to convert such strings to UTF-8 format.
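
As an illustration of the kind of update an affected application might need, here is a minimal sketch that re-encodes a legacy string as UTF-8. It assumes the legacy strings are ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point; other code pages need a mapping table or a system conversion routine. The name latin1ToUtf8 is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: re-encode an ISO-8859-1 (Latin-1) C string as UTF-8.
// Assumption: the input really is Latin-1; this is NOT valid for other code pages.
std::string latin1ToUtf8(const char* latin1)
{
    assert(latin1 != nullptr);
    std::string out;
    for (const char* p = latin1; *p != '\0'; ++p) {
        const unsigned char c = static_cast<unsigned char>(*p);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));  // ASCII is unchanged in UTF-8
        } else {
            // Code points U+0080..U+00FF take two bytes: 110000xx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Pure ASCII names pass through unchanged, so only names with bytes above 0x7F are affected by the compatibility break.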

The patch has been implemented for the WNT platform only; other platforms continue to support only ASCII encoding.

The conversion from UTF-8 to wchar_t assumes little-endian byte order. Thus, this code will not work correctly on big-endian platforms. This needs to be completed in a way similar to what is done for binary persistence (see the macro DO_INVERSE in FSD_FileHeader.hxx).
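
One way to avoid the byte-order dependency, sketched here under assumptions, is to assemble each code unit with shifts and masks instead of writing bytes into memory: arithmetic on values gives the same result on any host. The names decode2, decode3, hostIsLittleEndian and swapBytes are hypothetical and are not the OCCT routines; the swap helper only illustrates the general idea of inverting byte order and makes no claim about how DO_INVERSE is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decode a 2-byte UTF-8 sequence (110xxxxx 10yyyyyy) into one 16-bit code unit.
std::uint16_t decode2(unsigned char b0, unsigned char b1)
{
    assert((b0 & 0xE0) == 0xC0 && (b1 & 0xC0) == 0x80);
    return static_cast<std::uint16_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F));
}

// Decode a 3-byte UTF-8 sequence (1110xxxx 10yyyyyy 10zzzzzz).
std::uint16_t decode3(unsigned char b0, unsigned char b1, unsigned char b2)
{
    assert((b0 & 0xF0) == 0xE0 && (b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80);
    return static_cast<std::uint16_t>(
        ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
}

// Runtime probe of the host byte order: inspect the first byte of the value 1.
bool hostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Exchange the two bytes of a 16-bit value, for code that must read or write
// a fixed byte order on a host with the opposite one.
std::uint16_t swapBytes(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}
```

With value-based decoding, a swap is needed only at the point where code units are serialized to or read from a byte stream with a fixed byte order.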
2013-06-10 11:33   
The current implementation uses char* to store UTF-8 strings. The API of OCC file classes and methods was not modified. The string is analyzed and, if it contains UTF-8 codes, converted for use in the appropriate system functions.

The current implementation is finalized only for the Windows platform.
2013-06-11 12:19   
The first version of the fix has been put into CR22484.

Please review and send back
2013-06-13 10:01   
Dear pdn,

I have the following remarks on the patch (after a preliminary review).
> #ifdef WNT
The WNT macro definition is deprecated. Please use the appropriate built-in macros instead (_MSC_VER or _WIN32, depending on the situation).

> (const wchar_t*) aWName.ToExtString(),ios::in); // ios::nocreate is not portable
Please remove irrelevant comments from copy-pasted lines.

> + TCollection_ExtendedString aFileNameW(aFileName, Standard_True);
> + TCollection_ExtendedString dirNameW(dirName);
If I understood correctly, TCollection_ExtendedString constructor parses

> +/* LD : We do not need this routine any longer : */
> +/* Dont remove a no empty directory */
> +
> +
> +#if 0
Big unidentified piece of commented code.

General remark - this patch breaks backward compatibility with applications which currently use filenames in the current locale.
2013-06-13 12:53   
1) It is needed also to make corrections in OCCT Products, like DXF.
2) TCollection_ExtendedString. In ConvertToUnicode2B and ConvertToUnicode3B, the code is tied to little-endian byte order. It will be inconsistent on big-endian platforms.
3) The following files contain changes not relevant to this bug. It is needed to create another bug for that, and make a version here that contains only relevant changes.
4) Why has the commented-out routine DeleteDirectory in OSD_WNT.cxx appeared again? Please remove it.
5) The same is with the function SetDeleteDirectoryProc.
6) Remove declaration of the following functions in OSD_WNT_1.hxx:
7) The changes concerning "locale" in PCDM_RetrievalDriver.cxx are incorrect (please revert to the older version).

I agree with KGV's general remark. We need to state it clearly in the release notes.
2014-09-23 14:05   
Branch CR22484 has been updated forcibly by pdn.

SHA-1: a5aaaf3a2c80ef36b4d33c9e070213fd118d058b
2014-09-24 12:21   
Minor remarks:
- in Draw_Interpretor.cxx, local class TclUTFToLocalStringSentry becomes unused, please remove it
- change in PCDM.cdl is wrong, please revert (these two enums have been removed recently, see 0024180)
2014-09-24 12:28   
I have added the description of the patch prepared by msv some time ago (edited) to the Additional Information field.
2014-09-24 12:39   
Branch CR22484 has been updated forcibly by pdn.

SHA-1: 5a4311328b5af737e0402999152f948a902ad8ce
2014-09-24 14:35   
No remarks, please check building on Linux, Windows 64-bit, then test
2014-09-25 12:38   
Dear BugMaster,

Branch CR22484 was compiled on Linux and following compilation errors were detected:

1. ../../../../src/FSD/FSD_BinaryFile.cxx:84:8: error: macro names must be identifiers

2. ../../../../src/FSD/FSD_File.cxx:76:8: error: macro names must be identifiers

Branch CR22484 was compiled on Windows and following compilation errors were detected:

1. ..\..\..\src\FSD\FSD_BinaryFile.cxx(84): fatal error C1016: #if[n]def expected an identifier

2. ..\..\..\src\FSD\FSD_File.cxx(76): fatal error C1016: #if[n]def expected an identifier

3. ..\..\..\src\Message\Message_MsgFile.cxx(217): fatal error C1016: #if[n]def expected an identifier

4. ..\..\..\src\LDOM\LDOMParser.cxx(138): fatal error C1016: #if[n]def expected an identifier

5. ..\..\..\src\BinLDrivers\BinLDrivers_DocumentRetrievalDriver.cxx(182): fatal error C1016: #if[n]def expected an identifier

6. ..\..\..\src\StepFile\stepread.c(86): fatal error C1016: #if[n]def expected an identifier
2014-09-25 14:29   
Branch CR22484 has been updated by pdn.

SHA-1: e3249fb86154833d355b08ea31f2e0ff9d1c5e30

Detailed log of new commits:

Author: pdn
Date: Thu Sep 25 14:29:15 2014 +0400

    Fix for compilation errors and fix for StepFile (avoid objects in pure c code)

2014-09-25 14:29   
2014-09-26 13:57   
Dear BugMaster,

Branch CR22484 (and products from GIT master) was compiled on Linux, MacOS and Windows platforms and tested.
SHA-1: e3249fb86154833d355b08ea31f2e0ff9d1c5e30

Number of compiler warnings:
occt component:
   Linux: 15 (15 on master)
   Windows: 0 (0 on master)
   MacOS: 193 (193 on master)
products component :
   Linux: 11 (11 on master)
   Windows: 1 (1 on master)

http://occt-tests/CR22484-master-occt/Windows-32-VC10/summary.html
bugs caf(015) bug170_3

Testing cases:
Not done

Testing on Linux:
Total MEMORY difference: 356246768 / 355468788
Total CPU difference: 45192.93000000016 / 44818.88000000009

Testing on Windows:
Total MEMORY difference: 245490128 / 242252872
Total CPU difference: 34173.109375 / 34303.3125

There are differences in images found by testdiff:
http://occt-tests/CR22484-master-occt/Debian60-64/diff-Debian60-64.html
http://occt-tests/CR22484-master-occt/Windows-32-VC10/diff-Windows-32-VC10.html
Pay attention to: bugs vis bug22796_2

2014-09-30 13:23   
Branch CR22484 has been updated by pdn.

SHA-1: 57a388503997f41b94212f37673af17d427bdf83

Detailed log of new commits:

Author: pdn
Date: Tue Sep 30 13:23:30 2014 +0400

    Fixes for set unicode symbols to OCAF and visualization

2014-09-30 13:24   
Fixed, please review and retest
2014-10-01 09:59   
No remarks, please test
2014-10-01 16:29   
Branch CR22484 has been updated forcibly by apv.

SHA-1: c59407add5ab4f2225ad5fe740997438e9a7c628
2014-10-02 14:07   
Dear BugMaster,

Branch CR22484 (and products from GIT master) was compiled on Linux, MacOS and Windows platforms and tested.
SHA-1: c59407add5ab4f2225ad5fe740997438e9a7c628

Number of compiler warnings:
occt component:
   Linux: 15 (15 on master)
   Windows: 0 (0 on master)
   MacOS: 196 (196 on master)
products component :
   Linux: 11 (11 on master)
   Windows: 3 (3 on master)

Not detected

Testing cases:
Will be created

Testing on Linux:
Total MEMORY difference: 398316920 / 397657468
Total CPU difference: 47295.91000000001 / 46596.08000000006

Testing on Windows:
Total MEMORY difference: 279461580 / 279221480
Total CPU difference: 38884.5 / 39387.71875

There are differences in images found by testdiff:
http://occt-tests/CR22484-master-occt/Debian60-64/diff-Debian60-64.html
http://occt-tests/CR22484-master-occt/Windows-32-VC10/diff-Windows-32-VC10.html
Pay attention to: bugs vis bug22149
2014-10-02 14:16   
Branch CR22484 has been updated by abv.

SHA-1: a27569a567728ab3f99677ecb9a1656d02a22d5c

Detailed log of new commits:

Author: abv
Date: Thu Oct 2 14:15:14 2014 +0400

    Definition of Unicode symbol in test corrected

2014-10-02 14:18   
I have corrected test bugs vis bug22149, please check the image
2014-10-02 14:50   
Dear BugMaster,

The updated test case bugs vis bug22149 has been relaunched on Linux and Windows platforms. Results on both platforms are OK.

http://occt-tests/CR22484-master-occt/Debian60-64/bugs/vis/bug22149.html
http://occt-tests/CR22484-master-occt/Windows-32-VC10/bugs/vis/bug22149.html
2014-10-13 23:15   
I have trouble understanding where I need to change the string handling in my program. For example, BRepTools::Write is still called with a Standard_CString instead of a wide string.
2014-10-13 23:25   
It seems the solution currently implemented covers only STEP, OCCT binary persistence, and message resources. Reading and writing BREP and IGES still do not seem to support Unicode on Windows. Pavel, can you confirm this?
2014-10-14 11:52   
Yes, you are right. An additional bug will be created to update IGES and BRep.
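
On the application side, the description in this issue implies that the Standard_CString signatures stay as they are and the caller supplies UTF-8. A minimal sketch of that caller-side step, assuming the application holds its file names as UTF-16 code units, could look as follows. The name utf16ToUtf8 is hypothetical, and the result only helps for the readers and writers that the patch actually covers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode UTF-16 code units as UTF-8.
// Unpaired surrogates are replaced by U+FFFD.
std::string utf16ToUtf8(const char16_t* s, std::size_t len)
{
    assert(s != nullptr || len == 0);
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        std::uint32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Combine a high and a low surrogate into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The c_str() of the returned std::string is then what would be passed wherever a Standard_CString file name is expected.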
2014-10-21 16:43   
Branch CR22484 has been deleted by inv.

SHA-1: a27569a567728ab3f99677ecb9a1656d02a22d5c
2014-11-06 11:59   
Please use the following lines to make a test:

box b 10 10 10
set s [encoding convertfrom unicode \x1F\x04\x40\x04\x35\x04\x32\x04\x35\x04\x34\x04]
stepwrite a b $s.stp
# file should be created
stepread $s.stp a *