MantisBT - Open CASCADE
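ID: 0022484
Project: Open CASCADE
Category: [OCCT] OCCT:Foundation Classes
View Status: public
Date Submitted: 2011-05-10 12:40
Last Update: 2019-03-02 22:55
Reporter: epv
Assigned To: bugmaster
Priority: normal
Severity: feature
Status: closed
Resolution: fixed
Platform: All
Target Version: [OCCT] 6.8.0
Fixed in Version: [OCCT] 6.8.0
Test case number: Will be created
Summary: 0022484: UNICODE characters support.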
Description: OpenCascade uses primarily ASCII encoding and does not fully support Unicode
characters, especially where working with files and directories is concerned. This is a
real restriction on the development of modern applications, which must also work with
non-ASCII file names. Unicode support should therefore be added.
Additional Information: UNICODE characters support in OCCT

The main idea of the implemented improvement is that the API is kept unchanged. Instead, the behavior of all functions that accept Standard_CString as input is changed, so that strings are now assumed to be UTF-8 encoded.

The constructor of TCollection_ExtendedString is used to convert UTF-8 strings to wide characters, which are then cast directly to wchar_t* and passed to the appropriate system functions that take wchar_t* as input, for example _wopen instead of open.
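The conversion step above is described only in prose. A minimal sketch of UTF-8 to UTF-16 decoding, written as plain C++ rather than OCCT code: the function name Utf8ToUtf16 is hypothetical, and the real TCollection_ExtendedString constructor may handle malformed input differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of OCCT): decodes UTF-8 bytes into UTF-16 code units.
// Malformed lead or continuation bytes become U+FFFD; overlong forms are not rejected
// in this simplified sketch.
std::u16string Utf8ToUtf16(const std::string& theUtf8)
{
  std::u16string aResult;
  std::size_t i = 0;
  while (i < theUtf8.size())
  {
    const unsigned char aLead = static_cast<unsigned char>(theUtf8[i]);
    char32_t aCode = 0;
    std::size_t aLen = 0;
    if (aLead < 0x80)                { aCode = aLead;        aLen = 1; }
    else if ((aLead & 0xE0) == 0xC0) { aCode = aLead & 0x1F; aLen = 2; }
    else if ((aLead & 0xF0) == 0xE0) { aCode = aLead & 0x0F; aLen = 3; }
    else if ((aLead & 0xF8) == 0xF0) { aCode = aLead & 0x07; aLen = 4; }
    else { aResult.push_back(u'\uFFFD'); ++i; continue; }

    if (i + aLen > theUtf8.size()) { aResult.push_back(u'\uFFFD'); break; } // truncated tail

    bool isValid = true;
    for (std::size_t k = 1; k < aLen; ++k)
    {
      const unsigned char aCont = static_cast<unsigned char>(theUtf8[i + k]);
      if ((aCont & 0xC0) != 0x80) { isValid = false; break; }
      aCode = (aCode << 6) | (aCont & 0x3F); // arithmetic shifts: independent of byte order
    }
    if (!isValid) { aResult.push_back(u'\uFFFD'); ++i; continue; }
    i += aLen;

    if (aCode >= 0x10000)
    {
      // Characters outside the Basic Multilingual Plane become a surrogate pair.
      aCode -= 0x10000;
      aResult.push_back(static_cast<char16_t>(0xD800 + (aCode >> 10)));
      aResult.push_back(static_cast<char16_t>(0xDC00 + (aCode & 0x3FF)));
    }
    else
    {
      aResult.push_back(static_cast<char16_t>(aCode));
    }
  }
  return aResult;
}
```

On Windows, where wchar_t is 16 bits wide, the result could be passed as `reinterpret_cast<const wchar_t*>(aWide.c_str())` to `_wopen` or `_wfopen`, which mirrors the direct cast described above; on platforms with a 32-bit wchar_t (such as Linux) that cast would be wrong.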

Note that this change will break backward compatibility with applications which currently use file names in an extended ASCII encoding bound to the current locale. Such applications should be updated to convert these strings to UTF-8.
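For applications affected by this break, a minimal migration sketch, assuming the legacy file names are in ISO-8859-1 (Latin-1), where every byte equals its Unicode code point. Latin1ToUtf8 is a hypothetical helper; other locale code pages (for example Windows-1251 for Cyrillic) do not map bytes to code points this way and need a conversion table or a system API such as MultiByteToWideChar followed by WideCharToMultiByte with CP_UTF8.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: re-encodes a Latin-1 string as UTF-8.
std::string Latin1ToUtf8(const std::string& theLatin1)
{
  std::string aResult;
  aResult.reserve(theLatin1.size() * 2);
  for (char aChar : theLatin1)
  {
    const unsigned char aByte = static_cast<unsigned char>(aChar);
    if (aByte < 0x80)
    {
      aResult.push_back(aChar); // ASCII bytes are identical in UTF-8
    }
    else
    {
      // Code points U+0080..U+00FF need a two-byte UTF-8 sequence.
      aResult.push_back(static_cast<char>(0xC0 | (aByte >> 6)));
      aResult.push_back(static_cast<char>(0x80 | (aByte & 0x3F)));
    }
  }
  return aResult;
}
```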

The patch has been implemented for the WNT (Windows) platform only; other platforms still support only ASCII encoding.

The conversion from UTF-8 to wchar_t assumes little-endian byte order, so this code will not work correctly on big-endian platforms. It still needs to be completed in a way similar to what is done for binary persistence (see the macro DO_INVERSE in FSD_FileHeader.hxx).
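To illustrate what "little-endian approach" can mean in practice, a sketch contrasting two ways of combining a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) into one 16-bit code unit. The actual ConvertToUnicode2B/ConvertToUnicode3B code is not reproduced here; all function names are hypothetical, and the runtime probe is only comparable in purpose to the DO_INVERSE macro, not a copy of it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Endian-dependent: stores the low byte first in memory and reinterprets the two
// bytes as a 16-bit value. The result is correct only on little-endian hosts.
std::uint16_t Compose2BytesByMemory(unsigned char theLead, unsigned char theCont)
{
  const unsigned char aBytes[2] = {
    static_cast<unsigned char>(((theLead & 0x03) << 6) | (theCont & 0x3F)), // low byte
    static_cast<unsigned char>((theLead & 0x1C) >> 2)                       // high byte
  };
  std::uint16_t aCode = 0;
  std::memcpy(&aCode, aBytes, sizeof(aCode));
  return aCode;
}

// Endian-neutral: builds the value with shifts, so every host gets the same result.
std::uint16_t Compose2BytesPortable(unsigned char theLead, unsigned char theCont)
{
  return static_cast<std::uint16_t>(((theLead & 0x1F) << 6) | (theCont & 0x3F));
}

// Runtime byte-order probe: true if the least significant byte is stored first.
bool IsLittleEndianHost()
{
  const std::uint16_t aProbe = 1;
  unsigned char aFirstByte = 0;
  std::memcpy(&aFirstByte, &aProbe, 1);
  return aFirstByte == 1;
}
```

Building code units arithmetically, as in Compose2BytesPortable, removes the need for any byte swapping during decoding; byte order only matters again when the wide string is serialized to a file.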
Relationships:
related to 0024716 (closed, bugmaster, Open CASCADE): OSD_Path - remove excessive validity checks and allow non-ascii strings
related to 0025302 (closed, abv, Open CASCADE): Incorrect locale and unicode support in Draw console
parent of 0025369 (closed, bugmaster, Open CASCADE): Visualization, Image_AlienPixMap - handle UTF-8 names in image read/save operations on Windows
parent of 0025367 (closed, bugmaster, Open CASCADE): IGES and BRep persistence - support unicode file names on Windows
parent of 0026380 (closed, bugmaster, Open CASCADE): OSD_SharedLibrary - handle UTF-8 file paths
parent of 0027198 (closed, abv, Open CASCADE): OSD_Environment - use wide characters API on Windows
parent of 0027675 (closed, bugmaster, Open CASCADE): Foundation Classes - handle Unicode path to CSF_UnitsLexicon and CSF_UnitsDefinition on Windows
parent of 0027676 (closed, bugmaster, Open CASCADE): Foundation Classes - define Standard_ExtCharacter, Standard_Utf16Char using C++11 types char16_t
parent of 0027838 (closed, kgv, Open CASCADE): Foundation Classes - support wchar_t* input within TCollection_AsciiString and TCollection_ExtendedString
parent of 0027880 (closed, bugmaster, Open CASCADE): Samples - fix handling of Unicode paths within MFC import/export sample
parent of 0025534 (closed, bugmaster, Community): TObj_Application unicode path issue.
parent of 0028110 (closed, apn, Open CASCADE): Configuration - specify Unicode charset instead of multibyte in project files for Visual Studio
parent of 0028353 (closed, apn, Community): Samples - IESample cannot write files to paths with special characters
parent of 0028454 (assigned, gka, Community): Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
parent of 0029069 (closed, apn, Community): Samples - handle UNICODE filenames within C++/CLI CSharp sample
related to 0022125 (closed, bugmaster, Open CASCADE): TCollection_ExtendedString: conversion from UTF-8 to unicode
related to 0024943 (closed, bugmaster, Open CASCADE): Port MFC sample to UNICODE for compatibility with VS2013
related to 0025308 (new, pdn, Open CASCADE): TCollection_ExtendedString, NCollection_String - merge classes for string management
related to 0026514 (closed, bugmaster, Open CASCADE): OSD_Path can not work with French symbols in file name.
related to 0028172 (closed, kgv, Community): Replace Standard_CString file path with Unicode form TCollection_ExtendedString
child of 0014673 (assigned, abv, Open CASCADE): Provide true support for Unicode symbols
Not all the children of this issue are yet resolved or closed.
Attached file: OCC22484_v1_epv.zip (204,990 bytes), 2011-05-10 10:43
https://tracker.dev.opencascade.org/
Issue History
Date Modified | Username | Field | Change
2011-08-02 11:23 | bugmaster | Category | OCCT:FDC => OCCT:Foundation Classes
2011-08-03 12:17 | bugmaster | Fixed in Version | EMPTY =>
2011-08-03 12:17 | bugmaster | Target Version | => 6.5.2
2011-08-03 12:17 | bugmaster | Description Updated
2011-09-21 12:42 | bugmaster | Target Version | 6.5.2 => 6.5.3
2011-12-05 10:43 | abv | Relationship added | child of 0014673
2012-02-02 10:16 | abv | Target Version | 6.5.3 => 6.5.4
2012-10-21 11:29 | abv | Target Version | 6.5.4 => 6.6.0
2013-02-28 17:05 | abv | Target Version | 6.6.0 => 6.7.0
2013-06-07 20:25 | pdn | Assigned To | bugmaster => pdn
2013-06-07 20:25 | pdn | Status | new => assigned
2013-06-09 20:54 | san | Relationship added | related to 0023457
2013-06-09 20:59 | san | Relationship deleted | related to 0023457
2013-06-10 11:33 | pdn | Note Added: 0024708
2013-06-11 12:19 | pdn | Note Added: 0024728
2013-06-11 12:19 | pdn | Assigned To | pdn => msv
2013-06-11 12:19 | pdn | Status | assigned => resolved
2013-06-13 10:01 | kgv | Note Added: 0024739
2013-06-13 12:53 | msv | Note Added: 0024744
2013-06-13 12:53 | msv | Assigned To | msv => pdn
2013-06-13 12:53 | msv | Status | resolved => assigned
2013-11-06 15:10 | kgv | Relationship added | related to 0022125
2013-11-06 15:12 | kgv | Target Version | 6.7.0 => 6.7.1
2014-02-13 11:08 | kgv | Relationship added | related to 0024622
2014-04-04 18:16 | abv | Target Version | 6.7.1 => 6.8.0
2014-07-12 17:37 | kgv | Relationship added | related to 0024943
2014-09-10 08:19 | kgv | Target Version | 6.8.0 => 7.0.0
2014-09-23 14:05 | git | Note Added: 0032000
2014-09-23 14:06 | pdn | Target Version | 7.0.0 => 6.8.0
2014-09-23 14:06 | pdn | Assigned To | pdn => abv
2014-09-23 14:06 | pdn | Status | assigned => resolved
2014-09-24 12:21 | abv | Note Added: 0032076
2014-09-24 12:21 | abv | Assigned To | abv => pdn
2014-09-24 12:21 | abv | Status | resolved => assigned
2014-09-24 12:28 | abv | Note Added: 0032078
2014-09-24 12:28 | abv | Additional Information Updated
2014-09-24 12:39 | git | Note Added: 0032079
2014-09-24 12:40 | pdn | Assigned To | pdn => abv
2014-09-24 12:40 | pdn | Status | assigned => resolved
2014-09-24 14:35 | abv | Note Added: 0032090
2014-09-24 14:35 | abv | Assigned To | abv => bugmaster
2014-09-24 14:35 | abv | Status | resolved => reviewed
2014-09-24 14:49 | apv | Assigned To | bugmaster => apv
2014-09-24 15:01 | kgv | Relationship added | related to 0024716
2014-09-25 12:38 | apv | Note Added: 0032129
2014-09-25 12:39 | apv | Assigned To | apv => pdn
2014-09-25 12:39 | apv | Status | reviewed => assigned
2014-09-25 14:29 | git | Note Added: 0032144
2014-09-25 14:29 | pdn | Note Added: 0032145
2014-09-25 14:29 | pdn | Assigned To | pdn => abv
2014-09-25 14:29 | pdn | Status | assigned => resolved
2014-09-25 14:30 | pdn | Status | resolved => reviewed
2014-09-25 14:31 | apv | Assigned To | abv => apv
2014-09-26 13:57 | apv | Note Added: 0032224
2014-09-26 13:57 | apv | Assigned To | apv => pdn
2014-09-26 13:57 | apv | Status | reviewed => assigned
2014-09-26 13:57 | apv | Note Edited: 0032224
2014-09-30 13:23 | git | Note Added: 0032454
2014-09-30 13:24 | pdn | Note Added: 0032455
2014-09-30 13:24 | pdn | Assigned To | pdn => abv
2014-09-30 13:24 | pdn | Status | assigned => resolved
2014-10-01 09:59 | abv | Note Added: 0032492
2014-10-01 09:59 | abv | Assigned To | abv => bugmaster
2014-10-01 09:59 | abv | Status | resolved => reviewed
2014-10-01 11:28 | pdn | Relationship added | related to 0025302
2014-10-01 13:07 | apv | Assigned To | bugmaster => apv
2014-10-01 13:12 | kgv | Relationship added | related to 0025308
2014-10-01 16:29 | git | Note Added: 0032516
2014-10-02 14:04 | apv | Test case number | => Will be created
2014-10-02 14:07 | apv | Note Added: 0032573
2014-10-02 14:07 | apv | Assigned To | apv => pdn
2014-10-02 14:07 | apv | Status | reviewed => assigned
2014-10-02 14:16 | git | Note Added: 0032578
2014-10-02 14:18 | abv | Note Added: 0032579
2014-10-02 14:18 | abv | Assigned To | pdn => apv
2014-10-02 14:18 | abv | Status | assigned => feedback
2014-10-02 14:50 | apv | Note Added: 0032581
2014-10-02 14:50 | apv | Assigned To | apv => bugmaster
2014-10-02 14:50 | apv | Status | feedback => tested
2014-10-03 14:07 | bugmaster | Changeset attached | => occt master d9ff84e8
2014-10-03 14:07 | bugmaster | Status | tested => verified
2014-10-03 14:07 | bugmaster | Resolution | open => fixed
2014-10-13 23:15 | shoogen | Note Added: 0033047
2014-10-13 23:25 | abv | Note Added: 0033048
2014-10-14 11:52 | pdn | Note Added: 0033058
2014-10-14 16:44 | kgv | Relationship added | parent of 0025369
2014-10-14 16:49 | kgv | Relationship added | parent of 0025367
2014-10-21 16:43 | git | Note Added: 0033446
2014-11-06 11:59 | pdn | Note Added: 0034088
2014-11-11 12:42 | aiv | Fixed in Version | => 6.8.0
2014-11-11 13:03 | aiv | Status | verified => closed
2015-06-29 20:14 | kgv | Relationship added | parent of 0026380
2015-08-03 20:55 | kgv | Relationship added | related to 0026514
2016-02-22 22:09 | kgv | Relationship added | parent of 0027198
2016-07-13 22:44 | kgv | Relationship added | parent of 0027675
2016-07-14 10:44 | kgv | Relationship added | parent of 0027676
2016-09-04 14:34 | kgv | Relationship added | parent of 0027838
2016-09-19 11:48 | kgv | Relationship added | parent of 0027880
2016-11-02 12:57 | kgv | Relationship added | child of 0025534
2016-11-02 12:58 | kgv | Relationship replaced | parent of 0025534
2016-11-02 12:58 | kgv | Relationship added | parent of 0028040
2016-11-16 10:27 | kgv | Relationship added | parent of 0028110
2016-11-29 07:55 | kgv | Relationship added | related to 0028172
2017-01-13 16:10 | kgv | Relationship added | parent of 0028353
2017-02-13 17:15 | kgv | Relationship added | parent of 0028454
2018-01-29 11:18 | kgv | Relationship added | parent of 0029069
2019-03-02 22:55 | kgv | Relationship added | parent of 0030529

Notes
(0024708)
pdn   
2013-06-10 11:33   
The current implementation uses char* to store UTF-8 strings. The API of OCC file classes and methods was not modified. Each string is analyzed and, if it contains UTF-8 code, converted for use in the appropriate system functions.

The current implementation is finalized only for the Windows platform.
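A minimal sketch of the "analyze the string" step, assuming the decision is simply "contains non-ASCII bytes and is well-formed UTF-8". IsNonAsciiUtf8 is a hypothetical name; the note does not say exactly which checks the patch performs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: true if theStr holds at least one multi-byte character and the
// whole string is structurally valid UTF-8. Pure ASCII can go to narrow system calls
// unchanged; a string failing the check is probably in a legacy locale encoding.
// Simplified: overlong 3/4-byte forms and encoded surrogates are not rejected.
bool IsNonAsciiUtf8(const char* theStr)
{
  bool hasNonAscii = false;
  const unsigned char* aPtr = reinterpret_cast<const unsigned char*>(theStr);
  while (*aPtr != 0)
  {
    std::size_t aLen = 0;
    if (*aPtr < 0x80)                        aLen = 1;
    else if (*aPtr >= 0xC2 && *aPtr <= 0xDF) aLen = 2; // C0 and C1 would be overlong
    else if ((*aPtr & 0xF0) == 0xE0)         aLen = 3;
    else if (*aPtr >= 0xF0 && *aPtr <= 0xF4) aLen = 4;
    else return false;                                 // stray continuation or invalid lead

    if (aLen > 1)
    {
      hasNonAscii = true;
    }
    for (std::size_t k = 1; k < aLen; ++k)
    {
      // The terminating NUL also fails this test, so a truncated sequence stops here
      // before any byte past the end is read.
      if ((aPtr[k] & 0xC0) != 0x80)
      {
        return false;
      }
    }
    aPtr += aLen;
  }
  return hasNonAscii;
}
```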
(0024728)
pdn   
2013-06-11 12:19   
The first version of the fix has been put into branch CR22484.

Please review and send it back.
(0024739)
kgv   
2013-06-13 10:01   
Dear pdn,

I have the following remarks on the patch (after a preliminary review):
> #ifdef WNT
The WNT macro definition is deprecated. Please use the appropriate built-in macros instead (_MSC_VER or _WIN32, depending on the situation).

> myStream.open( (const wchar_t*) aWName.ToExtString(),ios::in); // ios::nocreate is not portable
Please remove irrelevant comments from copy-pasted lines.

> + TCollection_ExtendedString aFileNameW(aFileName, Standard_True);
> + TCollection_ExtendedString dirNameW(dirName);
If I understood correctly, TCollection_ExtendedString constructor parses

> +/* LD : We do not need this routine any longer : */
> +/* Dont remove a no empty directory */
> +
> +
> +#if 0
A big unidentified piece of commented-out code.

General remark: this patch breaks backward compatibility with applications which currently use file names in the current locale.
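A minimal sketch of the platform guard suggested in this review, assuming the file is opened through the C runtime. OpenUtf8File is a hypothetical name, and it calls the Win32 MultiByteToWideChar API instead of TCollection_ExtendedString only to stay self-contained.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#ifdef _WIN32
  #include <windows.h>
#endif

// Hypothetical helper: opens a file whose name is UTF-8 encoded.
// The guard uses the compiler-defined _WIN32 macro (set for both 32- and 64-bit
// Windows targets) instead of the deprecated project-level WNT macro.
std::FILE* OpenUtf8File(const std::string& theUtf8Path, const char* theMode)
{
#ifdef _WIN32
  // Windows: convert path and mode from UTF-8 to UTF-16, then call _wfopen.
  auto toWide = [](const std::string& theStr) {
    const int aLen = MultiByteToWideChar(CP_UTF8, 0, theStr.c_str(), -1, nullptr, 0);
    std::wstring aWide(aLen > 0 ? static_cast<std::size_t>(aLen) : std::size_t(0), L'\0');
    if (aLen > 0)
    {
      MultiByteToWideChar(CP_UTF8, 0, theStr.c_str(), -1, &aWide[0], aLen);
    }
    return aWide; // holds an extra trailing NUL from the API; harmless for c_str()
  };
  return _wfopen(toWide(theUtf8Path).c_str(), toWide(theMode).c_str());
#else
  // POSIX: file names are byte strings, so the UTF-8 path is passed through unchanged.
  return std::fopen(theUtf8Path.c_str(), theMode);
#endif
}
```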
(0024744)
msv   
2013-06-13 12:53   
Remarks:
1) Corrections are also needed in OCCT Products, such as DXF.
2) TCollection_ExtendedString: in ConvertToUnicode2B and ConvertToUnicode3B, the code is tied to little-endian byte order, so it will give inconsistent results on big-endian platforms.
3) The following files contain changes not relevant to this bug. Another bug should be created for them, and the version here should contain only the relevant changes.
BinLDrivers_DocumentRetrievalDriver.cdl
BinLDrivers_DocumentRetrievalDriver.cxx
4) Why has the commented-out routine DeleteDirectory reappeared in OSD_WNT.cxx? Please remove it.
5) The same applies to the function SetDeleteDirectoryProc.
6) Remove the declarations of the following functions in OSD_WNT_1.hxx:
DeleteDirectory
SetDeleteDirectoryProc
DirWalk
MsgBox
WNT_InitTimer
WNT_StatTimer
_debug_break
7) The changes concerning "locale" in PCDM_RetrievalDriver.cxx are incorrect (revert them to the older version).

I agree with KGV's general remark. We need to state it clearly in the release notes.
(0032000)
git   
2014-09-23 14:05   
Branch CR22484 has been updated forcibly by pdn.

SHA-1: a5aaaf3a2c80ef36b4d33c9e070213fd118d058b
(0032076)
abv   
2014-09-24 12:21   
Minor remarks:
- in Draw_Interpretor.cxx, the local class TclUTFToLocalStringSentry is now unused; please remove it
- the change in PCDM.cdl is wrong; please revert it (these two enums were removed recently, see 0024180)
(0032078)
abv   
2014-09-24 12:28   
I have added the description of the patch prepared by msv some time ago (edited) to the Additional Information field.
(0032079)
git   
2014-09-24 12:39   
Branch CR22484 has been updated forcibly by pdn.

SHA-1: 5a4311328b5af737e0402999152f948a902ad8ce
(0032090)
abv   
2014-09-24 14:35   
No remarks. Please check the build on Linux and Windows 64-bit, then test.
(0032129)
apv   
2014-09-25 12:38   
Dear BugMaster,

Branch CR22484 was compiled on Linux and the following compilation errors were detected:
http://jenkins-test-02.nnov.opencascade.com/user/mnt/my-views/view/CR22484/job/mnt-CR22484-master_build_occt_linux/1/parsed_console/

1. ../../../../src/FSD/FSD_BinaryFile.cxx:84:8: error: macro names must be identifiers

2. ../../../../src/FSD/FSD_File.cxx:76:8: error: macro names must be identifiers

Branch CR22484 was compiled on Windows and the following compilation errors were detected:
http://jenkins-test-02.nnov.opencascade.com/user/mnt/my-views/view/CR22484/job/mnt-CR22484-master_build_occt_windows/1/parsed_console/

1. ..\..\..\src\FSD\FSD_BinaryFile.cxx(84): fatal error C1016: #if[n]def expected an identifier

2. ..\..\..\src\FSD\FSD_File.cxx(76): fatal error C1016: #if[n]def expected an identifier

3. ..\..\..\src\Message\Message_MsgFile.cxx(217): fatal error C1016: #if[n]def expected an identifier

4. ..\..\..\src\LDOM\LDOMParser.cxx(138): fatal error C1016: #if[n]def expected an identifier

5. ..\..\..\src\BinLDrivers\BinLDrivers_DocumentRetrievalDriver.cxx(182): fatal error C1016: #if[n]def expected an identifier

6. ..\..\..\src\StepFile\stepread.c(86): fatal error C1016: #if[n]def expected an identifier
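The log does not show the offending source lines, so the cause here is an assumption: both diagnostics are what GCC and MSVC report when #ifdef or #ifndef is followed by something other than a single macro name, as can happen when a WNT guard is replaced by a compound condition. A sketch of the accepted form; the macro choice and DescribePathApi are illustrative only.

```cpp
#include <cassert>
#include <string>

// Rejected forms (kept as comments because they do not compile):
//   #ifdef (_WIN32)     -> GCC: "macro names must be identifiers"
//   #ifndef !_WIN32     -> MSVC: C1016 "#if[n]def expected an identifier"
// #ifdef/#ifndef take exactly one identifier; compound or negated conditions
// need #if together with defined():
#if defined(_MSC_VER) || defined(__MINGW32__)
static const char* const THE_PATH_API = "wide-character CRT (_wopen, _wfopen)";
#else
static const char* const THE_PATH_API = "narrow CRT (open, fopen)";
#endif

// Reports which branch of the guard was compiled.
std::string DescribePathApi()
{
  return THE_PATH_API;
}
```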
(0032144)
git   
2014-09-25 14:29   
Branch CR22484 has been updated by pdn.

SHA-1: e3249fb86154833d355b08ea31f2e0ff9d1c5e30


Detailed log of new commits:

Author: pdn
Date: Thu Sep 25 14:29:15 2014 +0400

    Fix for compilation errors and fix for StepFile (avoid objects in pure c code)

(0032145)
pdn   
2014-09-25 14:29   
Fixed
(0032224)
apv   
2014-09-26 13:57   
Dear BugMaster,

Branch CR22484 (and products from GIT master) was compiled on Linux, MacOS and Windows platforms and tested.
SHA-1: e3249fb86154833d355b08ea31f2e0ff9d1c5e30

Number of compiler warnings:
occt component:
   Linux: 15 (15 on master)
   Windows: 0 (0 on master)
   MacOS: 193 (193 on master)
products component :
   Linux: 11 (11 on master)
   Windows: 1 (1 on master)

Regressions/Differences:
http://occt-tests/CR22484-master-occt/Windows-32-VC10/summary.html
bugs caf(015) bug170_3

Testing cases:
Not done

Testing on Linux:
Total MEMORY difference: 356246768 / 355468788
Total CPU difference: 45192.93000000016 / 44818.88000000009

Testing on Windows:
Total MEMORY difference: 245490128 / 242252872
Total CPU difference: 34173.109375 / 34303.3125

There are differences in images found by testdiff:
http://occt-tests/CR22484-master-occt/Debian60-64/diff-Debian60-64.html
http://occt-tests/CR22484-master-occt/Windows-32-VC10/diff-Windows-32-VC10.html
Pay attention to: bugs vis bug22796_2

(0032454)
git   
2014-09-30 13:23   
Branch CR22484 has been updated by pdn.

SHA-1: 57a388503997f41b94212f37673af17d427bdf83


Detailed log of new commits:

Author: pdn
Date: Tue Sep 30 13:23:30 2014 +0400

    Fixes for set unicode symbols to OCAF and visualization

(0032455)
pdn   
2014-09-30 13:24   
Fixed, please review and retest
(0032492)
abv   
2014-10-01 09:59   
No remarks, please test
(0032516)
git   
2014-10-01 16:29   
Branch CR22484 has been updated forcibly by apv.

SHA-1: c59407add5ab4f2225ad5fe740997438e9a7c628
(0032573)
apv   
2014-10-02 14:07   
Dear BugMaster,

Branch CR22484 (and products from GIT master) was compiled on Linux, MacOS and Windows platforms and tested.
SHA-1: c59407add5ab4f2225ad5fe740997438e9a7c628

Number of compiler warnings:
occt component:
   Linux: 15 (15 on master)
   Windows: 0 (0 on master)
   MacOS: 196 (196 on master)
products component :
   Linux: 11 (11 on master)
   Windows: 3 (3 on master)

Regressions/Differences:
Not detected

Testing cases:
Will be created

Testing on Linux:
Total MEMORY difference: 398316920 / 397657468
Total CPU difference: 47295.91000000001 / 46596.08000000006

Testing on Windows:
Total MEMORY difference: 279461580 / 279221480
Total CPU difference: 38884.5 / 39387.71875

There are differences in images found by testdiff:
http://occt-tests/CR22484-master-occt/Debian60-64/diff-Debian60-64.html
http://occt-tests/CR22484-master-occt/Windows-32-VC10/diff-Windows-32-VC10.html
Pay attention to: bugs vis bug22149
(0032578)
git   
2014-10-02 14:16   
Branch CR22484 has been updated by abv.

SHA-1: a27569a567728ab3f99677ecb9a1656d02a22d5c


Detailed log of new commits:

Author: abv
Date: Thu Oct 2 14:15:14 2014 +0400

    Definition of Unicode symbol in test corrected

(0032579)
abv   
2014-10-02 14:18   
I have corrected the test bugs vis bug22149; please check the image.
(0032581)
apv   
2014-10-02 14:50   
Dear BugMaster,

The updated test case bugs vis bug22149 has been relaunched on Linux and Windows platforms. The results on both platforms are OK.

http://occt-tests/CR22484-master-occt/Debian60-64/bugs/vis/bug22149.html
http://occt-tests/CR22484-master-occt/Windows-32-VC10/bugs/vis/bug22149.html
(0033047)
shoogen   
2014-10-13 23:15   
I have trouble understanding where I need to change the string handling in my program. For example, BRepTools::Write still calls ofstream.open with a Standard_CString instead of a wide string.
(0033048)
abv   
2014-10-13 23:25   
It seems the solution currently implemented covers only STEP, OCCT binary persistence, and message resources. Reading and writing BREP and IGES still do not seem to support Unicode on Windows. Pavel, can you confirm this?
(0033058)
pdn   
2014-10-14 11:52   
Yes, you are right. An additional bug will be created to update IGES and BRep.
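Until BRep and IGES are updated, application code can open the stream itself. A minimal sketch, assuming a C++17 compiler; WriteTextToUtf8Path is hypothetical and is not how OCCT was changed. std::filesystem::u8path interprets its argument as UTF-8 and converts it to the native path encoding (UTF-16 on Windows), and std::ofstream accepts the resulting path directly. A stream opened this way could then be passed to a stream-based writer, such as the BRepTools::Write overload taking a Standard_OStream, instead of the overload taking a file name.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: writes theText to a file whose name is UTF-8 encoded.
// Note: std::filesystem::u8path is deprecated since C++20 (char8_t paths replace it).
bool WriteTextToUtf8Path(const std::string& theUtf8Path, const std::string& theText)
{
  std::ofstream aStream(std::filesystem::u8path(theUtf8Path), std::ios::out | std::ios::binary);
  if (!aStream)
  {
    return false; // the file could not be created
  }
  aStream << theText;
  return static_cast<bool>(aStream); // the stream is flushed and closed on destruction
}
```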
(0033446)
git   
2014-10-21 16:43   
Branch CR22484 has been deleted by inv.

SHA-1: a27569a567728ab3f99677ecb9a1656d02a22d5c
(0034088)
pdn   
2014-11-06 11:59   
Please use the following lines to make a test:

box b 10 10 10
set s [encoding convertfrom unicode \x1F\x04\x40\x04\x35\x04\x32\x04\x35\x04\x34\x04]
stepwrite a b $s.stp
# file should be created
stepread $s.stp a *
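The byte list in the Draw command above is the UTF-16 little-endian encoding of the Cyrillic word "Превед" (U+041F U+0440 U+0435 U+0432 U+0435 U+0434). As far as we know, Tcl's unicode encoding follows the host byte order, so the test as written assumes a little-endian host. A small sketch that decodes the same bytes into the UTF-8 form of the file name; Utf16LeBmpToUtf8 is a hypothetical helper that does not combine surrogate pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes UTF-16LE bytes (Basic Multilingual Plane only) to UTF-8.
std::string Utf16LeBmpToUtf8(const std::vector<unsigned char>& theBytes)
{
  std::string aResult;
  for (std::size_t i = 0; i + 1 < theBytes.size(); i += 2)
  {
    // Little-endian: the first byte of each pair is the low byte.
    const unsigned int aCode = theBytes[i] | (theBytes[i + 1] << 8);
    if (aCode < 0x80)
    {
      aResult.push_back(static_cast<char>(aCode));
    }
    else if (aCode < 0x800)
    {
      aResult.push_back(static_cast<char>(0xC0 | (aCode >> 6)));
      aResult.push_back(static_cast<char>(0x80 | (aCode & 0x3F)));
    }
    else
    {
      aResult.push_back(static_cast<char>(0xE0 | (aCode >> 12)));
      aResult.push_back(static_cast<char>(0x80 | ((aCode >> 6) & 0x3F)));
      aResult.push_back(static_cast<char>(0x80 | (aCode & 0x3F)));
    }
  }
  return aResult;
}
```

Applied to the bytes 0x1F 0x04 0x40 0x04 0x35 0x04 0x32 0x04 0x35 0x04 0x34 0x04, this yields the UTF-8 bytes D0 9F D1 80 D0 B5 D0 B2 D0 B5 D0 B4, so the test should produce a file named "Превед.stp".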