MantisBT - Community
View Issue Details
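ID: 0029081
Project: Community
Category: [OCCT] OCCT:Foundation Classes
View Status: public
Date Submitted: 2017-09-04 17:30
Last Update: 2019-07-17 12:38
Reporter: BenjaminBihler
Assigned To: abv
Priority: normal
Severity: minor
Status: closed
Resolution: fixed
Platform: MinGW-w64
OS: Windows
OS Version: 7
Product Version: [OCCT] 7.2.0
Target Version: [OCCT] 7.3.0
Fixed in Version: [OCCT] 7.3.0
Test case number: bugs fclasses bug22125, bugs fclasses bug25367_igs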
0029081: Foundation Classes, OSD_OpenStream - handle UNICODE file paths specifically in case of Mingw-w64
Mingw-w64 does not provide the non-standard Microsoft extension to open ifstreams and ofstreams using wchar_t* file names.

To be able to open the respective streams without the Microsoft extension one would have to subclass std::ifstream and std::ofstream and add an open method to the child classes that accepts wchar_t* file names.

This is exactly what the Boost Filesystem library does. If Boost were added as an (optional) dependency of Open CASCADE, Unicode paths would also work with Mingw-w64.
Steps To Reproduce: Not required
Tags: No tags attached.
Relationships:
related to 0027585 (closed, apn, Community): It is not possible to store OCAF documents to paths with special characters in their names
related to 0022125 (closed, bugmaster, Open CASCADE): TCollection_ExtendedString: conversion from UTF-8 to unicode
related to 0025367 (closed, bugmaster, Open CASCADE): IGES and BRep persistence - support unicode file names on Windows
parent of 0030403 (closed, bugmaster, Community): Foundation Classes - Overwriting Big "BinOcaf" Files Does Not Reduce Their Size
Issue History
2017-09-04 17:30  BenjaminBihler  New Issue
2017-09-04 17:30  BenjaminBihler  Assigned To  => ziaulazam
2017-09-04 17:30  BenjaminBihler  Relationship added  related to 0027585
2017-09-06 11:20  git  Note Added: 0070263
2017-09-06 17:10  git  Note Added: 0070281
2017-09-06 17:15  git  Note Added: 0070282
2017-09-06 17:20  git  Note Added: 0070283
2017-09-06 17:21  BenjaminBihler  Assigned To  ziaulazam => mpv
2017-09-06 17:21  BenjaminBihler  Status  new => resolved
2017-09-06 17:21  BenjaminBihler  Steps to Reproduce Updated
2017-09-07 11:44  mpv  Assigned To  mpv => abv
2017-10-01 19:21  abv  Test case number  => bugs fclasses bug22125
2017-10-01 19:32  abv  Test case number  bugs fclasses bug22125 => bugs fclasses bug22125, bugs fclasses bug25367_igs
2017-10-01 19:33  abv  Relationship added  related to 0022125
2017-10-01 19:33  abv  Relationship added  related to 0025367
2017-10-04 22:06  abv  Note Added: 0071212
2017-10-04 22:07  abv  Assigned To  abv => BenjaminBihler
2017-10-04 22:07  abv  Status  resolved => assigned
2017-10-04 22:07  abv  Status  assigned => feedback
2017-10-04 22:11  git  Note Added: 0071214
2017-10-04 22:16  abv  Note Added: 0071216
2017-10-05 06:41  abv  Note Edited: 0071212
2017-10-05 12:54  BenjaminBihler  Note Added: 0071230
2017-10-05 12:55  BenjaminBihler  Assigned To  BenjaminBihler => abv
2017-10-05 13:04  kgv  Note Added: 0071231
2017-10-05 13:06  kgv  Note Edited: 0071231
2017-10-05 13:46  kgv  Assigned To  abv => BenjaminBihler
2017-10-05 15:16  BenjaminBihler  Note Added: 0071243
2017-10-05 15:16  BenjaminBihler  Assigned To  BenjaminBihler => abv
2017-10-05 18:27  git  Note Added: 0071246
2017-10-05 20:03  git  Note Added: 0071247
2017-10-05 20:04  kgv  Note Added: 0071248
2017-10-05 20:04  kgv  Assigned To  abv => BenjaminBihler
2017-10-05 21:02  abv  Note Added: 0071249
2017-10-06 08:12  git  Note Added: 0071250
2017-10-06 08:25  kgv  Category  OCCT:Application Framework => OCCT:Foundation Classes
2017-10-06 11:13  BenjaminBihler  Note Added: 0071253
2017-10-06 11:14  BenjaminBihler  Assigned To  BenjaminBihler => abv
2017-10-06 11:14  BenjaminBihler  Status  feedback => resolved
2017-10-06 17:38  kgv  Summary  With Mingw-w64 Unicode Paths Do Not Work => Foundation Classes, OSD_OpenStream - handle UNICODE file paths specifically in case of Mingw-w64
2017-10-06 19:58  git  Note Added: 0071274
2017-10-06 20:49  git  Note Added: 0071275
2017-10-07 07:45  abv  Note Added: 0071276
2017-10-07 07:45  abv  Assigned To  abv => bugmaster
2017-10-07 07:45  abv  Status  resolved => reviewed
2017-10-09 10:25  bugmaster  Note Added: 0071291
2017-10-09 10:25  bugmaster  Status  reviewed => tested
2017-10-12 19:00  abv  Changeset attached  => occt master fc8918ad
2017-10-12 19:00  abv  Assigned To  bugmaster => abv
2017-10-12 19:00  abv  Status  tested => verified
2017-10-12 19:00  abv  Resolution  open => fixed
2017-10-14 12:19  git  Note Added: 0071449
2017-10-14 12:19  git  Note Added: 0071450
2017-10-14 12:19  git  Note Added: 0071451
2017-10-14 12:19  git  Note Added: 0071452
2017-10-14 12:19  git  Note Added: 0071453
2018-02-20 12:59  aiv  Target Version  7.4.0 => 7.3.0
2018-06-29 21:15  aiv  Fixed in Version  => 7.3.0
2018-06-29 21:19  aiv  Status  verified => closed
2019-07-17 11:32  BenjaminBihler  Relationship added  related to 0030403
2019-07-17 12:38  kgv  Relationship replaced  parent of 0030403

Notes
(0070263)
git   
2017-09-06 11:20   
Branch CR29081 has been created by Zia ul Azam.

SHA-1: 989ba7aa828920acf73cabc4981193d2037bbcbf


Detailed log of new commits:

Author: Zia ul Azam
Date: Wed Sep 6 10:19:37 2017 +0200

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    Boost file streams are used on Windows if boost is present to enable
    handling unicode paths also with Mingw-w64.

Author: Zia ul Azam
Date: Wed Sep 6 10:14:53 2017 +0200

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    Added boost as an optional external library.
(0070281)
git   
2017-09-06 17:10   
Branch CR29081 has been updated by Zia ul Azam.

SHA-1: 0e2fade9099d4302f1208b4d7fa2d3ab888da539


Detailed log of new commits:

Author: Zia ul Azam
Date: Wed Sep 6 16:09:32 2017 +0200

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    Added documentation of cmake flags used for using and installing boost.

(0070282)
git   
2017-09-06 17:15   
Branch CR29081 has been updated by BenjaminBihler.

SHA-1: deb861a0d8033c44fbbe4ca9dda70db60359c94a


Detailed log of new commits:

Author: Benjamin Bihler
Date: Wed Sep 6 16:14:47 2017 +0200

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    Corrected boost header file path.

(0070283)
git   
2017-09-06 17:20   
Branch CR29081 has been updated by BenjaminBihler.

SHA-1: 6f965ae8ee9e552f8f4badccb135cbdc0eced6af


Detailed log of new commits:

Author: Benjamin Bihler
Date: Wed Sep 6 16:20:38 2017 +0200

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    Harmonized capitalization.

(0071212)
abv   
2017-10-04 22:06   
(edited on: 2017-10-05 06:41)
Hello Benjamin and Zia,

Sorry for the delay in replying to this issue.

I have tried to apply these changes and, alas, I cannot get them to work.

What I have is:
- Windows 10 Pro 64-bit workstation
- MinGW-W64-builds-4.3.3 downloaded from https://netcologne.dl.sourceforge.net/project/mingw-w64/Toolchains%20targetting%20Win64/Personal%20Builds/mingw-builds/7.1.0/threads-win32/sjlj/x86_64-7.1.0-release-win32-sjlj-rt_v5-rev2.7z
- Boost 1.65.1 downloaded from http://www.boost.org/users/download/ and built with MinGW (target "gcc") using instructions found at http://www.boost.org/doc/libs/1_65_1/more/getting_started/windows.html#or-build-binaries-from-source
- OCCT current master + fix (branch CR29081_1) built using CMake with generator "MinGW Makefiles"

The problem is that Boost streams do not work correctly under MinGW with either char* (UTF-8) or wchar_t* (UTF-16) names. In the end, I reduced the problem to this code:

~~~~~
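  // Variant 1: narrow (char*) path holding the UTF-8 bytes of the file name
  boost::filesystem::ofstream test1("d:\\\xE6\x9C\x89\xE7\x94\xA8.var1", std::ios::out);
  if (!test1.rdbuf()->is_open()) { std::cout << "Variant 1 failed" << std::endl; }

  // Variant 2: wide (wchar_t*) path holding the UTF-16 code units of the same name
  boost::filesystem::ofstream test2(L"d:\\\x6709\x7528.var2", std::ios::out);
  if (!test2.rdbuf()->is_open()) { std::cout << "Variant 2 failed" << std::endl; }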
~~~~~

Both file names represent the text "it works" translated to Traditional Chinese using Google Translate (two characters: 有用), the first one encoded in UTF-8 and the second in UTF-16.
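
To check that the two literals above really name the same text, the UTF-8 bytes of variant 1 can be decoded and compared with the UTF-16 code units of variant 2. The following is only a minimal sketch: `decodeUtf8` is a hypothetical helper written for this illustration (it is not part of OCCT or Boost) and it assumes well-formed input without validating it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): decodes well-formed UTF-8 into
// Unicode code points. No validation of malformed input is attempted.
inline std::vector<unsigned long> decodeUtf8(const std::string& theBytes)
{
  std::vector<unsigned long> aPoints;
  for (std::size_t i = 0; i < theBytes.size(); )
  {
    const unsigned char aLead = static_cast<unsigned char>(theBytes[i]);
    std::size_t   aNbTrail = 0;     // number of continuation bytes expected
    unsigned long aCode    = aLead; // plain ASCII by default
    if      (aLead >= 0xF0) { aNbTrail = 3; aCode = aLead & 0x07; }
    else if (aLead >= 0xE0) { aNbTrail = 2; aCode = aLead & 0x0F; }
    else if (aLead >= 0xC0) { aNbTrail = 1; aCode = aLead & 0x1F; }
    ++i;
    // each continuation byte contributes its low 6 bits
    for (std::size_t k = 0; k < aNbTrail && i < theBytes.size(); ++k, ++i)
    {
      aCode = (aCode << 6) | (static_cast<unsigned char>(theBytes[i]) & 0x3F);
    }
    aPoints.push_back(aCode);
  }
  return aPoints;
}
```

Applied to the six bytes `\xE6\x9C\x89\xE7\x94\xA8`, this yields the two code points 0x6709 and 0x7528, which are exactly the wide characters used in variant 2.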

When executed, opening the file fails in variant 2; variant 1 produces a file whose name is the UTF-8 string, apparently interpreted as if it were in the current locale.

If I set the Boost locale to UTF-8, then variant 2 works just like variant 1.
To set the locale, I tried two (seemingly equivalent) variants found in:
http://www.boost.org/doc/libs/1_62_0/libs/locale/doc/html/default_encoding_under_windows.html
https://svn.boost.org/trac10/ticket/9968

Note that the same code compiled with MSVC 10 using standard streams (i.e. replacing "boost::filesystem" with "std") produces a file with the same wrong name as with MinGW in variant 1 (as expected), but produces a file with the expected correct name "有用.var2" in variant 2.

Thus, from my testing, Boost built for MinGW does not seem to support Unicode path names.

If it works for you, this must be due either to a different build of Boost or to some tricks needed to make it work that I do not know. Please share your knowledge on this.

(0071214)
git   
2017-10-04 22:11   
Branch CR29081_1 has been created by abv.

SHA-1: 8acc16a53934d6b96850968915d710314261a3d8


Detailed log of new commits:

Author: Zia ul Azam
Date: Wed Sep 6 11:14:53 2017 +0300

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    Added boost as an optional external library.
    Added documentation of cmake flags used for using and installing boost.
    
    Boost file streams are used on Windows if boost is present to enable handling unicode paths also with Mingw-w64.
(0071216)
abv   
2017-10-04 22:16   
Branch CR29081_1 is the same as CR29081, rebased on current master and with all commits squashed (plus some corrections for building DRAW).

Note that once the Boost headers and the relevant logic are present in OSD_OpenFile.hxx, there is no need to replicate them wherever streams are used. Instead, we can define typedefs in this header for the different kinds of streams used (e.g. Standard_OFStream for either std::ofstream or boost::filesystem::ofstream) and use them throughout the code. In this case OSD_OpenFile.hxx will be the single place where Boost is mentioned.
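
The typedef idea could look roughly like the sketch below. This is an assumption-laden illustration, not the content of OSD_OpenFile.hxx: the macro `EXAMPLE_HAVE_BOOST_FSTREAM`, the name `Standard_IFStream`, and the helper `writeGreeting` are hypothetical; only `Standard_OFStream` is taken from the note above.

```cpp
#include <fstream>

// EXAMPLE_HAVE_BOOST_FSTREAM is a hypothetical macro that a build system
// would define when Boost is available; it is not an OCCT or Boost macro.
#if defined(EXAMPLE_HAVE_BOOST_FSTREAM)
  #include <boost/filesystem/fstream.hpp>
  typedef boost::filesystem::ofstream Standard_OFStream;
  typedef boost::filesystem::ifstream Standard_IFStream; // hypothetical name
#else
  typedef std::ofstream Standard_OFStream;
  typedef std::ifstream Standard_IFStream;               // hypothetical name
#endif

// Client code refers only to the typedef and never mentions Boost directly.
inline bool writeGreeting(const char* thePath)
{
  Standard_OFStream aFile(thePath);
  aFile << "hello";
  aFile.close();        // flush and close before reporting the result
  return !aFile.fail();
}
```

With this arrangement, switching the stream implementation would only require touching the one header that holds the typedefs.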
(0071230)
BenjaminBihler   
2017-10-05 12:54   
Hello Andrey,

unfortunately you are right. :-((( I have found this issue: https://svn.boost.org/trac10/ticket/5769 which seems to explain everything. Our solution seems to have worked for us because we used special characters that are part of our current Windows code page. So the patch is partially beneficial (without Boost it was not even possible to read/write such files), but even with Boost not all Unicode paths work with MinGW (they do work with Boost and MSVC).

How should we continue? The current patch is still useful for us, but it does not deliver what it promised.

Benjamin
(0071231)
kgv   
2017-10-05 13:04   
(edited on: 2017-10-05 13:06)
> Our solution seems to have worked for us since we have used special characters
> that are part of our current Windows code page.
> So the patch is partially beneficial
> (without Boost it was not even possible to read/write such files),
Working with the active code page does not require Boost: the same could be achieved by patching OSD_OpenFile to convert UTF-8 to the active code page specifically for MinGW
(I think I have written about this workaround somewhere else, but maybe not).
And for opening existing files, one may also use the workaround of passing the short DOS file path instead of the full path (this can also be done using WinAPI functions): short file names are still generated even on modern Windows systems for the NTFS filesystem (this can be disabled in system settings).
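
As an illustration of the first workaround, a UTF-8 to active code page conversion could be sketched with the WinAPI functions `MultiByteToWideChar` and `WideCharToMultiByte`. The helper name `utf8ToActiveCodePage` is hypothetical and this is not the code of OSD_OpenFile; on non-Windows platforms the sketch simply returns its input. Characters outside the active code page are replaced by the system default character, which is exactly the limitation discussed in this thread.

```cpp
#include <string>
#if defined(_WIN32)
  #include <windows.h>
  #include <vector>
#endif

// Hypothetical helper (illustration only): converts a UTF-8 string to the
// active ANSI code page on Windows; returns the input unchanged elsewhere.
inline std::string utf8ToActiveCodePage(const std::string& theUtf8)
{
#if defined(_WIN32)
  if (theUtf8.empty()) { return std::string(); }

  // Step 1: UTF-8 -> UTF-16 (length query first, then conversion)
  const int aWideLen = MultiByteToWideChar(CP_UTF8, 0, theUtf8.c_str(), -1, NULL, 0);
  if (aWideLen <= 0) { return std::string(); }
  std::vector<wchar_t> aWide(aWideLen);
  MultiByteToWideChar(CP_UTF8, 0, theUtf8.c_str(), -1, &aWide[0], aWideLen);

  // Step 2: UTF-16 -> active code page (CP_ACP); characters that cannot be
  // represented are replaced by the system default character
  const int aLen = WideCharToMultiByte(CP_ACP, 0, &aWide[0], -1, NULL, 0, NULL, NULL);
  if (aLen <= 0) { return std::string(); }
  std::vector<char> aNarrow(aLen);
  WideCharToMultiByte(CP_ACP, 0, &aWide[0], -1, &aNarrow[0], aLen, NULL, NULL);
  return std::string(&aNarrow[0]); // the terminating NUL is not copied
#else
  return theUtf8;
#endif
}
```

The result could then be passed to a narrow-character `open()`; names containing characters outside the code page would still not round-trip.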

(0071243)
BenjaminBihler   
2017-10-05 15:16   
We will look at two other libraries that might offer a solution: https://pocoproject.org/ (Boost Software License) and https://www.guelkerdev.de/projects/pathie/ (GPL).

Or are you already sure that you wouldn't accept another optional library dependency even if it were a real solution for the Unicode path problem?
(0071246)
git   
2017-10-05 18:27   
Branch CR29081_2 has been created by kgv.

SHA-1: e53996ab1c6ca311ca0eb21096964b29f852ed26


Detailed log of new commits:

Author: kgv
Date: Wed Sep 6 11:14:53 2017 +0300

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    Define opencascade::std::fstream implementing wchar_t paths support
    on Mingw using __gnu_cxx::stdio_filebuf.
(0071247)
git   
2017-10-05 20:03   
Branch CR29081_3 has been created by kgv.

SHA-1: 788adcd33946423e199f28ba92bfb24b7146f6f7


Detailed log of new commits:

Author: kgv
Date: Wed Sep 6 11:14:53 2017 +0300

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    OSD_OpenStream now uses __gnu_cxx::stdio_filebuf extension
    for opening UNICODE files on MinGW when using C++ file streams.
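
The technique named in this commit message can be sketched as follows: obtain a low-level file descriptor from a wide-character path, wrap it in `__gnu_cxx::stdio_filebuf` (a libstdc++ extension), and swap that buffer into the stream. This is a simplified illustration under stated assumptions, not the actual patch: `openForWriting` and `PathChar` are hypothetical names, only write mode is handled, and on non-Windows systems a narrow path with POSIX `open()` stands in for `_wopen()` so the sketch can be tried with any libstdc++ toolchain.

```cpp
#include <fstream>
#include <fcntl.h>               // O_* (POSIX) or _O_* (Windows) flags
#if defined(_WIN32)
  #include <io.h>                // _wopen
  #include <sys/stat.h>          // _S_IREAD, _S_IWRITE
  typedef wchar_t PathChar;      // hypothetical typedef for this sketch
#else
  typedef char PathChar;
#endif
#if defined(__GLIBCXX__)
  #include <ext/stdio_filebuf.h> // __gnu_cxx::stdio_filebuf
#endif

// Hypothetical helper (illustration only): opens theStream for writing.
inline bool openForWriting(std::ofstream& theStream, const PathChar* thePath)
{
  const std::ios_base::openmode aMode =
    std::ios_base::out | std::ios_base::trunc | std::ios_base::binary;
#if defined(__GLIBCXX__)
  // 1. get a file descriptor; on Windows the wide-character CRT call is used
  #if defined(_WIN32)
  const int aFd = _wopen(thePath, _O_WRONLY | _O_CREAT | _O_TRUNC | _O_BINARY,
                         _S_IREAD | _S_IWRITE);
  #else
  const int aFd = ::open(thePath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  #endif
  if (aFd < 0)
  {
    theStream.setstate(std::ios_base::failbit);
    return false;
  }
  // 2. wrap the descriptor in the libstdc++ file buffer extension
  __gnu_cxx::stdio_filebuf<char> aBuf(aFd, aMode);
  if (!aBuf.is_open())
  {
    theStream.setstate(std::ios_base::failbit);
    return false;
  }
  // 3. hand the opened buffer over to the stream's own std::filebuf
  theStream.rdbuf()->swap(aBuf);
  theStream.clear();
  return true;
#else
  // other standard libraries: rely on their own open() overloads
  theStream.open(thePath, aMode);
  return theStream.is_open();
#endif
}
```

After a successful call the stream behaves like an ordinary `std::ofstream`, so calling code does not need to know how the file was opened.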
(0071248)
kgv   
2017-10-05 20:04   
Could you please try patch in branch CR29081_3?
(0071249)
abv   
2017-10-05 21:02   
For me, it works like a charm.
(0071250)
git   
2017-10-06 08:12   
Branch CR29081_3 has been updated forcibly by kgv.

SHA-1: 83b2925673dd26295234f4e3e86197bb66091077
(0071253)
BenjaminBihler   
2017-10-06 11:13   
Kirill, you are brilliant! I cannot see any limitations, it's just great! :-)
(0071274)
git   
2017-10-06 19:58   
Branch CR29081_4 has been created by abv.

SHA-1: 3217b03a542e91d8fdbba357ae522bf9dac37911


Detailed log of new commits:

Author: kgv
Date: Wed Sep 6 11:14:53 2017 +0300

    0029081: With Mingw-w64 Unicode Paths Do Not Work
    
    OSD_OpenStream() now uses __gnu_cxx::stdio_filebuf extension for opening UNICODE files on MinGW when using C++ file streams.
    Variant accepting filebuf returns bool (true if succeeded and false otherwise).
    
    Checks of ofstream to be opened made via calls to low-level ofstream::rdbuf() are replaced by calls to ofstream::is_open(); state of the stream is also checked (to be good).
    Unicode name used for test file in test bugs fclasses bug22125 is described (for possibility to check it).
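
The change of checks described above (using `is_open()` on the stream and also testing its state, instead of going through `rdbuf()`) can be illustrated with a small sketch. The helper name `isReadyForWriting` is hypothetical and is not taken from the patch.

```cpp
#include <fstream>

// Hypothetical helper (illustration only): a stream is considered usable
// when a file is attached to it and no error flag is set.
inline bool isReadyForWriting(const std::ofstream& theStream)
{
  return theStream.is_open() && theStream.good();
}
```

A default-constructed stream, or one whose open attempt failed, is reported as not ready, so callers can bail out before writing.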
(0071275)
git   
2017-10-06 20:49   
Branch CR29081_4 has been updated forcibly by abv.

SHA-1: fc8918ad91d704b213a4c389e3640bc5626d98cf
(0071276)
abv   
2017-10-07 07:45   
Reviewed with some amendments (for more consistent handling of possible errors) and tested; see Jenkins job CR29081-master-abv. Please consider branch CR29081_4 for integration.
(0071291)
bugmaster   
2017-10-09 10:25   
Combination -
OCCT branch : CR29081_4 SHA-1: fc8918ad91d704b213a4c389e3640bc5626d98cf
Products branch : master
was compiled on Linux, MacOS and Windows platforms and tested in optimized mode.

Number of compiler warnings:
No new/fixed warnings

Regressions/Differences/Improvements:
No regressions/differences

CPU differences:
No differences that require special attention

Image differences :
No differences that require special attention

Memory differences :
No differences that require special attention
(0071449)
git   
2017-10-14 12:19   
Branch CR29081 has been deleted by kgv.

SHA-1: 6f965ae8ee9e552f8f4badccb135cbdc0eced6af
(0071450)
git   
2017-10-14 12:19   
Branch CR29081_1 has been deleted by kgv.

SHA-1: 8acc16a53934d6b96850968915d710314261a3d8
(0071451)
git   
2017-10-14 12:19   
Branch CR29081_2 has been deleted by kgv.

SHA-1: e53996ab1c6ca311ca0eb21096964b29f852ed26
(0071452)
git   
2017-10-14 12:19   
Branch CR29081_3 has been deleted by kgv.

SHA-1: 83b2925673dd26295234f4e3e86197bb66091077
(0071453)
git   
2017-10-14 12:19   
Branch CR29081_4 has been deleted by kgv.

SHA-1: fc8918ad91d704b213a4c389e3640bc5626d98cf