MantisBT
Mantis Bug Tracker Workflow

View Issue Details
ID: 0028454
Project: Community
Category: [OCCT] OCCT:Data Exchange
View Status: public
Date Submitted: 2017-02-13 15:21
Last Update: 2020-10-27 11:34
Reporter: BenjaminBihler
Assigned To: bugmaster
Priority: normal
Severity: minor
Status: verified
Resolution: fixed
Platform: Windows
OS: VC++ 2015
OS Version: 64 bit
Product Version: [OCCT] 7.2.0
Target Version: [OCCT] 7.5.0
Fixed in Version:
Summary: 0028454: Data Exchange, STEP reader - names with special characters cannot be read
Description: If I create a wire in CATIA with the name "AaBbCcÄäÖöÜüß*,.-;:_" and export it to STEP and IGES, the name is encoded as "AaBbCc\X2\00C4\X0\\X2\00E4\X0\\X2\00D6\X0\\X2\00F6\X0\\X2\00DC\X0\\X2\00FC\X0\\X2\00DF\X0\*,.-;:_" in the STEP file and as "AaBbCc__OoUu_*,.-;:_" in the IGES file.

So the special characters are lost in the IGES format but preserved in the STEP format: when CATIA reads the STEP file back, the correct special characters appear in the wire name.

This does not work with OCCT: when I import the STEP file, the encoded (escaped) special characters appear verbatim in the name of the StepRepr_RepresentationItem.

Is the CATIA encoding standard? Should OCCT therefore decode the special characters? Or is there no standard way of storing special character names to a STEP file? What about IGES?
Steps To Reproduce: Test cases:
 - bug28454_1
 - bug28454_2
Tags: No tags attached.
Test case number: bugs/step/bug28454_1,bug28454_2
Attached Files: SpecialCharacterNameInside.stp (3,643 bytes) 2017-02-13 15:23
SpecialCharacterNameInside.igs (1,053 bytes) 2017-02-13 15:23

Relationships
parent of 0031884 | reviewed | bugmaster | Open CASCADE | Data Exchange - NULL de-reference within STEPCAFControl_Reader::SetSourceCodePage()
related to 0031670 | verified | bugmaster | Community | Data Exchange - cp1251 Cyrillic characters in STEP file
related to 0031851 | verified | bugmaster | Open CASCADE | Data Exchange, STEP - enable Unicode symbols in STEP export
child of 0022484 | closed | bugmaster | Open CASCADE | UNICODE characters support.
Not all the children of this issue are yet resolved or closed.

Notes
(0073643)
kgv (developer)
2018-01-29 12:30

From what I've found on the web:

the character set for the exchange structure is defined as the code points U+0020 to U+007E and U+0080 to U+10FFFF of ISO 10646 (Unicode). The first range includes: digits, upper and lower case "latin" letters, and common special characters (roughly equivalent to ASCII). The 2016 version of ISO 10303 extended the permitted "alphabet" to include "high" codepoints U+0080 to U+10FFFF, using UTF-8 encoding. For compatibility with the 2002 version, high codepoint characters can be encoded/escaped within "control directives" (\X2\, \X4\, and \X0\).

So it seems that in the 2016 version the text can simply be stored as UTF-8 (which, in theory, OCCT should already handle as is), while for compatibility with older versions Unicode symbols can be encoded with control directives.
(0095764)
abv (manager)
2020-10-06 20:59

I propose that we add at least decoding functionality that converts Unicode control directives to UTF-16 when strings are put into XDE.

See http://www.steptools.com/stds/step/IS_final_p21e3.html#clause-6-4-3 for documentation of string encoding in STEP.
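A minimal C++ sketch of such a decoder, written against the clause linked above as we understand it, might look as follows. It takes the characters between the enclosing apostrophes and returns UTF-16. It handles doubled apostrophes, "\\", "\X\hh", "\S\c" (assuming the default ISO 8859-1 page) and "\X2\" / "\X4\" groups terminated by "\X0\", emitting a surrogate pair for code points above U+FFFF. This is not the OCCT implementation: "decodeP21" is a hypothetical name, "\P?\" page switches are ignored, raw UTF-8 bytes (2016 edition) are not decoded, and malformed directives are copied through unchanged instead of being reported.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace
{
// Value of one hexadecimal digit, or -1 if the character is not a hex digit.
int hexDigit(char theChar)
{
  if (theChar >= '0' && theChar <= '9') return theChar - '0';
  if (theChar >= 'A' && theChar <= 'F') return theChar - 'A' + 10;
  if (theChar >= 'a' && theChar <= 'f') return theChar - 'a' + 10;
  return -1;
}

// Reads theCount hex digits starting at thePos; false if any is missing or invalid.
bool readHex(const std::string& theStr, std::size_t thePos, std::size_t theCount, std::uint32_t& theValue)
{
  if (thePos + theCount > theStr.size()) return false;
  theValue = 0;
  for (std::size_t i = 0; i < theCount; ++i)
  {
    const int aDigit = hexDigit(theStr[thePos + i]);
    if (aDigit < 0) return false;
    theValue = (theValue << 4) | static_cast<std::uint32_t>(aDigit);
  }
  return true;
}

// Appends one code point as UTF-16, using a surrogate pair above U+FFFF.
void appendUtf16(std::u16string& theOut, std::uint32_t theCp)
{
  if (theCp <= 0xFFFF)
  {
    theOut += static_cast<char16_t>(theCp);
    return;
  }
  theCp -= 0x10000;
  theOut += static_cast<char16_t>(0xD800 + (theCp >> 10));
  theOut += static_cast<char16_t>(0xDC00 + (theCp & 0x3FF));
}
} // namespace

// Hypothetical decoder: body of a STEP string (without the enclosing apostrophes) -> UTF-16.
std::u16string decodeP21(const std::string& theStr)
{
  std::u16string aRes;
  std::size_t i = 0;
  while (i < theStr.size())
  {
    if (theStr.compare(i, 2, "''") == 0)   // doubled apostrophe
    {
      aRes += u'\''; i += 2; continue;
    }
    if (theStr[i] != '\\')                  // plain character, assumed 7-bit ASCII
    {
      aRes += static_cast<char16_t>(static_cast<unsigned char>(theStr[i])); ++i; continue;
    }
    std::uint32_t aCp = 0;
    if (theStr.compare(i, 2, "\\\\") == 0) // escaped backslash
    {
      aRes += u'\\'; i += 2; continue;
    }
    if (theStr.compare(i, 3, "\\X\\") == 0 && readHex(theStr, i + 3, 2, aCp))
    {
      aRes += static_cast<char16_t>(aCp); i += 5; continue;  // \X\hh: one ISO 8859-1 byte
    }
    if (theStr.compare(i, 3, "\\S\\") == 0 && i + 3 < theStr.size())
    {
      // \S\c: character c + 128, ISO 8859-1 page assumed
      aRes += static_cast<char16_t>(static_cast<unsigned char>(theStr[i + 3]) + 0x80); i += 4; continue;
    }
    const bool isX2 = theStr.compare(i, 4, "\\X2\\") == 0;
    const bool isX4 = theStr.compare(i, 4, "\\X4\\") == 0;
    if (isX2 || isX4)
    {
      const std::size_t aWidth = isX2 ? 4 : 8;  // hex digits per character
      const std::size_t anEnd  = theStr.find("\\X0\\", i + 4);
      if (anEnd != std::string::npos && (anEnd - i - 4) % aWidth == 0)
      {
        std::u16string aGroup;
        bool isValid = true;
        for (std::size_t p = i + 4; p < anEnd && isValid; p += aWidth)
        {
          isValid = readHex(theStr, p, aWidth, aCp) && aCp <= 0x10FFFF;
          if (isValid) appendUtf16(aGroup, aCp);
        }
        if (isValid) { aRes += aGroup; i = anEnd + 4; continue; }
      }
    }
    aRes += u'\\'; ++i;                      // unrecognised directive: keep the backslash
  }
  return aRes;
}
```

Applied to the name from the description, the sketch yields "AaBbCcÄäÖöÜüß*,.-;:_" as a UTF-16 string.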
(0095765)
abv (manager)
2020-10-06 22:08

IGES format does not support Unicode (or any non-ASCII) strings at all, see
https://filemonger.com/specs/igs/devdept.com/version6.pdf

When exporting to IGES, the OCCT translator replaces every non-ASCII character with an underscore. The output is therefore safe, and nothing needs to be done for IGES.
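The replacement described above can be illustrated with a short C++ sketch. It assumes the name arrives as UTF-16 and replaces each non-ASCII character with "_", taking care that a surrogate pair, which encodes a single character above U+FFFF, yields one underscore rather than two. "toIgesAscii" is a hypothetical name and this is not the OCCT translator's code. Note that CATIA behaved differently in the description: it wrote Ö, ö, Ü, ü as base letters and only Ä, ä, ß as underscores.

```cpp
#include <string>

// Hypothetical helper (not OCCT code): reduces a UTF-16 name to ASCII for IGES
// by replacing every non-ASCII character with '_'.
std::string toIgesAscii(const std::u16string& theName)
{
  std::string aRes;
  aRes.reserve(theName.size());
  for (const char16_t aUnit : theName)
  {
    if (aUnit < 0x80)
    {
      aRes += static_cast<char>(aUnit);  // ASCII code unit: keep
    }
    else if (aUnit >= 0xDC00 && aUnit <= 0xDFFF)
    {
      continue;                          // low surrogate: its high surrogate already gave '_'
    }
    else
    {
      aRes += '_';                       // any other non-ASCII unit (including a high surrogate)
    }
  }
  return aRes;
}
```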
(0095927)
git (administrator)
2020-10-13 13:33

Branch CR28454 has been created by dpasukhi.

SHA-1: c163ff39d23578fc78e962a62129f7b2522749fe


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".
(0095979)
git (administrator)
2020-10-14 18:01

Branch CR28454_1 has been created by dpasukhi.

SHA-1: 35a9438501e822c0ec642335018362a0cd963d86


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".
(0096056)
git (administrator)
2020-10-19 12:43

Branch CR28454_2 has been created by dpasukhi.

SHA-1: 07778bd06b610386ed781ebac7367dee7650d8d8


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".
(0096064)
git (administrator)
2020-10-19 21:30

Branch CR28454_2 has been updated by abv.

SHA-1: bcd2801660d88af57351f8115a51cb97a655423e


Detailed log of new commits:

Author: abv
Date: Mon Oct 19 21:33:37 2020 +0300

    # Minor corrections:
    
    - Resource_CodePages.pxx: specify array size explicitly
    - Resource_FormatType.hxx: comments
    - Resource_Unicode.cxx: make code safer against signed chars; map zero ExtChar to '0'
    - STEPCAFControl_Reader: conversion simplified
    - StepData_StepReaderData: avoid compiler warnings
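The "make code safer against signed chars" item points at a general C++ pitfall that a short sketch can show; the actual change in Resource_Unicode.cxx is not reproduced here. Where plain char is signed, a byte such as 0xC4 reads as a negative number, so using it directly as an index into a 256-entry code page table goes out of bounds; casting to unsigned char first keeps the index in 0..255. The table type and function names below are hypothetical, and only ISO 8859-1 (whose bytes map one-to-one onto U+0000..U+00FF) is filled in.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical 256-entry table: byte of a single-byte code page -> UTF-16 code unit.
using CodePageTable = std::array<char16_t, 256>;

// ISO 8859-1 maps every byte 0x00..0xFF to the code point with the same value.
CodePageTable makeIso8859_1Table()
{
  CodePageTable aTable{};
  for (std::size_t i = 0; i < aTable.size(); ++i)
  {
    aTable[i] = static_cast<char16_t>(i);
  }
  return aTable;
}

// Converts a single-byte string through the table.
std::u16string decodeSingleByte(const std::string& theBytes, const CodePageTable& theTable)
{
  std::u16string aRes;
  aRes.reserve(theBytes.size());
  for (const char aChar : theBytes)
  {
    // theTable[aChar] would be wrong where char is signed: '\xC4' is negative there.
    aRes += theTable[static_cast<unsigned char>(aChar)];
  }
  return aRes;
}
```

Other single-byte pages would only need a different table; the indexing stays the same.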

(0096089)
git (administrator)
2020-10-21 10:12

Branch CR28454_3 has been created by dpasukhi.

SHA-1: e7446d2f913ea386dc20f9acbec3d162e7756f28


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".
    Add ISO 8859-1 - 9 code pages for conversion
    Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior
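The "\P*\" item and the ISO 8859 code pages work together: as we read the standard, "\PA\" through "\PI\" select ISO 8859-1 through 8859-9 for the rest of the string (ISO 8859-1 is the default), and "\S\c" then stands for the byte c + 0x80 in the selected page. The C++ sketch below shows why the page matters: "\S\D" is Ä (U+00C4) under ISO 8859-1 but Ф (U+0424) under ISO 8859-5. Only parts 1 and 5 are implemented, other pages yield U+FFFD, directives other than "\\", "\P?\" and "\S\" are out of scope, and the function names are hypothetical rather than taken from OCCT.

```cpp
#include <cstddef>
#include <string>

// ISO 8859-5 (Cyrillic): maps a byte 0x80..0xFF to its code point.
char16_t iso8859_5(unsigned theByte)
{
  switch (theByte)
  {
    case 0xA0: return 0x00A0;  // no-break space
    case 0xAD: return 0x00AD;  // soft hyphen
    case 0xF0: return 0x2116;  // numero sign
    case 0xFD: return 0x00A7;  // section sign
    default:
      // 0x80..0x9F are C1 controls (same value); remaining bytes from 0xA1 are U+0401..U+045F
      return static_cast<char16_t>(theByte < 0xA0 ? theByte : 0x0360 + theByte);
  }
}

// Hypothetical: decodes "\P?\" page switches and "\S\c" characters; everything else is copied.
std::u16string decodePagedHighHalf(const std::string& theStr)
{
  char aPage = 'A';                 // "\PA\" (ISO 8859-1) is the default page
  std::u16string aRes;
  std::size_t i = 0;
  while (i < theStr.size())
  {
    if (theStr.compare(i, 2, "\\\\") == 0)  // escaped backslash
    {
      aRes += u'\\'; i += 2; continue;
    }
    if (theStr.compare(i, 2, "\\P") == 0 && i + 3 < theStr.size()
     && theStr[i + 2] >= 'A' && theStr[i + 2] <= 'I' && theStr[i + 3] == '\\')
    {
      aPage = theStr[i + 2];        // 'A'..'I' select ISO 8859-1..9
      i += 4;
      continue;
    }
    if (theStr.compare(i, 3, "\\S\\") == 0 && i + 3 < theStr.size())
    {
      const unsigned aByte = static_cast<unsigned char>(theStr[i + 3]) + 0x80u;
      if (aPage == 'A')      aRes += static_cast<char16_t>(aByte);  // ISO 8859-1: identity
      else if (aPage == 'E') aRes += iso8859_5(aByte);
      else                   aRes += u'\uFFFD';                     // page not implemented here
      i += 4;
      continue;
    }
    aRes += static_cast<char16_t>(static_cast<unsigned char>(theStr[i]));
    ++i;
  }
  return aRes;
}
```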
(0096096)
git (administrator)
2020-10-21 11:39

Branch CR28454_3 has been updated forcibly by dpasukhi.

SHA-1: 45c9532a0825c05bc9b752cac5059c4ebaf4bc88
(0096113)
git (administrator)
2020-10-21 16:23

Branch CR28454_4 has been created by dpasukhi.

SHA-1: 8b2c43b0b02ec8fd1c9e7dcb40d24705367827ab


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".
    Add ISO 8859-1 - 9 code pages for conversion
    Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior
    Update old test cases that contain control directives
(0096119)
dpasukhi (developer)
2020-10-21 18:59

Dear kgv,
please review CR28454_4.
All tests are OK,
see http://vm-jenkins-test-12.nnov.opencascade.com:8080/view/CR28454-master-dpasukhi/
(0096121)
kgv (developer)
2020-10-21 19:57

+    Interface_Static::Init("step", "read.step.codepage", '&', "eval ANSI");         // Resource_FormatType_ANSI
...
+    Interface_Static::Init("step", "read.step.codepage", '&', "eval NoConversion"); // Resource_FormatType_NoConversion

These two values have the same definition in the Resource_FormatType enumeration; how is this definition supposed to work?

+void TCollection_ExtendedString::AssignCat(const Standard_Utf16Char other)

theChar

+  Standard_EXPORT void CleanText(const Handle(TCollection_HAsciiString)& val) const;

cleanText (const Handle(TCollection_HAsciiString)& theVal)

+  //! Initialized from "read.stepcaf.codepage" variable by constructor, which is Resource_UTF8 by default.
+  Resource_FormatType SourceCodePage() const { return mySourceCodePage; }

The method description looks outdated: it refers to a parameter name that no longer exists.
Please also update other places that use the old name.
(0096122)
git (administrator)
2020-10-21 20:40

Branch CR28454_4 has been updated by dpasukhi.

SHA-1: 19870a79ad44955efcfd46ae1308682e160dc362


Detailed log of new commits:

Author: dpasukhi
Date: Wed Oct 21 20:41:09 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    # Done remarks
      Fix definition in Resource_FormatType enumeration within STEPControl_Controller
      Styling variable names
      Update description contains old "read.stepcaf.codepage"

(0096124)
git (administrator)
2020-10-21 22:44

Branch CR28454_4 has been updated by abv.

SHA-1: 131e7a618af65841f16f4d4fb1813fe7dc57e28b


Detailed log of new commits:

Author: abv
Date: Wed Oct 21 22:48:19 2020 +0300

    # minor corrections: warning messages are recorded in Interface_Check instead of output to Message_Messenger

(0096127)
git (administrator)
2020-10-22 10:11

Branch CR28454_5 has been created by dpasukhi.

SHA-1: a2b0388ac18083c0dbad5d86ea40a2d2aaf1dcb6


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    - Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
    - Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    - Rename "read.stepcaf.codepage" to "read.step.codepage".
    - Add ISO 8859-1 - 9 code pages for conversion
    - Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior
    - Update old test cases that contain control directives
(0096136)
dpasukhi (developer)
2020-10-22 14:12

Dear kgv,
All remarks have been addressed.
Please review CR28454_5.
All tests are OK, NO regressions, see
http://vm-jenkins-test-12.nnov.opencascade.com:8080/view/CR28454-master-dpasukhi/
(0096194)
bugmaster (administrator)
2020-10-24 12:19

Combination -
OCCT branch : IR-2020-10-23
master SHA - 9f9490e1ae0eaf38507437019a117437c6317225
a206de37fbfa0bf71bd534ae47192bbec23b8522
Products branch : IR-2020-10-23 SHA - 4594c3ef5cc6ec5816231ba88e2a4863a25a06d2
was compiled on Linux, MacOS and Windows platforms and tested in optimize mode.

Number of compiler warnings:
No new/fixed warnings

Regressions/Differences/Improvements:
No regressions/differences

CPU differences:
Debian80-64:
OCCT
Total CPU difference: 18000.12 / 18036.38 [-0.20%]
Products
Total CPU difference: 12171.67 / 12174.52 [-0.02%]
Windows-64-VC14:
OCCT
Total CPU difference: 19723.78125 / 19746.3125 [-0.11%]
Products
Total CPU difference: 13538.390625 / 13565.046875 [-0.20%]


Image differences :
No differences that require special attention

Memory differences :
No differences that require special attention
(0096208)
git (administrator)
2020-10-24 12:41

Branch CR28454_5 has been deleted by inv.

SHA-1: a2b0388ac18083c0dbad5d86ea40a2d2aaf1dcb6
(0096209)
git (administrator)
2020-10-24 12:41

Branch CR28454_4 has been deleted by inv.

SHA-1: 131e7a618af65841f16f4d4fb1813fe7dc57e28b
(0096213)
git (administrator)
2020-10-24 12:41

Branch CR28454_3 has been deleted by inv.

SHA-1: 45c9532a0825c05bc9b752cac5059c4ebaf4bc88
(0096215)
git (administrator)
2020-10-24 12:41

Branch CR28454_2 has been deleted by inv.

SHA-1: bcd2801660d88af57351f8115a51cb97a655423e
(0096221)
git (administrator)
2020-10-24 12:41

Branch CR28454_1 has been deleted by inv.

SHA-1: 35a9438501e822c0ec642335018362a0cd963d86
(0096227)
git (administrator)
2020-10-24 12:41

Branch CR28454 has been deleted by inv.

SHA-1: c163ff39d23578fc78e962a62129f7b2522749fe

Related Changesets
occt: master 1b9cb073
Timestamp: 2020-10-09 10:57:30
Author: dpasukhi
Committer: bugmaster
0028454: Data Exchange, STEP reader - names with special characters cannot be read

- Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
- Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
- Rename "read.stepcaf.codepage" to "read.step.codepage".
- Add ISO 8859-1 - 9 code pages for conversion
- Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior
- Update old test cases that contain control directives
mod - src/Resource/FILES
rm - src/Resource/Resource_ANSI.pxx
add - src/Resource/Resource_CodePages.pxx
mod - src/Resource/Resource_FormatType.hxx
mod - src/Resource/Resource_Unicode.cxx
mod - src/STEPCAFControl/STEPCAFControl_Controller.cxx
mod - src/STEPCAFControl/STEPCAFControl_Reader.cxx
mod - src/STEPCAFControl/STEPCAFControl_Reader.hxx
mod - src/STEPControl/STEPControl_Controller.cxx
mod - src/StepData/StepData_StepModel.cxx
mod - src/StepData/StepData_StepModel.hxx
mod - src/StepData/StepData_StepReaderData.cxx
mod - src/StepData/StepData_StepReaderData.hxx
mod - src/StepFile/StepFile_Read.cxx
mod - src/TCollection/TCollection_ExtendedString.cxx
mod - src/TCollection/TCollection_ExtendedString.hxx
add - tests/bugs/step/bug28454_1
add - tests/bugs/step/bug28454_2
mod - tests/bugs/step/bug30694
mod - tests/bugs/step/bug31670
mod - tests/bugs/step/bug31670_1
mod - tests/gdt/view/B4
mod - tests/gdt/view/B7

Issue History
Date Modified Username Field Change
2017-02-13 15:21 BenjaminBihler New Issue
2017-02-13 15:21 BenjaminBihler Assigned To => gka
2017-02-13 15:23 BenjaminBihler File Added: SpecialCharacterNameInside.stp
2017-02-13 15:23 BenjaminBihler File Added: SpecialCharacterNameInside.igs
2017-02-13 17:15 kgv Relationship added child of 0022484
2017-04-04 16:35 gka Assigned To gka => imn
2017-04-04 16:35 gka Status new => assigned
2017-07-27 11:15 abv Target Version 7.2.0 => 7.4.0
2018-01-29 12:30 kgv Note Added: 0073643
2018-01-29 12:31 kgv Relationship added related to 0025440
2018-01-29 12:32 kgv Relationship added related to 0029458
2018-06-08 14:22 kgv Assigned To imn => gka
2018-10-17 12:44 kgv Relationship added related to 0030245
2019-03-02 22:54 kgv Relationship added related to 0030529
2019-05-06 18:01 kgv Relationship added related to 0030694
2019-06-24 09:19 kgv Summary Names with Special Characters Cannot Be Read from STEP or IGES Files => Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
2019-09-12 14:54 gka Target Version 7.4.0 => 7.5.0
2020-07-24 12:55 kgv Relationship added related to 0031670
2020-09-10 15:39 gka Assigned To gka => dpasukhi
2020-09-22 18:19 szy Target Version 7.5.0 => 7.6.0*
2020-10-06 20:59 abv Note Added: 0095764
2020-10-06 20:59 abv Target Version 7.6.0* => 7.5.0
2020-10-06 22:08 abv Note Added: 0095765
2020-10-13 13:33 git Note Added: 0095927
2020-10-14 08:54 abv Relationship added related to 0031851
2020-10-14 18:01 git Note Added: 0095979
2020-10-19 12:43 git Note Added: 0096056
2020-10-19 13:58 kgv Summary Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files => Data Exchange, STEP reader - names with special characters cannot be read
2020-10-19 21:30 git Note Added: 0096064
2020-10-21 10:12 git Note Added: 0096089
2020-10-21 11:39 git Note Added: 0096096
2020-10-21 16:23 git Note Added: 0096113
2020-10-21 18:59 dpasukhi Note Added: 0096119
2020-10-21 18:59 dpasukhi Assigned To dpasukhi => kgv
2020-10-21 18:59 dpasukhi Status assigned => resolved
2020-10-21 18:59 dpasukhi Steps to Reproduce Updated
2020-10-21 19:57 kgv Note Added: 0096121
2020-10-21 19:58 kgv Assigned To kgv => dpasukhi
2020-10-21 19:58 kgv Status resolved => assigned
2020-10-21 20:40 git Note Added: 0096122
2020-10-21 22:44 git Note Added: 0096124
2020-10-22 10:11 git Note Added: 0096127
2020-10-22 14:12 dpasukhi Note Added: 0096136
2020-10-22 14:12 dpasukhi Assigned To dpasukhi => kgv
2020-10-22 14:12 dpasukhi Status assigned => resolved
2020-10-22 14:25 kgv Assigned To kgv => bugmaster
2020-10-22 14:25 kgv Status resolved => reviewed
2020-10-24 12:19 bugmaster Note Added: 0096194
2020-10-24 12:19 bugmaster Status reviewed => tested
2020-10-24 12:25 bugmaster Test case number => bugs/step/bug28454_1,bug28454_2
2020-10-24 12:30 bugmaster Changeset attached => occt master 1b9cb073
2020-10-24 12:30 bugmaster Status tested => verified
2020-10-24 12:30 bugmaster Resolution open => fixed
2020-10-24 12:41 git Note Added: 0096208
2020-10-24 12:41 git Note Added: 0096209
2020-10-24 12:41 git Note Added: 0096213
2020-10-24 12:41 git Note Added: 0096215
2020-10-24 12:41 git Note Added: 0096221
2020-10-24 12:41 git Note Added: 0096227
2020-10-25 19:03 abv Relationship added related to 0031878
2020-10-27 11:34 kgv Relationship added parent of 0031884

