View Issue Details

ID: 0028454
Project: Community
Category: OCCT:Data Exchange
View Status: public
Last Update: 2021-10-11 12:10
Reporter: BenjaminBihler
Assigned To: bugmaster
Priority: normal
Severity: minor
Status: closed
Resolution: fixed
Platform: Windows
OS: VC++ 2015
Product Version: 7.2.0
Target Version: 7.5.0
Fixed in Version: 7.5.0
Summary: 0028454: Data Exchange, STEP reader - names with special characters cannot be read
Description: If I create a wire in CATIA with the name "AaBbCcÄäÖöÜüß*,.-;:_" and export it to STEP and IGES, then the name is encoded as "AaBbCc\X2\00C4\X0\\X2\00E4\X0\\X2\00D6\X0\\X2\00F6\X0\\X2\00DC\X0\\X2\00FC\X0\\X2\00DF\X0\*,.-;:_" in the STEP file and as "AaBbCc__OoUu_*,.-;:_" in the IGES file.

So the special characters are lost in the IGES format, but they are preserved in the STEP format: when CATIA reads the STEP file back in, the correct special characters appear in the wire name.

This does not work with OCCT. If I import the STEP file there, the encoded (escaped) special characters appear as they are in the StepRepr_RepresentationItem.

Is the CATIA encoding standard? Should OCCT therefore decode the special characters? Or is there no standard way of storing names with special characters in a STEP file? What about IGES?
Steps To Reproduce: Test cases:
 - bug28454_1
 - bug28454_2
Tags: No tags attached.
Test case number: bugs/step/bug28454_1,bug28454_2

Attached Files

  • SpecialCharacterNameInside.stp (3,643 bytes)
  • SpecialCharacterNameInside.igs (1,053 bytes)

Relationships

parent of 0031884 closedbugmaster Open CASCADE Data Exchange - NULL de-reference within STEPCAFControl_Reader::SetSourceCodePage() 
parent of 0032310 closedsmoskvin Open CASCADE Data Exchange - Invalid STEP export/import of backslashes in names [Regression since OCCT 7.5.0] 
related to 0031670 closedbugmaster Community Data Exchange - cp1251 Cyrillic characters in STEP file 
related to 0031851 closedbugmaster Open CASCADE Data Exchange, STEP - enable Unicode symbols in STEP export 
child of 0022484 closedbugmaster Open CASCADE UNICODE characters support. 

Activities

BenjaminBihler

2017-02-13 15:23

developer  

SpecialCharacterNameInside.stp (3,643 bytes)

BenjaminBihler

2017-02-13 15:23

developer  

SpecialCharacterNameInside.igs (1,053 bytes)

kgv

2018-01-29 12:30

developer   ~0073643

From what I've found on the web:

The character set for the exchange structure is defined as the code points U+0020 to U+007E and U+0080 to U+10FFFF of ISO 10646 (Unicode). The first range includes digits, upper- and lower-case Latin letters, and common special characters (roughly equivalent to ASCII). The 2016 version of ISO 10303 extended the permitted "alphabet" to include the "high" code points U+0080 to U+10FFFF, using UTF-8 encoding. For compatibility with the 2002 version, high code point characters can be encoded (escaped) within "control directives" (\X2\, \X4\, and \X0\).

So it seems that in the 2016 version the text can simply be stored in UTF-8 (which, in theory, OCCT should already handle as is), while for compatibility with older versions Unicode symbols can be encoded with control directives.

abv

2020-10-06 20:59

manager   ~0095764

I propose we should add at least decoding functionality, to convert Unicode control directives to UTF-16 when putting strings to XDE.

See http://www.steptools.com/stds/step/IS_final_p21e3.html#clause-6-4-3 for documentation of string encoding in STEP.
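
For illustration only, here is a minimal, self-contained sketch of what decoding such control directives could look like. It is not the OCCT implementation: the function names (decodeStepString, appendUtf8, parseHex) are hypothetical, the output is UTF-8 rather than UTF-16 to keep the example short, \S\ is interpreted under the assumption that the active code page is ISO 8859-1, and \P*\ directives, surrogate pairs inside \X2\ and doubled apostrophes are deliberately not handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Append one Unicode code point to a UTF-8 encoded string.
static void appendUtf8(std::string& out, char32_t cp)
{
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Parse `digits` upper-case hexadecimal characters starting at s[pos].
static char32_t parseHex(const std::string& s, std::size_t pos, std::size_t digits)
{
  if (pos + digits > s.size()) throw std::runtime_error("truncated hex group");
  char32_t value = 0;
  for (std::size_t i = 0; i < digits; ++i) {
    const char c = s[pos + i];
    int d = 0;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else throw std::runtime_error("invalid hex digit");
    value = (value << 4) | static_cast<char32_t>(d);
  }
  return value;
}

// Decode \X2\...\X0\, \X4\...\X0\, \X\hh, \S\c and \\ in the body of a STEP string.
std::string decodeStepString(const std::string& in)
{
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    if (in[i] != '\\') { out += in[i++]; continue; }           // ordinary character
    if (in.compare(i, 2, "\\\\") == 0) { out += '\\'; i += 2; continue; } // escaped backslash
    const bool x2 = in.compare(i, 4, "\\X2\\") == 0;
    const bool x4 = in.compare(i, 4, "\\X4\\") == 0;
    if (x2 || x4) {
      const std::size_t digits = x2 ? 4 : 8;                    // hex digits per character
      i += 4;
      while (in.compare(i, 4, "\\X0\\") != 0) {                 // read groups up to the terminator
        appendUtf8(out, parseHex(in, i, digits));
        i += digits;
      }
      i += 4;                                                   // skip \X0\ itself
    } else if (in.compare(i, 3, "\\X\\") == 0) {
      appendUtf8(out, parseHex(in, i + 3, 2));                  // one 8-bit character, U+00hh
      i += 5;
    } else if (in.compare(i, 3, "\\S\\") == 0) {
      if (i + 3 >= in.size()) throw std::runtime_error("truncated \\S\\ directive");
      // Assumption: ISO 8859-1 page, so the code point is the character code plus 128.
      appendUtf8(out, static_cast<char32_t>(static_cast<unsigned char>(in[i + 3])) + 0x80);
      i += 4;
    } else {
      throw std::runtime_error("unsupported control directive");
    }
  }
  return out;
}
```

Applied to the name from the description, the sequence \X2\00C4\X0\ becomes the two UTF-8 bytes of "Ä". Because the terminator \X0\ is consumed as a unit, the backslash that ends it and the backslash that opens the following \X2\ are not mistaken for an escaped backslash.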

abv

2020-10-06 22:08

manager   ~0095765

IGES format does not support Unicode (and non-ASCII) strings at all, see
https://filemonger.com/specs/igs/devdept.com/version6.pdf

When exporting to IGES, the OCCT translator replaces every non-ASCII character with an underscore. The output is therefore protected against invalid characters, so nothing needs to be done for IGES.
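
The replacement rule described above can be sketched as follows. This is only an illustration under stated assumptions: toIgesSafeName is a hypothetical name, not an OCCT function, and the input is taken as already-decoded Unicode code points so that each non-ASCII character yields exactly one underscore.

```cpp
#include <string>

// Replace every code point outside 7-bit ASCII with an underscore.
// Hypothetical helper for illustration; not part of OCCT.
std::string toIgesSafeName(const std::u32string& name)
{
  std::string out;
  out.reserve(name.size());
  for (const char32_t cp : name) {
    out += (cp < 0x80) ? static_cast<char>(cp) : '_';
  }
  return out;
}
```

With this rule the name from the description, "AaBbCcÄäÖöÜüß*,.-;:_", becomes "AaBbCc_______*,.-;:_". That differs from the CATIA export quoted in the report ("AaBbCc__OoUu_*,.-;:_"), where some umlauts were replaced by their base letters.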

git

2020-10-13 13:33

administrator   ~0095927

Branch CR28454 has been created by dpasukhi.

SHA-1: c163ff39d23578fc78e962a62129f7b2522749fe


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".

git

2020-10-14 18:01

administrator   ~0095979

Branch CR28454_1 has been created by dpasukhi.

SHA-1: 35a9438501e822c0ec642335018362a0cd963d86


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".

git

2020-10-19 12:43

administrator   ~0096056

Branch CR28454_2 has been created by dpasukhi.

SHA-1: 07778bd06b610386ed781ebac7367dee7650d8d8


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".

git

2020-10-19 21:30

administrator   ~0096064

Branch CR28454_2 has been updated by abv.

SHA-1: bcd2801660d88af57351f8115a51cb97a655423e


Detailed log of new commits:

Author: abv
Date: Mon Oct 19 21:33:37 2020 +0300

    # Minor corrections:
    
    - Resource_CodePages.pxx: specify array size explicitly
    - Resource_FormatType.hxx: comments
    - Resource_Unicode.cxx: make code safer against signed chars; map zero ExtChar to '0'
    - STEPCAFControl_Reader: conversion simplified
    - StepData_StepReaderData: avoid compiler warnings

git

2020-10-21 10:12

administrator   ~0096089

Branch CR28454_3 has been created by dpasukhi.

SHA-1: e7446d2f913ea386dc20f9acbec3d162e7756f28


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".
    Add ISO 8859-1 - 9 code pages for conversion
    Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior

git

2020-10-21 11:39

administrator   ~0096096

Branch CR28454_3 has been updated forcibly by dpasukhi.

SHA-1: 45c9532a0825c05bc9b752cac5059c4ebaf4bc88

git

2020-10-21 16:23

administrator   ~0096113

Branch CR28454_4 has been created by dpasukhi.

SHA-1: 8b2c43b0b02ec8fd1c9e7dcb40d24705367827ab


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
    Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    Rename "read.stepcaf.codepage" to "read.step.codepage".
    Add ISO 8859-1 - 9 code pages for conversion
    Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior
    Update old test cases that contain control directives

dpasukhi

2020-10-21 18:59

developer   ~0096119

Dear kgv,
please review CR28454_4.
All tests are OK,
see http://vm-jenkins-test-12.nnov.opencascade.com:8080/view/CR28454-master-dpasukhi/

kgv

2020-10-21 19:57

developer   ~0096121

+    Interface_Static::Init("step", "read.step.codepage", '&', "eval ANSI");         // Resource_FormatType_ANSI
...
+    Interface_Static::Init("step", "read.step.codepage", '&', "eval NoConversion"); // Resource_FormatType_NoConversion

These two values have the same definition in the Resource_FormatType enumeration. How is this definition supposed to work?

+void TCollection_ExtendedString::AssignCat(const Standard_Utf16Char other)

theChar

+  Standard_EXPORT void CleanText(const Handle(TCollection_HAsciiString)& val) const;

cleanText (const Handle(TCollection_HAsciiString)& theVal)

+  //! Initialized from "read.stepcaf.codepage" variable by constructor, which is Resource_UTF8 by default.
+  Resource_FormatType SourceCodePage() const { return mySourceCodePage; }

The method description looks outdated: it refers to a parameter name that no longer exists.
Please also update the other places that use the old name.
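
To make the first remark above concrete, here is a small sketch of one way two names for the same enumerator can cause trouble. Everything in it is hypothetical: FormatType stands in for an enumeration with an alias, and NameRegistry is an invented stand-in that numbers names in registration order. Whether Interface_Static behaves exactly like this is an assumption, not something confirmed in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical enumeration in which two names share one value.
enum FormatType
{
  Format_SJIS,                        // 0
  Format_EUC,                         // 1
  Format_NoConversion,                // 2
  Format_ANSI = Format_NoConversion,  // 2 (alias of the previous enumerator)
  Format_GB                           // 3
};

// Hypothetical registry that assigns consecutive indices in registration order.
class NameRegistry
{
public:
  void add(const std::string& theName) { myNames.push_back(theName); }

  // Return the index of the name, or -1 if it was never registered.
  int indexOf(const std::string& theName) const
  {
    for (std::size_t i = 0; i < myNames.size(); ++i) {
      if (myNames[i] == theName) return static_cast<int>(i);
    }
    return -1;
  }

private:
  std::vector<std::string> myNames;
};

// Register both names of the aliased enumerator as separate entries.
NameRegistry makeRegistryWithAliasListedTwice()
{
  NameRegistry aRegistry;
  aRegistry.add("SJIS");
  aRegistry.add("EUC");
  aRegistry.add("ANSI");
  aRegistry.add("NoConversion");
  aRegistry.add("GB");
  return aRegistry;
}
```

In this sketch "ANSI" lands on index 2, which matches the enumeration, but "NoConversion" lands on index 3 and "GB" on index 4, so every entry after the alias is shifted by one relative to the enumerator values.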

git

2020-10-21 20:40

administrator   ~0096122

Branch CR28454_4 has been updated by dpasukhi.

SHA-1: 19870a79ad44955efcfd46ae1308682e160dc362


Detailed log of new commits:

Author: dpasukhi
Date: Wed Oct 21 20:41:09 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    # Done remarks
      Fix definition in Resource_FormatType enumeration within STEPControl_Controller
      Styling variable names
      Update description contains old "read.stepcaf.codepage"

git

2020-10-21 22:44

administrator   ~0096124

Branch CR28454_4 has been updated by abv.

SHA-1: 131e7a618af65841f16f4d4fb1813fe7dc57e28b


Detailed log of new commits:

Author: abv
Date: Wed Oct 21 22:48:19 2020 +0300

    # minor corrections: warning messages are recorded in Interface_Check instead of output to Message_Messenger

git

2020-10-22 10:11

administrator   ~0096127

Branch CR28454_5 has been created by dpasukhi.

SHA-1: a2b0388ac18083c0dbad5d86ea40a2d2aaf1dcb6


Detailed log of new commits:

Author: dpasukhi
Date: Fri Oct 9 13:57:30 2020 +0300

    0028454: Data Exchange, STEP reader - names with special characters cannot be read
    
    - Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
    - Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
    - Rename "read.stepcaf.codepage" to "read.step.codepage".
    - Add ISO 8859-1 - 9 code pages for conversion
    - Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior
    - Update old test cases that contain control directives

dpasukhi

2020-10-22 14:12

developer   ~0096136

Dear kgv,
All remarks have been addressed.
Please review CR28454_5.
All tests are OK with no regressions, see
http://vm-jenkins-test-12.nnov.opencascade.com:8080/view/CR28454-master-dpasukhi/

bugmaster

2020-10-24 12:19

administrator   ~0096194

Combination -
OCCT branch : IR-2020-10-23
master SHA - 9f9490e1ae0eaf38507437019a117437c6317225
a206de37fbfa0bf71bd534ae47192bbec23b8522
Products branch : IR-2020-10-23 SHA - 4594c3ef5cc6ec5816231ba88e2a4863a25a06d2
was compiled on Linux, MacOS and Windows platforms and tested in optimize mode.

Number of compiler warnings:
No new/fixed warnings

Regressions/Differences/Improvements:
No regressions/differences

CPU differences:
Debian80-64:
OCCT
Total CPU difference: 18000.12000000008 / 18036.38000000013 [-0.20%]
Products
Total CPU difference: 12171.670000000115 / 12174.520000000093 [-0.02%]
Windows-64-VC14:
OCCT
Total CPU difference: 19723.78125 / 19746.3125 [-0.11%]
Products
Total CPU difference: 13538.390625 / 13565.046875 [-0.20%]


Image differences :
No differences that require special attention

Memory differences :
No differences that require special attention

git

2020-10-24 12:41

administrator   ~0096208

Branch CR28454_5 has been deleted by inv.

SHA-1: a2b0388ac18083c0dbad5d86ea40a2d2aaf1dcb6

git

2020-10-24 12:41

administrator   ~0096209

Branch CR28454_4 has been deleted by inv.

SHA-1: 131e7a618af65841f16f4d4fb1813fe7dc57e28b

git

2020-10-24 12:41

administrator   ~0096213

Branch CR28454_3 has been deleted by inv.

SHA-1: 45c9532a0825c05bc9b752cac5059c4ebaf4bc88

git

2020-10-24 12:41

administrator   ~0096215

Branch CR28454_2 has been deleted by inv.

SHA-1: bcd2801660d88af57351f8115a51cb97a655423e

git

2020-10-24 12:41

administrator   ~0096221

Branch CR28454_1 has been deleted by inv.

SHA-1: 35a9438501e822c0ec642335018362a0cd963d86

git

2020-10-24 12:41

administrator   ~0096227

Branch CR28454 has been deleted by inv.

SHA-1: c163ff39d23578fc78e962a62129f7b2522749fe

Related Changesets

occt: master 1b9cb073

2020-10-09 10:57:30

dpasukhi


Committer: bugmaster
0028454: Data Exchange, STEP reader - names with special characters cannot be read

- Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\");
- Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF;
- Rename "read.stepcaf.codepage" to "read.step.codepage".
- Add ISO 8859-1 - 9 code pages for conversion
- Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior
- Update old test cases that contain control directives
Affected Issues
0028454
mod - src/Resource/FILES
rm - src/Resource/Resource_ANSI.pxx
add - src/Resource/Resource_CodePages.pxx
mod - src/Resource/Resource_FormatType.hxx
mod - src/Resource/Resource_Unicode.cxx
mod - src/STEPCAFControl/STEPCAFControl_Controller.cxx
mod - src/STEPCAFControl/STEPCAFControl_Reader.cxx
mod - src/STEPCAFControl/STEPCAFControl_Reader.hxx
mod - src/STEPControl/STEPControl_Controller.cxx
mod - src/StepData/StepData_StepModel.cxx
mod - src/StepData/StepData_StepModel.hxx
mod - src/StepData/StepData_StepReaderData.cxx
mod - src/StepData/StepData_StepReaderData.hxx
mod - src/StepFile/StepFile_Read.cxx
mod - src/TCollection/TCollection_ExtendedString.cxx
mod - src/TCollection/TCollection_ExtendedString.hxx
add - tests/bugs/step/bug28454_1
add - tests/bugs/step/bug28454_2
mod - tests/bugs/step/bug30694
mod - tests/bugs/step/bug31670
mod - tests/bugs/step/bug31670_1
mod - tests/gdt/view/B4
mod - tests/gdt/view/B7

Issue History

Date Modified Username Field Change
2017-02-13 15:21 BenjaminBihler New Issue
2017-02-13 15:21 BenjaminBihler Assigned To => gka
2017-02-13 15:23 BenjaminBihler File Added: SpecialCharacterNameInside.stp
2017-02-13 15:23 BenjaminBihler File Added: SpecialCharacterNameInside.igs
2017-02-13 17:15 kgv Relationship added child of 0022484
2017-04-04 16:35 gka Assigned To gka => imn
2017-04-04 16:35 gka Status new => assigned
2017-07-27 11:15 abv Target Version 7.2.0 => 7.4.0
2018-01-29 12:30 kgv Note Added: 0073643
2018-06-08 14:22 kgv Assigned To imn => gka
2019-06-24 09:19 kgv Summary Names with Special Characters Cannot Be Read from STEP or IGES Files => Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
2019-09-12 14:54 gka Target Version 7.4.0 => 7.5.0
2020-07-24 12:55 kgv Relationship added related to 0031670
2020-09-10 15:39 gka Assigned To gka => dpasukhi
2020-09-22 18:19 szy Target Version 7.5.0 => 7.6.0
2020-10-06 20:59 abv Note Added: 0095764
2020-10-06 20:59 abv Target Version 7.6.0 => 7.5.0
2020-10-06 22:08 abv Note Added: 0095765
2020-10-13 13:33 git Note Added: 0095927
2020-10-14 08:54 abv Relationship added related to 0031851
2020-10-14 18:01 git Note Added: 0095979
2020-10-19 12:43 git Note Added: 0096056
2020-10-19 13:58 kgv Summary Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files => Data Exchange, STEP reader - names with special characters cannot be read
2020-10-19 21:30 git Note Added: 0096064
2020-10-21 10:12 git Note Added: 0096089
2020-10-21 11:39 git Note Added: 0096096
2020-10-21 16:23 git Note Added: 0096113
2020-10-21 18:59 dpasukhi Note Added: 0096119
2020-10-21 18:59 dpasukhi Assigned To dpasukhi => kgv
2020-10-21 18:59 dpasukhi Status assigned => resolved
2020-10-21 18:59 dpasukhi Steps to Reproduce Updated
2020-10-21 19:57 kgv Note Added: 0096121
2020-10-21 19:58 kgv Assigned To kgv => dpasukhi
2020-10-21 19:58 kgv Status resolved => assigned
2020-10-21 20:40 git Note Added: 0096122
2020-10-21 22:44 git Note Added: 0096124
2020-10-22 10:11 git Note Added: 0096127
2020-10-22 14:12 dpasukhi Note Added: 0096136
2020-10-22 14:12 dpasukhi Assigned To dpasukhi => kgv
2020-10-22 14:12 dpasukhi Status assigned => resolved
2020-10-22 14:25 kgv Assigned To kgv => bugmaster
2020-10-22 14:25 kgv Status resolved => reviewed
2020-10-24 12:19 bugmaster Note Added: 0096194
2020-10-24 12:19 bugmaster Status reviewed => tested
2020-10-24 12:25 bugmaster Test case number => bugs/step/bug28454_1,bug28454_2
2020-10-24 12:30 bugmaster Changeset attached => occt master 1b9cb073
2020-10-24 12:30 bugmaster Status tested => verified
2020-10-24 12:30 bugmaster Resolution open => fixed
2020-10-24 12:41 git Note Added: 0096208
2020-10-24 12:41 git Note Added: 0096209
2020-10-24 12:41 git Note Added: 0096213
2020-10-24 12:41 git Note Added: 0096215
2020-10-24 12:41 git Note Added: 0096221
2020-10-24 12:41 git Note Added: 0096227
2020-10-27 11:34 kgv Relationship added parent of 0031884
2020-12-02 16:22 emo Fixed in Version => 7.5.0
2020-12-02 17:11 emo Status verified => closed
2021-10-11 12:10 dpasukhi Relationship added parent of 0032310