MantisBT - Community
View Issue Details
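ID: 0028454
Project: Community
Category: [OCCT] OCCT:Data Exchange
View Status: public
Date Submitted: 2017-02-13 15:21
Last Update: 2019-06-24 09:19
Reporter: BenjaminBihler
Assigned To: gka
Priority: normal
Severity: minor
Status: assigned
Resolution: open
Platform: Windows
OS: VC++ 2015
OS Version: 64 bit
Product Version: [OCCT] 7.2.0
Target Version: [OCCT] 7.4.0*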
0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files
If I create a wire in CATIA with the name "AaBbCcÄäÖöÜüß*,.-;:_" and export it to STEP and IGES, then the name is encoded as "AaBbCc\X2\00C4\X0\\X2\00E4\X0\\X2\00D6\X0\\X2\00F6\X0\\X2\00DC\X0\\X2\00FC\X0\\X2\00DF\X0\*,.-;:_" in the STEP file and as "AaBbCc__OoUu_*,.-;:_" in the IGES file.

So the special characters are lost in the IGES format. In the STEP format, however, they are preserved: when CATIA reads the STEP file back in, the correct special characters appear in the wire name.

This does not work with OCCT. If I import the STEP file there, the special characters remain in their encoded (escaped) form in the name of the StepRepr_RepresentationItem.
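
For illustration, the decoding that CATIA apparently performs on import can be sketched as follows. This is a minimal sketch and not OCCT code: the function name decodeStepControlDirectives and its helpers are hypothetical, it assumes that each \X2\ group is one 4-digit UCS-2 code unit and each \X4\ group is one 8-digit code point, and it ignores the other string escapes (doubled backslashes and apostrophes, \X\, \S\, \P) as well as surrogate pairs.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Append one Unicode code point to `out` as UTF-8.
static void appendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Parse `width` upper-case hex digits of `s` starting at `pos`.
static std::uint32_t parseHex(const std::string& s, std::size_t pos, std::size_t width)
{
    if (pos + width > s.size())
        throw std::invalid_argument("truncated hex group");
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        const char c = s[pos + i];
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9')      digit = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else throw std::invalid_argument("invalid hex digit");
        value = (value << 4) | digit;
    }
    return value;
}

// Hypothetical helper: replace \X2\...\X0\ and \X4\...\X0\ runs by UTF-8 text.
// All other characters are copied through unchanged.
std::string decodeStepControlDirectives(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const bool isX2 = in.compare(i, 4, "\\X2\\") == 0;
        const bool isX4 = in.compare(i, 4, "\\X4\\") == 0;
        if (!isX2 && !isX4) {
            out += in[i++];
            continue;
        }
        const std::size_t width = isX2 ? 4 : 8;  // hex digits per group
        i += 4;                                  // skip the opening directive
        // Read hex groups until the terminating \X0\ directive.
        while (in.compare(i, 4, "\\X0\\") != 0) {
            appendUtf8(out, parseHex(in, i, width));
            i += width;
        }
        i += 4;                                  // skip \X0\ itself
    }
    return out;
}
```

Applied to the name from the attached STEP file, such a function would turn "\X2\00C4\X0\" back into "Ä" (as UTF-8) and leave the ASCII part of the name untouched. Malformed input, such as a missing \X0\, raises std::invalid_argument in this sketch.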

Is the CATIA encoding standard? Should OCCT therefore decode the special characters? Or is there no standard way of storing names with special characters in a STEP file? What about IGES?
No tags attached.
child of 0022484 (closed, bugmaster): Open CASCADE UNICODE characters support.
SpecialCharacterNameInside.stp (3,643) 2017-02-13 15:23
https://tracker.dev.opencascade.org/
SpecialCharacterNameInside.igs (1,053) 2017-02-13 15:23
Issue History
2017-02-13 15:21  BenjaminBihler  New Issue
2017-02-13 15:21  BenjaminBihler  Assigned To: => gka
2017-02-13 15:23  BenjaminBihler  File Added: SpecialCharacterNameInside.stp
2017-02-13 15:23  BenjaminBihler  File Added: SpecialCharacterNameInside.igs
2017-02-13 17:15  kgv  Relationship added: child of 0022484
2017-04-04 16:35  gka  Assigned To: gka => imn
2017-04-04 16:35  gka  Status: new => assigned
2017-07-27 11:15  abv  Target Version: 7.2.0 => 7.4.0*
2018-01-29 12:30  kgv  Note Added: 0073643
2018-01-29 12:31  kgv  Relationship added: related to 0025440
2018-01-29 12:32  kgv  Relationship added: related to 0029458
2018-06-08 14:22  kgv  Assigned To: imn => gka
2018-10-17 12:44  kgv  Relationship added: related to 0030245
2019-03-02 22:54  kgv  Relationship added: related to 0030529
2019-05-06 18:01  kgv  Relationship added: related to 0030694
2019-06-24 09:19  kgv  Summary: Names with Special Characters Cannot Be Read from STEP or IGES Files => Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files

Notes
(0073643) kgv 2018-01-29 12:30
From what I've found on the web:

"The character set for the exchange structure is defined as the code points U+0020 to U+007E and U+0080 to U+10FFFF of ISO 10646 (Unicode). The first range includes: digits, upper and lower case "latin" letters, and common special characters (roughly equivalent to ASCII). The 2016 version of ISO 10303 extended the permitted "alphabet" to include "high" codepoints U+0080 to U+10FFFF, using UTF-8 encoding. For compatibility with the 2002 version, high codepoint characters can be encoded/escaped within "control directives" (\X2\, \X4\, and \X0\)."

So it seems that in the 2016 version, text can simply be stored as UTF-8 (which, in theory, OCCT should already handle as is), while for compatibility with older versions, Unicode characters can be encoded with control directives.
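
The backward-compatible form mentioned above could be produced along these lines. This is only a sketch under stated assumptions, not an OCCT API: the name encodeStepControlDirectives is hypothetical, the input is assumed to be well-formed UTF-8, each non-ASCII character gets its own directive (the pattern seen in the CATIA export), and quoting of apostrophes and backslashes is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: escape every non-ASCII character of a UTF-8 string
// as \X2\hhhh\X0\ (up to U+FFFF) or \X4\hhhhhhhh\X0\ (above U+FFFF).
std::string encodeStepControlDirectives(const std::string& utf8)
{
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            // ASCII is copied through unchanged.
            out += static_cast<char>(lead);
            ++i;
            continue;
        }

        // Determine the sequence length and the payload bits of the lead byte.
        std::size_t extra = 0;
        std::uint32_t cp = 0;
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in the continuation bytes, 6 bits each.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char byte = static_cast<unsigned char>(utf8[i + k]);
            if ((byte & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (byte & 0x3F);
        }
        i += extra + 1;

        // Emit one control directive per character.
        char buf[24];
        if (cp <= 0xFFFF)
            std::snprintf(buf, sizeof buf, "\\X2\\%04X\\X0\\", static_cast<unsigned>(cp));
        else
            std::snprintf(buf, sizeof buf, "\\X4\\%08X\\X0\\", static_cast<unsigned>(cp));
        out += buf;
    }
    return out;
}
```

With this sketch, "Ä" (UTF-8 bytes C3 84) becomes "\X2\00C4\X0\", which matches the form found in the attached STEP file. A writer targeting the 2016 edition could presumably skip this step and emit the UTF-8 bytes directly.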