MantisBT

View Issue Details
ID: 0028454
Project: Community
Category: [OCCT] OCCT:Data Exchange
View Status: public
Date Submitted: 2017-02-13 15:21
Last Update: 2018-10-17 12:44
Reporter: BenjaminBihler
Assigned To: gka
Priority: normal
Severity: minor
Status: assigned
Resolution: open
Platform: Windows
OS: VC++ 2015
OS Version: 64 bit
Product Version: [OCCT] 7.2.0
Target Version: [OCCT] 7.4.0*
Fixed in Version: (none)
Summary: 0028454: Names with Special Characters Cannot Be Read from STEP or IGES Files
Description: If I create a wire in CATIA with the name "AaBbCcÄäÖöÜüß*,.-;:_" and export it to STEP and IGES, then the name is encoded as "AaBbCc\X2\00C4\X0\\X2\00E4\X0\\X2\00D6\X0\\X2\00F6\X0\\X2\00DC\X0\\X2\00FC\X0\\X2\00DF\X0\*,.-;:_" in the STEP file and as "AaBbCc__OoUu_*,.-;:_" in the IGES file.

So the special characters are lost in the IGES format, but in the STEP format they are preserved and when CATIA reads in the STEP file, the correct special characters appear in the wire name.

This does not work with OCCT. If I import the STEP file there, the special characters are not decoded: the escaped sequences appear as-is in the StepRepr_RepresentationItem.

Is the CATIA encoding standard? Should OCCT therefore decode the special characters? Or is there no standard way of storing special character names to a STEP file? What about IGES?
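
For reference, the `\X2\ ... \X0\` sequences in the STEP name above are control directives from ISO 10303-21. Below is a minimal Python sketch of the decoding a reader would have to perform. The function name `decode_step_string` is hypothetical and is not part of OCCT, and treating the `\X2\` payload as UTF-16 code units is an assumption (the directive is defined for 16-bit characters).

```python
import re

# One token of an ISO 10303-21 string that needs translation:
#   \X2\hhhh...\X0\      one or more 16-bit characters (4 hex digits each)
#   \X4\hhhhhhhh...\X0\  one or more 32-bit characters (8 hex digits each)
#   \\                   a literal backslash
#   ''                   a literal apostrophe
_TOKEN = re.compile(
    r"\\X2\\((?:[0-9A-F]{4})+)\\X0\\"
    r"|\\X4\\((?:[0-9A-F]{8})+)\\X0\\"
    r"|(\\\\)"
    r"|('')"
)


def decode_step_string(raw):
    """Hypothetical helper: translate control directives into Unicode text."""

    def _replace(match):
        x2, x4, backslash, _apostrophe = match.groups()
        if x2 is not None:
            # Assumption: the payload is read as big-endian UTF-16 code units.
            return bytes.fromhex(x2).decode("utf-16-be")
        if x4 is not None:
            return bytes.fromhex(x4).decode("utf-32-be")
        if backslash is not None:
            return "\\"
        return "'"

    # re.sub scans left to right, so "\X0\" followed directly by "\X2\"
    # is not mistaken for an escaped backslash.
    return _TOKEN.sub(_replace, raw)


# The name exactly as it is written in the attached STEP file.
step_name = (
    r"AaBbCc\X2\00C4\X0\\X2\00E4\X0\\X2\00D6\X0\\X2\00F6\X0\\X2\00DC\X0"
    r"\\X2\00FC\X0\\X2\00DF\X0\*,.-;:_"
)
decoded = decode_step_string(step_name)
```

With the STEP name from this report, `decoded` is the original CATIA wire name "AaBbCcÄäÖöÜüß*,.-;:_". No comparable recovery is possible for the IGES name, because the characters were already replaced on export.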
Tags: No tags attached.
Test case number:
Attached Files:
SpecialCharacterNameInside.stp (3,643 bytes) 2017-02-13 15:23
SpecialCharacterNameInside.igs (1,053 bytes) 2017-02-13 15:23

- Relationships
child of 0022484 (closed, bugmaster): Open CASCADE UNICODE characters support.

-  Notes
(0073643)
kgv (developer)
2018-01-29 12:30

From what I've found on the web:

the character set for the exchange structure is defined as the code points U+0020 to U+007E and U+0080 
to U+10FFFF of ISO 10646 (Unicode). The first range includes: digits, upper and lower case "latin" 
letters, and common special characters (roughly equivalent to ASCII). The 2016 version of ISO 10303 extended 
the permitted "alphabet" to include "high" codepoints U+0080 to U+10FFFF, using UTF-8 
encoding. For compatibility with the 2002 version, high codepoint characters can be encoded/escaped within 
"control directives" (\X2\, \X4\, and \X0\)

So it seems that, as of the 2016 version, text can simply be stored as UTF-8 (which, in theory, OCCT should already handle as is), while for compatibility with older versions Unicode characters can be encoded with control directives.
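
The compatibility path mentioned above (escaping high code points for readers of the older format) can be sketched in Python as follows. `encode_step_string` is a hypothetical helper, not an OCCT function. It emits one directive per character, which matches the CATIA output quoted in the description, although consecutive characters may also share a single directive.

```python
def encode_step_string(text):
    """Hypothetical helper: escape text for a pre-2016 STEP string."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        if ch == "\\":
            parts.append("\\\\")  # a backslash is doubled
        elif ch == "'":
            parts.append("''")  # an apostrophe is doubled
        elif 0x20 <= code_point <= 0x7E:
            parts.append(ch)  # basic range, written as is
        elif code_point <= 0xFFFF:
            # 16-bit character: \X2\ + 4 hex digits + \X0\
            # Simplification: control characters below U+0020 also end up here.
            parts.append("\\X2\\%04X\\X0\\" % code_point)
        else:
            # Character beyond the BMP: \X4\ + 8 hex digits + \X0\
            parts.append("\\X4\\%08X\\X0\\" % code_point)
    return "".join(parts)


encoded = encode_step_string("AaBbCcÄäÖöÜüß*,.-;:_")
```

For the wire name from the description, `encoded` is identical to the string CATIA wrote into the STEP file. A writer targeting the 2016 version could skip the escaping and write the same name as UTF-8.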

- Issue History
Date Modified Username Field Change
2017-02-13 15:21 BenjaminBihler New Issue
2017-02-13 15:21 BenjaminBihler Assigned To => gka
2017-02-13 15:23 BenjaminBihler File Added: SpecialCharacterNameInside.stp
2017-02-13 15:23 BenjaminBihler File Added: SpecialCharacterNameInside.igs
2017-02-13 17:15 kgv Relationship added child of 0022484
2017-04-04 16:35 gka Assigned To gka => imn
2017-04-04 16:35 gka Status new => assigned
2017-07-27 11:15 abv Target Version 7.2.0 => 7.4.0*
2018-01-29 12:30 kgv Note Added: 0073643
2018-01-29 12:31 kgv Relationship added related to 0025440
2018-01-29 12:32 kgv Relationship added related to 0029458
2018-06-08 14:22 kgv Assigned To imn => gka
2018-10-17 12:44 kgv Relationship added related to 0030245

