View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0028454 | Community | OCCT:Data Exchange | public | 2017-02-13 15:21 | 2021-10-11 12:10 |
Reporter | BenjaminBihler | Assigned To | bugmaster | ||
Priority | normal | Severity | minor | ||
Status | closed | Resolution | fixed | ||
Platform | Windows | OS | VC++ 2015 | ||
Product Version | 7.2.0 | ||||
Target Version | 7.5.0 | Fixed in Version | 7.5.0 | ||
Summary | 0028454: Data Exchange, STEP reader - names with special characters cannot be read | ||||
Description | If I create a wire in CATIA with the name "AaBbCcÄäÖöÜüß*,.-;:_" and export it to STEP and IGES, then the name is encoded as "AaBbCc\X2\00C4\X0\\X2\00E4\X0\\X2\00D6\X0\\X2\00F6\X0\\X2\00DC\X0\\X2\00FC\X0\\X2\00DF\X0\*,.-;:_" in the STEP file and as "AaBbCc__OoUu_*,.-;:_" in the IGES file. So the special characters are lost in the IGES format, but in the STEP format they are preserved, and when CATIA reads the STEP file back in, the correct special characters appear in the wire name. This does not work with OCCT: if I import the STEP file there, the encoded (escaped) special characters appear in the StepRepr_RepresentationItem. Is the CATIA encoding standard? Should OCCT therefore decode the special characters? Or is there no standard way of storing names with special characters in a STEP file? What about IGES? | ||||
Steps To Reproduce | Test cases: - bug28454_1 - bug28454_2 | ||||
Tags | No tags attached. | ||||
Test case number | bugs/step/bug28454_1,bug28454_2 | ||||
parent of | 0031884 | closed | bugmaster | Open CASCADE | Data Exchange - NULL de-reference within STEPCAFControl_Reader::SetSourceCodePage() |
parent of | 0032310 | closed | | Open CASCADE | Data Exchange - Invalid STEP export/import of backslashes in names [Regression since OCCT 7.5.0] |
related to | 0031670 | closed | bugmaster | Community | Data Exchange - cp1251 Cyrillic characters in STEP file |
related to | 0031851 | closed | bugmaster | Open CASCADE | Data Exchange, STEP - enable Unicode symbols in STEP export |
child of | 0022484 | closed | bugmaster | Open CASCADE | UNICODE characters support. |
|
SpecialCharacterNameInside.stp (3,643 bytes) |
|
SpecialCharacterNameInside.igs (1,053 bytes) |
|
From what I've found on the web: the character set for the exchange structure is defined as the code points U+0020 to U+007E and U+0080 to U+10FFFF of ISO 10646 (Unicode). The first range includes digits, upper- and lower-case Latin letters, and common special characters (roughly equivalent to ASCII). The 2016 version of ISO 10303 extended the permitted "alphabet" to include the "high" code points U+0080 to U+10FFFF, using UTF-8 encoding. For compatibility with the 2002 version, high code point characters can be encoded (escaped) within "control directives" (\X2\, \X4\, and \X0\). So it seems that in the 2016 version the text can simply be stored in UTF-8 (which, in theory, OCCT should already handle as is), while for compatibility with older versions Unicode symbols can be encoded with control directives. |
|
I propose we should add at least decoding functionality, to convert Unicode control directives to UTF-16 when putting strings to XDE. See http://www.steptools.com/stds/step/IS_final_p21e3.html#clause-6-4-3 for documentation of string encoding in STEP. |
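To make the proposed decoding concrete, here is a minimal standalone sketch of how the \X2\ and \X4\ directives could be unescaped. It is not the OCCT implementation: the function names (decodeStepString, appendUtf8, parseHex) are hypothetical, the output is UTF-8 rather than UTF-16 to keep the example short, and it assumes \X2\ carries 4 hex digits per character, \X4\ carries 8, both are closed by \X0\, and a doubled backslash stands for a literal backslash. Surrogate values inside \X2\ are not handled, and malformed directives are copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Append one Unicode code point to a UTF-8 encoded string.
void appendUtf8(std::string& out, char32_t cp)
{
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Read exactly `digits` upper-case hex characters starting at `pos`.
// Returns false if the input is too short or contains a non-hex character.
bool parseHex(const std::string& s, std::size_t pos, std::size_t digits, char32_t& value)
{
  if (pos + digits > s.size()) return false;
  value = 0;
  for (std::size_t i = 0; i < digits; ++i) {
    const char c = s[pos + i];
    char32_t d = 0;
    if (c >= '0' && c <= '9') d = static_cast<char32_t>(c - '0');
    else if (c >= 'A' && c <= 'F') d = static_cast<char32_t>(c - 'A' + 10);
    else return false;
    value = (value << 4) | d;
  }
  return true;
}

// Decode \X2\...\X0\ and \X4\...\X0\ directives (hypothetical helper).
std::string decodeStepString(const std::string& s)
{
  std::string out;
  std::size_t i = 0;
  while (i < s.size()) {
    const bool isX2 = s.compare(i, 4, "\\X2\\") == 0;
    const bool isX4 = s.compare(i, 4, "\\X4\\") == 0;
    if (isX2 || isX4) {
      const std::size_t digits = isX2 ? 4 : 8;
      std::size_t j = i + 4;
      std::string decoded;
      char32_t cp = 0;
      bool closed = false;
      while (j < s.size()) {
        if (s.compare(j, 4, "\\X0\\") == 0) { closed = true; j += 4; break; }
        if (!parseHex(s, j, digits, cp)) break;
        appendUtf8(decoded, cp);
        j += digits;
      }
      if (closed) { out += decoded; i = j; continue; }
      // Malformed directive: fall through and copy the text verbatim.
    } else if (s.compare(i, 2, "\\\\") == 0) {
      out += '\\';  // doubled backslash is a literal backslash
      i += 2;
      continue;
    }
    out += s[i++];
  }
  return out;
}
```

With this sketch, the name from the attached STEP file would come back as the UTF-8 bytes of "AaBbCcÄäÖöÜüß*,.-;:_".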
|
The IGES format does not support Unicode (or any non-ASCII) strings at all (see https://filemonger.com/specs/igs/devdept.com/version6.pdf). When exporting to IGES, the OCCT translator replaces any non-ASCII character with an underscore. The output is therefore already protected, so nothing needs to be done for IGES. |
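For illustration only, the underscore replacement described above could look like the following sketch. The function name replaceNonAscii is hypothetical, and the sketch assumes the input is UTF-8 and emits one underscore per character rather than per byte; the actual OCCT translator may differ on both points.

```cpp
#include <cstddef>
#include <string>

// Replace every non-ASCII character of a UTF-8 string with a single '_'.
// Hypothetical helper, not the OCCT IGES writer code.
std::string replaceNonAscii(const std::string& utf8Name)
{
  std::string out;
  std::size_t i = 0;
  while (i < utf8Name.size()) {
    const unsigned char c = static_cast<unsigned char>(utf8Name[i]);
    ++i;
    if (c < 0x80) {
      out += static_cast<char>(c);  // plain ASCII is kept as is
      continue;
    }
    out += '_';
    // Skip the UTF-8 continuation bytes (10xxxxxx) of the same character.
    while (i < utf8Name.size()
           && (static_cast<unsigned char>(utf8Name[i]) & 0xC0) == 0x80) {
      ++i;
    }
  }
  return out;
}
```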
|
Branch CR28454 has been created by dpasukhi. SHA-1: c163ff39d23578fc78e962a62129f7b2522749fe Detailed log of new commits: Author: dpasukhi Date: Fri Oct 9 13:57:30 2020 +0300 0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\"); Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF; Rename "read.stepcaf.codepage" to "read.step.codepage". |
|
Branch CR28454_1 has been created by dpasukhi. SHA-1: 35a9438501e822c0ec642335018362a0cd963d86 Detailed log of new commits: Author: dpasukhi Date: Fri Oct 9 13:57:30 2020 +0300 0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\"); Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF; Rename "read.stepcaf.codepage" to "read.step.codepage". |
|
Branch CR28454_2 has been created by dpasukhi. SHA-1: 07778bd06b610386ed781ebac7367dee7650d8d8 Detailed log of new commits: Author: dpasukhi Date: Fri Oct 9 13:57:30 2020 +0300 0028454: Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files Add support of the control directives ( "\X2\" "\X4" "\X\" "\PN\" "\S\"); Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF; Rename "read.stepcaf.codepage" to "read.step.codepage". |
|
Branch CR28454_2 has been updated by abv. SHA-1: bcd2801660d88af57351f8115a51cb97a655423e Detailed log of new commits: Author: abv Date: Mon Oct 19 21:33:37 2020 +0300 # Minor corrections: - Resource_CodePages.pxx: specify array size explicitly - Resource_FormatType.hxx: comments - Resource_Unicode.cxx: make code safer against signed chars; map zero ExtChar to '0' - STEPCAFControl_Reader: conversion simplified - StepData_StepReaderData: avoid compiler warnings |
|
Branch CR28454_3 has been created by dpasukhi. SHA-1: e7446d2f913ea386dc20f9acbec3d162e7756f28 Detailed log of new commits: Author: dpasukhi Date: Fri Oct 9 13:57:30 2020 +0300 0028454: Data Exchange, STEP reader - names with special characters cannot be read Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\"); Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF; Rename "read.stepcaf.codepage" to "read.step.codepage". Add ISO 8859-1 - 9 code pages for conversion Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior |
|
Branch CR28454_3 has been updated forcibly by dpasukhi. SHA-1: 45c9532a0825c05bc9b752cac5059c4ebaf4bc88 |
|
Branch CR28454_4 has been created by dpasukhi. SHA-1: 8b2c43b0b02ec8fd1c9e7dcb40d24705367827ab Detailed log of new commits: Author: dpasukhi Date: Fri Oct 9 13:57:30 2020 +0300 0028454: Data Exchange, STEP reader - names with special characters cannot be read Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\"); Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF; Rename "read.stepcaf.codepage" to "read.step.codepage". Add ISO 8859-1 - 9 code pages for conversion Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior Update old test cases that contain control directives |
|
Dear kgv, please review CR28454_4. All tests are OK, see http://vm-jenkins-test-12.nnov.opencascade.com:8080/view/CR28454-master-dpasukhi/ |
|
+ Interface_Static::Init("step", "read.step.codepage", '&', "eval ANSI"); // Resource_FormatType_ANSI
...
+ Interface_Static::Init("step", "read.step.codepage", '&', "eval NoConversion"); // Resource_FormatType_NoConversion
These two values have the same definition in the Resource_FormatType enumeration - how is this definition supposed to work?

+void TCollection_ExtendedString::AssignCat(const Standard_Utf16Char other)
theChar

+ Standard_EXPORT void CleanText(const Handle(TCollection_HAsciiString)& val) const;
cleanText (const Handle(TCollection_HAsciiString)& theVal)

+ //! Initialized from "read.stepcaf.codepage" variable by constructor, which is Resource_UTF8 by default.
+ Resource_FormatType SourceCodePage() const { return mySourceCodePage; }
The method description looks outdated - it points to a non-existing parameter name. Please also update other places using the old name. |
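One possible reading of the first remark, as a standalone sketch: when two enumerators share a value, code that receives the value can no longer tell which name was chosen. The enumeration below is hypothetical and only mimics the situation; it is not the actual Resource_FormatType definition, and its numeric values are assumptions.

```cpp
#include <string>

// Hypothetical enumeration where two names share one value.
enum FormatType
{
  FormatType_UTF8 = 0,
  FormatType_ANSI = 1,
  FormatType_NoConversion = FormatType_ANSI  // alias of ANSI
};

// Map a value back to a name. A separate "case FormatType_NoConversion:"
// would not compile here, because its value duplicates FormatType_ANSI.
std::string describe(FormatType theType)
{
  switch (theType) {
    case FormatType_UTF8: return "UTF8";
    case FormatType_ANSI: return "ANSI";
  }
  return "unknown";
}
```

In this sketch describe(FormatType_NoConversion) yields "ANSI", so a parameter that offers both names as distinct choices cannot be mapped back to the enumeration unambiguously.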
|
Branch CR28454_4 has been updated by dpasukhi. SHA-1: 19870a79ad44955efcfd46ae1308682e160dc362 Detailed log of new commits: Author: dpasukhi Date: Wed Oct 21 20:41:09 2020 +0300 0028454: Data Exchange, STEP reader - names with special characters cannot be read # Done remarks Fix definition in Resource_FormatType enumeration within STEPControl_Controller Styling variable names Update description contains old "read.stepcaf.codepage" |
|
Branch CR28454_4 has been updated by abv. SHA-1: 131e7a618af65841f16f4d4fb1813fe7dc57e28b Detailed log of new commits: Author: abv Date: Wed Oct 21 22:48:19 2020 +0300 # minor corrections: warning messages are recorded in Interface_Check instead of output to Message_Messenger |
|
Branch CR28454_5 has been created by dpasukhi. SHA-1: a2b0388ac18083c0dbad5d86ea40a2d2aaf1dcb6 Detailed log of new commits: Author: dpasukhi Date: Fri Oct 9 13:57:30 2020 +0300 0028454: Data Exchange, STEP reader - names with special characters cannot be read - Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\"); - Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF; - Rename "read.stepcaf.codepage" to "read.step.codepage". - Add ISO 8859-1 - 9 code pages for conversion - Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior - Update old test cases that contain control directives |
|
Dear kgv, all remarks have been addressed. Please review CR28454_5. All tests are OK, no regressions, see http://vm-jenkins-test-12.nnov.opencascade.com:8080/view/CR28454-master-dpasukhi/ |
|
Combination -
OCCT branch : IR-2020-10-23
master SHA - 9f9490e1ae0eaf38507437019a117437c6317225 a206de37fbfa0bf71bd534ae47192bbec23b8522
Products branch : IR-2020-10-23 SHA - 4594c3ef5cc6ec5816231ba88e2a4863a25a06d2
was compiled on Linux, MacOS and Windows platforms and tested in optimize mode.

Number of compiler warnings: No new/fixed warnings
Regressions/Differences/Improvements: No regressions/differences

CPU differences:
Debian80-64:
OCCT Total CPU difference: 18000.12000000008 / 18036.38000000013 [-0.20%]
Products Total CPU difference: 12171.670000000115 / 12174.520000000093 [-0.02%]
Windows-64-VC14:
OCCT Total CPU difference: 19723.78125 / 19746.3125 [-0.11%]
Products Total CPU difference: 13538.390625 / 13565.046875 [-0.20%]

Image differences : No differences that require special attention
Memory differences : No differences that require special attention |
|
Branch CR28454_5 has been deleted by inv. SHA-1: a2b0388ac18083c0dbad5d86ea40a2d2aaf1dcb6 |
|
Branch CR28454_4 has been deleted by inv. SHA-1: 131e7a618af65841f16f4d4fb1813fe7dc57e28b |
|
Branch CR28454_3 has been deleted by inv. SHA-1: 45c9532a0825c05bc9b752cac5059c4ebaf4bc88 |
|
Branch CR28454_2 has been deleted by inv. SHA-1: bcd2801660d88af57351f8115a51cb97a655423e |
|
Branch CR28454_1 has been deleted by inv. SHA-1: 35a9438501e822c0ec642335018362a0cd963d86 |
|
Branch CR28454 has been deleted by inv. SHA-1: c163ff39d23578fc78e962a62129f7b2522749fe |
occt: master 1b9cb073 2020-10-09 10:57:30 Committer: bugmaster |
0028454: Data Exchange, STEP reader - names with special characters cannot be read - Add support of the control directives ( "\X2\" "\X4" "\X\" "\P*\" "\S\"); - Make param "read.stepcaf.codepage" base for conversion inside StepData instead of CAF; - Rename "read.stepcaf.codepage" to "read.step.codepage". - Add ISO 8859-1 - 9 code pages for conversion - Add Resource_FormatType_NoConversion format type, that indicates non-conversion behavior - Update old test cases that contain control directives |
Affected Issues 0028454 |
|
mod - src/Resource/FILES |
rm - src/Resource/Resource_ANSI.pxx |
add - src/Resource/Resource_CodePages.pxx |
mod - src/Resource/Resource_FormatType.hxx |
mod - src/Resource/Resource_Unicode.cxx |
mod - src/STEPCAFControl/STEPCAFControl_Controller.cxx |
mod - src/STEPCAFControl/STEPCAFControl_Reader.cxx |
mod - src/STEPCAFControl/STEPCAFControl_Reader.hxx |
mod - src/STEPControl/STEPControl_Controller.cxx |
mod - src/StepData/StepData_StepModel.cxx |
mod - src/StepData/StepData_StepModel.hxx |
mod - src/StepData/StepData_StepReaderData.cxx |
mod - src/StepData/StepData_StepReaderData.hxx |
mod - src/StepFile/StepFile_Read.cxx |
mod - src/TCollection/TCollection_ExtendedString.cxx |
mod - src/TCollection/TCollection_ExtendedString.hxx |
add - tests/bugs/step/bug28454_1 |
add - tests/bugs/step/bug28454_2 |
mod - tests/bugs/step/bug30694 |
mod - tests/bugs/step/bug31670 |
mod - tests/bugs/step/bug31670_1 |
mod - tests/gdt/view/B4 |
mod - tests/gdt/view/B7 |
Date Modified | Username | Field | Change |
---|---|---|---|
2017-02-13 15:21 | BenjaminBihler | New Issue | |
2017-02-13 15:21 | BenjaminBihler | Assigned To | => gka |
2017-02-13 15:23 | BenjaminBihler | File Added: SpecialCharacterNameInside.stp | |
2017-02-13 15:23 | BenjaminBihler | File Added: SpecialCharacterNameInside.igs | |
2017-02-13 17:15 | kgv | Relationship added | child of 0022484 |
2017-04-04 16:35 | | Assigned To | gka => imn |
2017-04-04 16:35 | | Status | new => assigned |
2017-07-27 11:15 | | Target Version | 7.2.0 => 7.4.0 |
2018-01-29 12:30 | kgv | Note Added: 0073643 | |
2018-06-08 14:22 | kgv | Assigned To | imn => gka |
2019-06-24 09:19 | kgv | Summary | Names with Special Characters Cannot Be Read from STEP or IGES Files => Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files |
2019-09-12 14:54 | | Target Version | 7.4.0 => 7.5.0 |
2020-07-24 12:55 | kgv | Relationship added | related to 0031670 |
2020-09-10 15:39 | | Assigned To | gka => dpasukhi |
2020-09-22 18:19 | | Target Version | 7.5.0 => 7.6.0 |
2020-10-06 20:59 | | Note Added: 0095764 | |
2020-10-06 20:59 | | Target Version | 7.6.0 => 7.5.0 |
2020-10-06 22:08 | | Note Added: 0095765 | |
2020-10-13 13:33 | git | Note Added: 0095927 | |
2020-10-14 08:54 | | Relationship added | related to 0031851 |
2020-10-14 18:01 | git | Note Added: 0095979 | |
2020-10-19 12:43 | git | Note Added: 0096056 | |
2020-10-19 13:58 | kgv | Summary | Data Exchange - Names with Special Characters Cannot Be Read from STEP or IGES Files => Data Exchange, STEP reader - names with special characters cannot be read |
2020-10-19 21:30 | git | Note Added: 0096064 | |
2020-10-21 10:12 | git | Note Added: 0096089 | |
2020-10-21 11:39 | git | Note Added: 0096096 | |
2020-10-21 16:23 | git | Note Added: 0096113 | |
2020-10-21 18:59 | dpasukhi | Note Added: 0096119 | |
2020-10-21 18:59 | dpasukhi | Assigned To | dpasukhi => kgv |
2020-10-21 18:59 | dpasukhi | Status | assigned => resolved |
2020-10-21 18:59 | dpasukhi | Steps to Reproduce Updated | |
2020-10-21 19:57 | kgv | Note Added: 0096121 | |
2020-10-21 19:58 | kgv | Assigned To | kgv => dpasukhi |
2020-10-21 19:58 | kgv | Status | resolved => assigned |
2020-10-21 20:40 | git | Note Added: 0096122 | |
2020-10-21 22:44 | git | Note Added: 0096124 | |
2020-10-22 10:11 | git | Note Added: 0096127 | |
2020-10-22 14:12 | dpasukhi | Note Added: 0096136 | |
2020-10-22 14:12 | dpasukhi | Assigned To | dpasukhi => kgv |
2020-10-22 14:12 | dpasukhi | Status | assigned => resolved |
2020-10-22 14:25 | kgv | Assigned To | kgv => bugmaster |
2020-10-22 14:25 | kgv | Status | resolved => reviewed |
2020-10-24 12:19 | bugmaster | Note Added: 0096194 | |
2020-10-24 12:19 | bugmaster | Status | reviewed => tested |
2020-10-24 12:25 | bugmaster | Test case number | => bugs/step/bug28454_1,bug28454_2 |
2020-10-24 12:30 | bugmaster | Changeset attached | => occt master 1b9cb073 |
2020-10-24 12:30 | bugmaster | Status | tested => verified |
2020-10-24 12:30 | bugmaster | Resolution | open => fixed |
2020-10-24 12:41 | git | Note Added: 0096208 | |
2020-10-24 12:41 | git | Note Added: 0096209 | |
2020-10-24 12:41 | git | Note Added: 0096213 | |
2020-10-24 12:41 | git | Note Added: 0096215 | |
2020-10-24 12:41 | git | Note Added: 0096221 | |
2020-10-24 12:41 | git | Note Added: 0096227 | |
2020-10-27 11:34 | kgv | Relationship added | parent of 0031884 |
2020-12-02 16:22 | | Fixed in Version | => 7.5.0 |
2020-12-02 17:11 | | Status | verified => closed |
2021-10-11 12:10 | dpasukhi | Relationship added | parent of 0032310 |