MantisBT - Community
View Issue Details
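ID: 0031670
Project: Community
Category: [OCCT] OCCT:Data Exchange
View Status: public
Date Submitted: 2020-07-16 23:26
Last Update: 2020-12-02 17:13
Reporter: robertlipman
Assigned To: bugmaster
Priority: normal
Severity: minor
Status: closed
Resolution: fixed
Product Version: [OCCT] 7.3.0
Target Version: [OCCT] 7.5.0
Fixed in Version: [OCCT] 7.5.0
Test case number: Not required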
0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
Cyrillic characters in STEP files are not always correctly displayed by CAD Assistant. I cannot figure out why sometimes it works and other times it doesn't. This issue might be similar to other OCC issues regarding text characters in STEP files.
Two STEP files are attached in a zip file. Import them to CAD Assistant. In the Model Browser, block-russian.stp displays the correct characters. russian.stp does not.
No tags attached.
related to 0028454 closed bugmaster Community Data Exchange, STEP reader - names with special characters cannot be read
related to 0031589 closed robertlipman Community Data Exchange - unable to read STEP file containing mangled characters
related to 0014673 closed bugmaster Open CASCADE Provide true support for Unicode symbols
zip russian-step.zip (510,774) 2020-07-16 23:26
https://tracker.dev.opencascade.org/
png cadass_step_locale.png (66,888) 2020-07-24 12:56
png cadass_step_win1251.png (78,994) 2020-07-24 12:56
Issue History
2020-07-16 23:26  robertlipman  New Issue
2020-07-16 23:26  robertlipman  Assigned To  => gka
2020-07-16 23:26  robertlipman  File Added: russian-step.zip
2020-07-24 12:38  kgv  Summary  Cyrillic characters in STEP file => Data Exchange - Cyrillic characters in STEP file
2020-07-24 12:40  kgv  Summary  Data Exchange - Cyrillic characters in STEP file => Data Exchange - Windows-1251 Cyrillic characters in STEP file
2020-07-24 12:55  kgv  Relationship added  related to 0028454
2020-07-24 12:56  kgv  File Added: cadass_step_locale.png
2020-07-24 12:56  kgv  File Added: cadass_step_win1251.png
2020-07-24 13:03  kgv  Note Added: 0093288
2020-07-24 13:03  kgv  Relationship added  related to 0031589
2020-07-24 13:03  kgv  Summary  Data Exchange - Windows-1251 Cyrillic characters in STEP file => Data Exchange - cp1251 Cyrillic characters in STEP file
2020-08-26 12:12  gka  Assigned To  gka => dpasukhi
2020-09-30 20:20  git  Note Added: 0095504
2020-09-30 20:21  git  Note Added: 0095505
2020-09-30 20:22  git  Note Added: 0095506
2020-10-02 15:52  git  Note Added: 0095582
2020-10-02 15:52  git  Note Added: 0095583
2020-10-02 15:53  git  Note Added: 0095584
2020-10-02 15:56  git  Note Added: 0095586
2020-10-02 15:56  git  Note Added: 0095587
2020-10-02 15:57  git  Note Added: 0095588
2020-10-03 15:54  abv  Target Version  => 7.5.0
2020-10-03 18:13  git  Note Added: 0095701
2020-10-05 17:07  git  Note Added: 0095734
2020-10-05 19:20  git  Note Added: 0095738
2020-10-06 12:07  git  Note Added: 0095753
2020-10-06 15:41  git  Note Added: 0095761
2020-10-06 16:13  kgv  Note Added: 0095763
2020-10-06 16:13  kgv  Assigned To  dpasukhi => bugmaster
2020-10-06 16:13  kgv  Status  new => resolved
2020-10-06 16:13  kgv  Status  resolved => reviewed
2020-10-06 16:14  kgv  Note Edited: 0095763  bug_revision_view_page.php?bugnote_id=95763#r23705
2020-10-06 22:49  abv  Relationship added  related to 0014673
2020-10-07 16:00  bugmaster  Note Added: 0095787
2020-10-07 16:00  bugmaster  Status  reviewed => tested
2020-10-07 16:16  bugmaster  Test case number  => Not required
2020-10-07 16:17  bugmaster  Changeset attached  => occt master baf60a87
2020-10-07 16:17  bugmaster  Status  tested => verified
2020-10-07 16:17  bugmaster  Resolution  open => fixed
2020-10-08 11:01  git  Note Added: 0095793
2020-10-08 11:01  git  Note Added: 0095794
2020-10-08 11:01  git  Note Added: 0095800
2020-10-08 11:01  git  Note Added: 0095805
2020-10-08 11:01  git  Note Added: 0095806
2020-10-08 11:01  git  Note Added: 0095807
2020-10-25 19:03  abv  Relationship added  related to 0031878
2020-12-02 16:22  emo  Fixed in Version  => 7.5.0
2020-12-02 17:13  emo  Status  verified => closed

Notes
(0093288)
kgv   
2020-07-24 13:03   
> I cannot figure out why sometimes it works and other times it doesn't
This is because block-russian.stp is encoded in UTF-8, while russian.STEP is encoded in cp1251. The STEP format does not record which text encoding was used (it is supposed to be UTF-8), and the OCCT STEP reader has neither logic for detecting encodings automatically nor a complete list of conversion tables.

So far, the only alternative is to specify "System locale" instead of "UTF-8" for the STEP translator. This works only if the STEP file is opened on Windows under exactly the same locale as the one in which it was written, and it corrupts text in any other encoding (system locales being the legacy way to encode text files before UTF-8 came into use everywhere).
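
To illustrate the difference between the two files, here is a minimal Python sketch of the same text stored as UTF-8 and as cp1251. It is not OCCT code: the function name `decode_step_text` and the sample string are hypothetical, and the "try strict UTF-8, then fall back" rule is only a heuristic assumed for this example, not something the STEP reader is stated to do.

```python
def decode_step_text(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode as strict UTF-8 first; on failure, use a caller-chosen code page.

    Hypothetical helper for illustration only. The fallback code page
    cannot be detected from the bytes and must be supplied by the caller.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


name = "Блок"                          # sample Cyrillic product name
utf8_bytes = name.encode("utf-8")      # how block-russian.stp stores text
cp1251_bytes = name.encode("cp1251")   # how a cp1251-encoded file stores text

print(decode_step_text(utf8_bytes))    # valid UTF-8, decoded directly
print(decode_step_text(cp1251_bytes))  # invalid UTF-8, decoded via the fallback
print(cp1251_bytes.decode("latin-1"))  # wrong code page: mojibake, no error raised
```

The last line shows why a wrong locale corrupts names silently: single-byte code pages accept almost any byte sequence, so a mismatch produces wrong characters rather than an error.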
(0095504)
git   
2020-09-30 20:20   
Branch CR31670_1 has been created by dpasukhi.

SHA-1: 66b5d844113f1542fc095b4dd7a28ee36cfad48c


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support of ANSI
(0095505)
git   
2020-09-30 20:21   
Branch CR31670_2 has been created by dpasukhi.

SHA-1: 8d1fe9f9d48b578dffe5ec5ab598f53ba8374157


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Create code pages tables.
(0095506)
git   
2020-09-30 20:22   
Branch CR31670_3 has been created by dpasukhi.

SHA-1: 74de4b25616dcb8e0b4eed26474a2a423a707921


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support of ANSI
(0095582)
git   
2020-10-02 15:52   
Branch CR31670_1 has been deleted by dpasukhi.

SHA-1: 66b5d844113f1542fc095b4dd7a28ee36cfad48c
(0095583)
git   
2020-10-02 15:52   
Branch CR31670_2 has been deleted by dpasukhi.

SHA-1: 8d1fe9f9d48b578dffe5ec5ab598f53ba8374157
(0095584)
git   
2020-10-02 15:53   
Branch CR31670_3 has been deleted by dpasukhi.

SHA-1: 74de4b25616dcb8e0b4eed26474a2a423a707921
(0095586)
git   
2020-10-02 15:56   
Branch CR31670_nocpp has been created by dpasukhi.

SHA-1: 3f64f459be7dcfe1275297c042652147dc3e946a


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support of ANSI
(0095587)
git   
2020-10-02 15:56   
Branch CR31670_cpp11 has been created by dpasukhi.

SHA-1: 12f523fab779672c3187716d251b00a17cc0fc36


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support of ANSI encoding
(0095588)
git   
2020-10-02 15:57   
Branch CR31670_table has been created by dpasukhi.

SHA-1: 7e54bf5c1322b36a3c817c695a0050c88f2c4d63


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support of ANSI
(0095701)
git   
2020-10-03 18:13   
Branch CR31670_1 has been created by dpasukhi.

SHA-1: 9b63b0de37c9649b92cfa139cf0857b514770164


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support for converting pages from Windows encoding to UTF-8
(0095734)
git   
2020-10-05 17:07   
Branch CR31670_2 has been created by dpasukhi.

SHA-1: 295c52b2f6cbf8d2d94667df66199d29e09d30af


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support for converting pages from Windows encoding to Unicode
(0095738)
git   
2020-10-05 19:20   
Branch CR31670_2 has been updated forcibly by dpasukhi.

SHA-1: 79d3c6e37822ba6babc05019d85c0f4c1aa8620c
(0095753)
git   
2020-10-06 12:07   
Branch CR31670_2 has been updated forcibly by dpasukhi.

SHA-1: c63dda117bf55437cc83b846e165a1017d23cb4b
(0095761)
git   
2020-10-06 15:41   
Branch CR31670_3 has been created by dpasukhi.

SHA-1: e52364f53e25d3923e3d2f568534f15960e240d2


Detailed log of new commits:

Author: dpasukhi
Date: Wed Sep 30 15:54:25 2020 +0300

    0031670: Data Exchange - cp1251 Cyrillic characters in STEP file
    
    Add support for converting pages from Windows encoding to Unicode
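
The commit messages mention code page tables and conversion from Windows encodings to Unicode, but the patch itself is not shown in this issue. As a rough illustration of the general table-lookup technique only, the sketch below builds a 256-entry table for cp1251 from Python's own codec and converts bytes through it. The names `build_table` and `to_unicode` are hypothetical and do not correspond to OCCT functions.

```python
from typing import List

REPLACEMENT = "\ufffd"  # Unicode replacement character for unmapped bytes


def build_table(codepage: str) -> List[str]:
    """Build a byte -> Unicode character table for a single-byte code page."""
    table = []
    for byte in range(256):
        try:
            table.append(bytes([byte]).decode(codepage))
        except UnicodeDecodeError:
            # Some code pages leave a few byte values undefined.
            table.append(REPLACEMENT)
    return table


def to_unicode(raw: bytes, table: List[str]) -> str:
    """Convert single-byte encoded text to Unicode by per-byte table lookup."""
    return "".join(table[b] for b in raw)


CP1251_TABLE = build_table("cp1251")

sample = "Привет".encode("cp1251")
print(to_unicode(sample, CP1251_TABLE))
```

A static table like this avoids any dependency on the operating system locale, which is the limitation of the "System locale" option described in note 0093288.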
(0095763)
kgv   
2020-10-06 16:13   
(edited on: 2020-10-06 16:14)
Please raise the patch
- OCCT branch: CR31670_3.

http://jenkins-test-12.nnov.opencascade.com:8080/view/CR31670_2-master-dpasukhi

(0095787)
bugmaster   
2020-10-07 16:00   
Combination -
OCCT branch : OCCT-750-BETA
master SHA - 278da162dc52c26c8cfe9d002a6f07db12405194
a206de37fbfa0bf71bd534ae47192bbec23b8522
Products branch : OCCT-750-BETA SHA - d9c364e1137eed3249e5a05befa860c708f243c0
was compiled on Linux, MacOS and Windows platforms and tested in optimize mode.

Number of compiler warnings:
No new/fixed warnings

Regressions/Differences/Improvements:
No regressions/differences

CPU differences:
Debian80-64:
OCCT
Total CPU difference: 18038.89000000012 / 18085.73000000008 [-0.26%]
Products
Total CPU difference: 12182.170000000115 / 12217.490000000116 [-0.29%]
Windows-64-VC14:
OCCT
Total CPU difference: 19726.21875 / 19713.9375 [+0.06%]
Products
Total CPU difference: 13586.625 / 13579.390625 [+0.05%]


Image differences :
No differences that require special attention

Memory differences :
No differences that require special attention
(0095793)
git   
2020-10-08 11:01   
Branch CR31670_3 has been deleted by inv.

SHA-1: e52364f53e25d3923e3d2f568534f15960e240d2
(0095794)
git   
2020-10-08 11:01   
Branch CR31670_2 has been deleted by inv.

SHA-1: c63dda117bf55437cc83b846e165a1017d23cb4b
(0095800)
git   
2020-10-08 11:01   
Branch CR31670_1 has been deleted by inv.

SHA-1: 9b63b0de37c9649b92cfa139cf0857b514770164
(0095805)
git   
2020-10-08 11:01   
Branch CR31670_nocpp has been deleted by inv.

SHA-1: 3f64f459be7dcfe1275297c042652147dc3e946a
(0095806)
git   
2020-10-08 11:01   
Branch CR31670_cpp11 has been deleted by inv.

SHA-1: 12f523fab779672c3187716d251b00a17cc0fc36
(0095807)
git   
2020-10-08 11:01   
Branch CR31670_table has been deleted by inv.

SHA-1: 7e54bf5c1322b36a3c817c695a0050c88f2c4d63