View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0022125 | Open CASCADE | OCCT:Foundation Classes | public | 2010-11-30 14:33 | 2017-10-01 19:33 |
Reporter | Assigned To | bugmaster | |||
Priority | normal | Severity | trivial | ||
Status | closed | Resolution | fixed | ||
OS | All | ||||
Target Version | 6.8.0 | Fixed in Version | 6.8.0 | ||
Summary | 0022125: TCollection_ExtendedString: conversion from UTF-8 to unicode | ||||
Description | There is a problem in the following constructor of the TCollection_ExtendedString class: TCollection_ExtendedString(const Standard_CString astring, const Standard_Boolean isMultiByte); This constructor is used to restore a Unicode string from its UTF-8 representation when isMultiByte = Standard_True. Internally it invokes the ConvertToUnicode3B and ConvertToUnicode2B functions, which are intended to construct a single Standard_ExtCharacter from 3 or 2 passed chars, respectively. The ConvertToUnicodeXB functions use the following data structure: union { struct { unsigned char h; unsigned char l; } hl; Standard_ExtCharacter chr; } EL; For example, take the symbol 12510 (U+30DE, the Japanese katakana character マ). It has the following 3-byte UTF-8 representation: 1110_0011 10_000011 10_011110, which must be restored to the 16-bit value 0011000011011110. However, ConvertToUnicode3B returns 11011110 00110000 instead (EL.hl.h and EL.hl.l appear in the wrong order). The issue was reproduced on Win32. It was found while implementing a unified IGES-reading routine that accepts a UTF-8 string as a filename. A draft workaround for such a routine (Win32-compliant only) is attached; it uses the MultiByteToWideChar Windows function instead. | ||||
Tags | No tags attached. | ||||
Test case number | bugs fclasses bug22125 | ||||
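The description turns on how two or three UTF-8 bytes map to one 16-bit code unit. The sketch below is a minimal illustration, assuming BMP characters only, that computes that value with shifts and masks instead of a byte union, so the result does not depend on the host byte order. The names decodeUtf8TwoBytes, decodeUtf8ThreeBytes and swapBytes are hypothetical; this is not the OCCT implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not OCCT functions: build one UTF-16 code unit from a
// 2- or 3-byte UTF-8 sequence with shifts and masks. The arithmetic works on
// values, not on memory layout, so the result is the same on little- and
// big-endian hosts. Input validation is limited to checking the lead byte.
static std::uint16_t decodeUtf8TwoBytes (const unsigned char* p)
{
  assert ((p[0] & 0xE0) == 0xC0); // lead byte must be 110xxxxx
  // 110xxxxx 10yyyyyy -> 00000xxx xxyyyyyy
  return static_cast<std::uint16_t> (((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
}

static std::uint16_t decodeUtf8ThreeBytes (const unsigned char* p)
{
  assert ((p[0] & 0xF0) == 0xE0); // lead byte must be 1110xxxx
  // 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyy yyzzzzzz
  return static_cast<std::uint16_t> (((p[0] & 0x0F) << 12)
                                   | ((p[1] & 0x3F) << 6)
                                   |  (p[2] & 0x3F));
}

// Swaps the two bytes of a 16-bit value; applied to the correct code unit it
// reproduces the wrong result reported in this issue.
static std::uint16_t swapBytes (std::uint16_t theValue)
{
  return static_cast<std::uint16_t> ((theValue << 8) | (theValue >> 8));
}
```

With the bytes E3 83 9E from the example, decodeUtf8ThreeBytes yields 0x30DE (12510), while swapBytes of that value gives 0xDE30, i.e. the byte-swapped 11011110 00110000 reported above.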
2010-11-30 12:33
|
iges_read_utf8_win.zip (3,258 bytes) |
|
Please provide a test file |
2014-10-14 16:23 developer |
Part1_badname.zip (171,673 bytes) |
|
Please find attached an IGES file with a Japanese name. |
2014-10-14 19:15 reporter |
test_iges_jp.tcl (100 bytes) |
|
Test script added. This problem will be resolved after the 0025367 integration. |
|
> This problem will be resolved after 0025367 integration

The problem in the description is unrelated to the 0025367 patch. |
|
Dear bugmaster, please switch the bug to "verified". The issue has been solved within the patch for 0022484:

```diff
 inline Standard_ExtCharacter ConvertToUnicode3B (unsigned char *p)
 {
   // *p, *(p+1), *(p+2) =>0 , 1, 2
+  // little endian
   union {
     struct {
-      unsigned char h;
       unsigned char l;
+      unsigned char h;
     } hl;
```

The available UTF-8/UTF-16 conversion APIs convert the filename "Part1_badname_マヹヱ.igs" from the test case in the same way:

```
Utf16 SOURCE:
57 00 3A 00 5C 00 50 00|61 00 72 00 74 00 31 00
5F 00 62 00 61 00 64 00|6E 00 61 00 6D 00 65 00
5F 00 DE 30 F9 30 F1 30|2E 00 69 00 67 00 73 00
Utf16 TCol from Utf8:
57 00 3A 00 5C 00 50 00|61 00 72 00 74 00 31 00
5F 00 62 00 61 00 64 00|6E 00 61 00 6D 00 65 00
5F 00 DE 30 F9 30 F1 30|2E 00 69 00 67 00 73 00
Utf16 NCol from Utf8:
57 00 3A 00 5C 00 50 00|61 00 72 00 74 00 31 00
5F 00 62 00 61 00 64 00|6E 00 61 00 6D 00 65 00
5F 00 DE 30 F9 30 F1 30|2E 00 69 00 67 00 73 00
Utf8 WApi from Utf16:
57 3A 5C 50 61 72 74 31|5F 62 61 64 6E 61 6D 65
5F E3 83 9E E3 83 B9 E3|83 B1 2E 69 67 73
Utf8 NCol from Utf16:
57 3A 5C 50 61 72 74 31|5F 62 61 64 6E 61 6D 65
5F E3 83 9E E3 83 B9 E3|83 B1 2E 69 67 73
Utf8 TCol from Utf16:
57 3A 5C 50 61 72 74 31|5F 62 61 64 6E 61 6D 65
5F E3 83 9E E3 83 B9 E3|83 B1 2E 69 67 73
```

The dumps were produced by the following Draw command (Windows only):

```cpp
// Headers needed by this Windows-only snippet
#include <windows.h>
#include <commdlg.h>
#include <cstring>
#include <cwchar>
#include <iostream>

#include <Draw_Interpretor.hxx>
#include <NCollection_String.hxx>
#include <NCollection_UtfString.hxx>
#include <TCollection_AsciiString.hxx>
#include <TCollection_ExtendedString.hxx>

//! Formats bytes as hex: '|' after every 8th byte, a new line after every 16th.
static TCollection_AsciiString formatHex (const Standard_Byte* theData,
                                          const Standard_Size  theSize)
{
  TCollection_AsciiString anOut;
  char aByte[4];
  for (size_t aByteId = 0; aByteId < theSize; ++aByteId)
  {
    unsigned char aChar = theData[aByteId];
    char anEsc = ' ';
    if ((aByteId + 1) % 16 == 0 && aByteId != 0)
    {
      anEsc = '\n';
    }
    else if ((aByteId + 1) % 8 == 0)
    {
      anEsc = '|';
    }
    _snprintf (aByte, 4, "%02X%c", (unsigned int )aChar, anEsc);
    anOut += aByte;
  }
  return anOut;
}

//! Asks for a file and dumps its name as converted by WinAPI, NCollection and TCollection.
static Standard_Integer testunicode (Draw_Interpretor& /*theDI*/,
                                     Standard_Integer  ,
                                     const char**      )
{
  wchar_t aFilePath [MAX_PATH]; aFilePath [0] = L'\0';
  wchar_t aFileTitle[MAX_PATH]; aFileTitle[0] = L'\0';

  OPENFILENAMEW anOpenStruct;
  memset (&anOpenStruct, 0, sizeof(OPENFILENAMEW));
  anOpenStruct.lStructSize    = sizeof(OPENFILENAMEW);
  anOpenStruct.nFilterIndex   = 1;
  anOpenStruct.lpstrFile      = aFilePath;
  anOpenStruct.nMaxFile       = MAX_PATH; // size in characters, not bytes
  anOpenStruct.lpstrFileTitle = aFileTitle;
  anOpenStruct.nMaxFileTitle  = MAX_PATH; // size in characters, not bytes
  anOpenStruct.lpstrTitle     = L"No Title";
  anOpenStruct.Flags          = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;
  if (!GetOpenFileNameW (&anOpenStruct)
   || *anOpenStruct.lpstrFile == L'\0')
  {
    return 0;
  }

  // UTF-16 -> UTF-8 via WinAPI
  char aBuffU8[4096];
  WideCharToMultiByte (CP_UTF8, 0, anOpenStruct.lpstrFile, -1, aBuffU8, 4096, NULL, NULL);

  // UTF-16 -> UTF-8 via NCollection_String
  NCollection_String anUtf8NCol (anOpenStruct.lpstrFile, -1);

  // UTF-16 -> UTF-8 via TCollection_ExtendedString
  char  aBuffU8UsingExt[4096];
  char* aPtr = aBuffU8UsingExt;
  TCollection_ExtendedString anExtWide ((Standard_ExtString )anOpenStruct.lpstrFile);
  anExtWide.ToUTF8CString (aPtr);

  TCollection_AsciiString aHexUtf16Src = formatHex ((const Standard_Byte* )anOpenStruct.lpstrFile, wcslen (anOpenStruct.lpstrFile) * 2);
  TCollection_AsciiString aHexUtf8WApi = formatHex ((const Standard_Byte* )aBuffU8, strlen (aBuffU8));
  TCollection_AsciiString aHexUtf8NCol = formatHex ((const Standard_Byte* )anUtf8NCol.ToCString(), anUtf8NCol.Size());

  // UTF-8 -> UTF-16 via the TCollection_ExtendedString constructor discussed in this issue
  TCollection_ExtendedString anExtWideFromUtf8 (aBuffU8, Standard_True);
  TCollection_AsciiString aHexUtf16ExtFromU8 = formatHex ((const Standard_Byte* )anExtWideFromUtf8.ToExtString(), anExtWideFromUtf8.Length() * 2);
  TCollection_AsciiString aHexUtf8TColEx     = formatHex ((const Standard_Byte* )aBuffU8UsingExt, strlen (aBuffU8UsingExt));

  // UTF-8 -> UTF-16 via NCollection_UtfWideString
  NCollection_UtfWideString anUtf16NColFromUtf8 (aBuffU8, -1);
  TCollection_AsciiString aHexUtf16NColFromU8 = formatHex ((const Standard_Byte* )anUtf16NColFromUtf8.ToCString(), anUtf16NColFromUtf8.Size());

  std::cerr << "Utf16 SOURCE:\n"         << aHexUtf16Src        << "\n"
            << "Utf16 TCol from Utf8:\n" << aHexUtf16ExtFromU8  << "\n"
            << "Utf16 NCol from Utf8:\n" << aHexUtf16NColFromU8 << "\n"
            << "Utf8 WApi from Utf16:\n" << aHexUtf8WApi        << "\n"
            << "Utf8 NCol from Utf16:\n" << aHexUtf8NCol        << "\n"
            << "Utf8 TCol from Utf16:\n" << aHexUtf8TColEx      << "\n";
  return 0;
}
```
|
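The UTF-8 dumps above can be cross-checked without the Windows-only calls (GetOpenFileNameW, WideCharToMultiByte, _snprintf). The following is a minimal portable sketch, assuming the input is restricted to the Basic Multilingual Plane (surrogate pairs are rejected by an assert, not decoded). encodeBmpToUtf8 and formatHexPortable are hypothetical helpers, not OCCT APIs; the latter only mimics the output layout of formatHex() from the note.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: encode BMP-only UTF-16 code units as UTF-8.
static std::vector<unsigned char> encodeBmpToUtf8 (const std::u16string& theUtf16)
{
  std::vector<unsigned char> anOut;
  for (char16_t aUnit : theUtf16)
  {
    const std::uint32_t c = aUnit;
    assert (c < 0xD800 || c > 0xDFFF); // surrogate pairs are not handled
    if (c < 0x80)
    {
      anOut.push_back (static_cast<unsigned char> (c));                         // 0xxxxxxx
    }
    else if (c < 0x800)
    {
      anOut.push_back (static_cast<unsigned char> (0xC0 | (c >> 6)));          // 110xxxxx
      anOut.push_back (static_cast<unsigned char> (0x80 | (c & 0x3F)));        // 10xxxxxx
    }
    else
    {
      anOut.push_back (static_cast<unsigned char> (0xE0 | (c >> 12)));         // 1110xxxx
      anOut.push_back (static_cast<unsigned char> (0x80 | ((c >> 6) & 0x3F))); // 10xxxxxx
      anOut.push_back (static_cast<unsigned char> (0x80 | (c & 0x3F)));        // 10xxxxxx
    }
  }
  return anOut;
}

// Hypothetical helper: same layout as formatHex() ('|' after every 8th byte,
// a new line after every 16th, a space otherwise), using standard snprintf.
static std::string formatHexPortable (const unsigned char* theData, std::size_t theSize)
{
  std::string anOut;
  char aByte[4];
  for (std::size_t anId = 0; anId < theSize; ++anId)
  {
    char anEsc = ' ';
    if ((anId + 1) % 16 == 0)     { anEsc = '\n'; }
    else if ((anId + 1) % 8 == 0) { anEsc = '|'; }
    std::snprintf (aByte, sizeof(aByte), "%02X%c", static_cast<unsigned int> (theData[anId]), anEsc);
    anOut += aByte;
  }
  return anOut;
}
```

Encoding u"_\u30DE\u30F9\u30F1.igs" this way gives 5F E3 83 9E E3 83 B9 E3 83 B1 2E 69 67 73, which matches the tail of every "Utf8 ... from Utf16" dump.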
Mikhail, please create a test case. |
|
Branch CR22125 has been created by apn.

SHA-1: 28d7ddb64363611911034b716439922bc0b362cf

Detailed log of new commits:

Author: apn
Date: Fri Oct 31 16:46:53 2014 +0300

0022125: TCollection_ExtendedString: conversion from UTF-8 to unicode

Added test case bugs/fclasses/bug22125 |
|
The problem is not reproduced on the current state of master on Windows and Debian60-64 in Release and Debug modes. Branch CR22125 was created; it contains the test case bugs fclasses bug22125 - OK |
|
Branch CR22125 has been deleted by kgv. SHA-1: 28d7ddb64363611911034b716439922bc0b362cf |
Date Modified | Username | Field | Change |
---|---|---|---|
2010-11-30 14:39 | | CC | => pdn, nkv |
2011-08-02 11:23 | bugmaster | Category | OCCT:FDC => OCCT:Foundation Classes |
2011-12-05 10:45 | | Relationship added | child of 0014673 |
2011-12-20 15:02 | | Fixed in Version | EMPTY => |
2011-12-20 15:02 | | Target Version | => 6.5.3 |
2011-12-20 15:02 | | Description Updated | |
2012-02-02 10:15 | | Target Version | 6.5.3 => 6.5.4 |
2012-10-21 11:16 | | Target Version | 6.5.4 => 6.6.0 |
2013-02-28 17:06 | | Target Version | 6.6.0 => 6.7.0 |
2013-11-06 15:10 | kgv | Relationship added | related to 0022484 |
2013-11-06 15:11 | kgv | Target Version | 6.7.0 => 6.7.1 |
2014-04-04 18:32 | | Target Version | 6.7.1 => 6.8.0 |
2014-09-11 10:24 | | Target Version | 6.8.0 => 7.1.0 |
2014-10-03 14:07 | | Note Added: 0032629 | |
2014-10-03 14:07 | | Assigned To | bugmaster => ssv |
2014-10-03 14:07 | | Status | new => feedback |
2014-10-14 16:23 | | File Added: Part1_badname.zip | |
2014-10-14 16:24 | | Note Added: 0033071 | |
2014-10-14 16:24 | | Assigned To | ssv => pdn |
2014-10-14 16:29 | | Status | feedback => assigned |
2014-10-14 19:15 | | File Added: test_iges_jp.tcl | |
2014-10-14 19:16 | | Note Added: 0033080 | |
2014-10-14 19:17 | | Assigned To | pdn => kgv |
2014-10-14 19:17 | | Status | assigned => resolved |
2014-10-14 20:05 | kgv | Assigned To | kgv => pdn |
2014-10-14 20:05 | kgv | Status | resolved => assigned |
2014-10-14 20:06 | kgv | Note Added: 0033086 | |
2014-10-16 10:36 | kgv | Note Added: 0033182 | |
2014-10-16 10:36 | kgv | Assigned To | pdn => bugmaster |
2014-10-16 10:36 | kgv | Status | assigned => feedback |
2014-10-16 10:36 | kgv | Resolution | open => fixed |
2014-10-16 10:36 | kgv | Target Version | 7.1.0 => 6.8.0 |
2014-10-17 14:12 | bugmaster | Assigned To | bugmaster => mkv |
2014-10-17 14:12 | bugmaster | Note Added: 0033257 | |
2014-10-20 12:03 | bugmaster | Assigned To | mkv => apn |
2014-10-31 16:47 | git | Note Added: 0033966 | |
2014-10-31 16:47 | apn | Note Added: 0033967 | |
2014-10-31 16:47 | apn | Test case number | => bugs fclasses bug22125 |
2014-10-31 16:47 | apn | Assigned To | apn => bugmaster |
2014-10-31 16:47 | apn | Status | feedback => tested |
2014-11-06 15:18 | bugmaster | Changeset attached | => occt master 5e5ce65b |
2014-11-06 15:18 | bugmaster | Status | tested => verified |
2014-11-11 12:42 | | Fixed in Version | => 6.8.0 |
2014-11-11 13:03 | | Status | verified => closed |
2014-11-12 08:55 | git | Note Added: 0034243 | |
2017-10-01 19:33 | | Relationship added | related to 0029081 |