View Issue Details

ID: 0025685
Project: Community
Category: OCCT:Application Framework
View Status: public
Last Update: 2023-08-02 02:07
Reporter: Vico Liang
Assigned To: dpasukhi
Priority: low
Severity: minor
Status: assigned
Resolution: open
Product Version: 5.2.2
Target Version: Unscheduled
Summary: 0025685: Application Framework - TCollection_ExtendedString unicode storage in xml document is unreadable
Description: The unicode storage of TCollection_ExtendedString in xml document starts with "##feff" and it's unreadable. It can be read into memory correctly.
Tags: No tags attached.
Test case number:

Relationships

related to 0014673 | closed | bugmaster | Open CASCADE | Provide true support for Unicode symbols
child of 0022484 | closed | bugmaster | Open CASCADE | UNICODE characters support.

Activities

szy

2015-03-02 16:13

manager   ~0038006

We would appreciate receiving from you the corresponding XML file that demonstrates the case, or a script that builds this file.
Thanks

Vico Liang

2015-03-02 16:56

developer   ~0038012

       <TDataStd_Name id="14">##feff6d4b8bd54ee378014ee378010031</TDataStd_Name>
       <TDataStd_Name id="17">##feff8fd9662f4ec04e4873a9610f</TDataStd_Name>
       <TDataStd_Name id="20">##feff771f4ed659885783573e76845f88</TDataStd_Name>

szy

2015-03-02 18:15

manager   ~0038018

Dear Vico,
First we need an answer to the question: is the specified case reproducible?
So we need some procedure (scenario) that allows us to reproduce your case.
Could you provide it (or at least the saved XML file)?
Thanks

Vico Liang

2015-03-03 06:58

developer   ~0038024

Dear szy,

The XML storage and retrieval drivers do not write Unicode text natively. To handle Unicode characters, OCCT encodes the string as hexadecimal digits (four per character) and marks it with the prefix "##feff". Please see the method below for details:

"feff" is the Unicode header (aUnicodeHeader) in the method below:

LDOMBasicString::operator TCollection_ExtendedString () const
{
  switch (myType) {
  case LDOM_Integer:
    return TCollection_ExtendedString (myVal.i);
  case LDOM_AsciiFree:
  case LDOM_AsciiDoc:
  case LDOM_AsciiDocClear:
  case LDOM_AsciiHashed:
  {
    char buf[6] = {'\0','\0','\0','\0','\0','\0'};
    const long aUnicodeHeader = 0xfeff;
    Standard_CString ptr = Standard_CString (myVal.ptr);
    errno = 0;
    // Check if ptr is ascii string
    if (ptr[0] != '#' || ptr[1] != '#')
      return TCollection_ExtendedString (ptr);
    buf[0] = ptr[2];
    buf[1] = ptr[3];
    buf[2] = ptr[4];
    buf[3] = ptr[5];
    if (strtol (&buf[0], NULL, 16) != aUnicodeHeader)
      return TCollection_ExtendedString (ptr);

    // convert Unicode to Extended String
    ptr += 2;
    Standard_Size aLength = (strlen(ptr) / 4), j = 0;
    Standard_ExtCharacter * aResult = new Standard_ExtCharacter[aLength--];
    while (aLength--) {
      ptr += 4;
      buf[0] = ptr[0];
      buf[1] = ptr[1];
      buf[2] = ptr[2];
      buf[3] = ptr[3];
      aResult[j++] = Standard_ExtCharacter (strtol (&buf[0], NULL, 16));
      if (errno) {
        delete [] aResult;
        return TCollection_ExtendedString();
      }
    }
    aResult[j] = 0;
    TCollection_ExtendedString aResultStr (aResult);
    delete [] aResult;
    return aResultStr;
  }
  default: ;
  }
  return TCollection_ExtendedString();
}
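
For anyone who just wants to inspect such a value by hand, here is a minimal standalone sketch (not OCCT code) that turns a "##feff..." string into UTF-8. The names parseHex4, appendUtf8 and decodeOcctName are hypothetical. The sketch assumes every four hex digits after the header are one UTF-16 code unit, which is what the samples above suggest, and it returns the input unchanged if the text does not follow that layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Parse exactly four hex digits starting at s[pos]; returns -1 on a non-hex character.
// The caller guarantees that pos + 4 <= s.size().
static int parseHex4(const std::string& s, std::size_t pos) {
  int value = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    const char c = s[pos + i];
    int digit;
    if (c >= '0' && c <= '9') digit = c - '0';
    else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
    else return -1;
    value = value * 16 + digit;
  }
  return value;
}

// Append one Unicode code point to out, encoded as UTF-8.
static void appendUtf8(std::string& out, std::uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: "##feff" + hex UTF-16 code units -> UTF-8.
// Anything that does not match this layout is returned as-is.
std::string decodeOcctName(const std::string& text) {
  if (text.size() < 6 || text[0] != '#' || text[1] != '#') return text;
  if (parseHex4(text, 2) != 0xfeff) return text;
  if ((text.size() - 6) % 4 != 0) return text;  // assumption: reject odd lengths

  std::string out;
  for (std::size_t pos = 6; pos < text.size(); pos += 4) {
    const int unit = parseHex4(text, pos);
    if (unit < 0) return text;
    std::uint32_t cp = static_cast<std::uint32_t>(unit);
    // Combine a high surrogate with a following low surrogate, if present.
    if (cp >= 0xD800 && cp <= 0xDBFF && pos + 8 <= text.size()) {
      const int low = parseHex4(text, pos + 4);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (static_cast<std::uint32_t>(low) - 0xDC00);
        pos += 4;
      }
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;  // unpaired surrogate
    appendUtf8(out, cp);
  }
  return out;
}
```

Calling decodeOcctName on the content of one of the TDataStd_Name elements above and printing the result to a UTF-8 terminal should show the name as text. Unlike the operator quoted above, this sketch rejects non-hex digits explicitly instead of relying on errno.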

Vico Liang

2015-03-03 07:01

developer   ~0038025

This issue is not a bug of OCCT but a request to make the encoded unicode readable in xml.

szy

2015-03-03 17:38

manager   ~0038052

So, it is not a bug.
The attributes you pointed out
 <TDataStd_Name id="14">##feff6d4b8bd54ee378014ee378010031</TDataStd_Name>
...
can be correctly read by OCCT Xml persistence drivers.

If you want to propose an improvement, you can do it via the Development portal (Git repository).

Vico Liang

2015-03-04 11:00

developer   ~0038067

Right, the attribute can be read by OCCT drivers. This is not a bug of OCCT.

I strongly recommend improving this and making it human-readable in the XML.

vro

2020-12-14 09:55

developer   ~0097562

Hello Vico! I analyzed the problem and have to agree that it would be nice to have text (or names) in the XML file readable, but... I noticed several other cases that do not allow easy reading of the text, I mean Unicode text. XML uses some extra symbols to write such text and then read it back. An example:

<TDataStd_Name id="2907">S&#xe4;gen auf Fl&#xe4;chen</TDataStd_Name>

Not very user-friendly, do you agree? It is just a German "Saegen auf Flaechen" with umlauts.

As far as I can see, Open CASCADE uses a predefined prefix to distinguish such text in the TDataStd_Name OCAF attribute. Do you suppose somebody could use a text with such a prefix? Theoretically, somebody could... Should we change the convention and use something else? Any ideas are welcome!
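
To make the prefix question concrete, here is a small standalone sketch (not OCCT code; looksEncoded is a hypothetical name) that mirrors the header test from the conversion operator quoted earlier in this thread. It suggests that a plain ASCII name which happens to start with "##feff" would be classified as encoded text on reading.

```cpp
#include <cstdlib>
#include <string>

// Mirrors the check in the quoted operator: "##" followed by four
// characters that strtol parses as 0xfeff in base 16.
bool looksEncoded(const std::string& text) {
  if (text.size() < 6 || text[0] != '#' || text[1] != '#') return false;
  const std::string header = text.substr(2, 4);
  char* end = nullptr;
  const long value = std::strtol(header.c_str(), &end, 16);
  // Stricter than the original: require that all four characters were consumed.
  return end == header.c_str() + 4 && value == 0xfeff;
}
```

With this check, looksEncoded("##feff0041") is true even if the user literally typed that name, so by the quoted logic it would come back as "A" after a save and reload. Any replacement marker would need to be something that cannot occur in user text, or be escaped when it does.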

Issue History

Date Modified | Username | Field | Change
2015-01-06 10:29 | Vico Liang | New Issue |
2015-01-06 10:29 | Vico Liang | Assigned To | => szy
2015-03-02 16:13 | szy | Note Added: 0038006 |
2015-03-02 16:13 | szy | Assigned To | szy => Vico Liang
2015-03-02 16:13 | szy | Status | new => feedback
2015-03-02 16:56 | Vico Liang | Note Added: 0038012 |
2015-03-02 16:56 | Vico Liang | Assigned To | Vico Liang => szy
2015-03-02 16:56 | Vico Liang | Status | feedback => assigned
2015-03-02 18:15 | szy | Note Added: 0038018 |
2015-03-02 18:15 | szy | Assigned To | szy => Vico Liang
2015-03-02 18:15 | szy | Status | assigned => feedback
2015-03-03 06:58 | Vico Liang | Note Added: 0038024 |
2015-03-03 06:58 | Vico Liang | Assigned To | Vico Liang => szy
2015-03-03 06:58 | Vico Liang | Status | feedback => assigned
2015-03-03 07:01 | Vico Liang | Note Added: 0038025 |
2015-03-03 17:38 | szy | Note Added: 0038052 |
2015-03-03 17:38 | szy | Assigned To | szy => Vico Liang
2015-03-03 17:38 | szy | Status | assigned => feedback
2015-03-04 11:00 | Vico Liang | Note Added: 0038067 |
2015-03-04 11:01 | Vico Liang | Assigned To | Vico Liang => szy
2015-03-04 11:01 | Vico Liang | Status | feedback => assigned
2015-03-04 11:31 | szy | Priority | normal => low
2015-03-04 11:31 | szy | Target Version | 6.9.0 => Unscheduled
2016-02-17 18:30 | szy | Assigned To | szy => mpv
2020-10-14 11:45 | kgv | Relationship added | child of 0022484
2020-10-14 11:45 | kgv | Summary | TCollection_ExtendedString unicode storage in xml document is unreadable. => Application Framework - TCollection_ExtendedString unicode storage in xml document is unreadable
2020-10-14 11:49 | kgv | Product Version | => 5.2.2
2020-10-14 11:50 | kgv | Target Version | Unscheduled => 7.6.0
2020-10-14 12:25 | abv | Relationship added | related to 0014673
2020-12-14 09:55 | vro | Note Added: 0097562 |
2020-12-14 09:55 | vro | Assigned To | mpv => Vico Liang
2021-08-30 15:33 | mpv | Status | assigned => feedback
2021-08-30 15:33 | mpv | Target Version | 7.6.0 => 7.7.0
2022-10-24 10:39 | szy | Target Version | 7.7.0 => 7.8.0
2023-08-02 02:07 | dpasukhi | Assigned To | Vico Liang => dpasukhi
2023-08-02 02:07 | dpasukhi | Status | feedback => assigned
2023-08-02 02:07 | dpasukhi | Target Version | 7.8.0 => Unscheduled