MantisBT - Open CASCADE
View Issue Details
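ID: 0031918
Project: Open CASCADE
Category: [OCCT] OCCT:Application Framework
View Status: public
Date Submitted: 2020-11-11 13:18
Last Update: 2021-07-23 20:58
Reporter: mpv
Assigned To: bugmaster
Priority: normal
Severity: feature
Status: verified
Resolution: fixed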
[OCCT] 7.5.0 
[OCCT] 7.6.0* 
0031918: Application Framework - New binary format for fast reading part of OCAF document
In the current version of the binary format, shapes, geometry and triangulation are stored in the shapes section, and there is no way to skip this section or any of its sub-parts while reading such a file. So, to read part of a document, it is still necessary to read this section and keep the seek positions of the objects that must be loaded while labels and attributes are processed.

It is proposed to allow storing the binary format in a fast-access mode intended for reading only part of the document (such a file cannot be loaded by the standard binary format reader or by older versions of OCCT). The file will be somewhat larger than a standard one, but when it is loaded partially, reading should be much faster, without calling "seek" many times.

The ways to speed up reading are the following:
- Store shapes, geometry, triangulation and other information directly in the section of the TNaming_NamedShape attribute, at the place where it is located in the data tree.
- Refer only to shared shape elements stored before this attribute (also add a flag to choose between storing references and always writing a copy without referencing).
- Write the size of the stored section for each label, so that sections that are not needed can be skipped quickly during reading.
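
The size-prefix idea from the last item can be illustrated with a minimal C++ sketch. It is not the OCAF file layout: the record structure (tag, payload size, payload), the Section type and the functions writeSection and readSections are all assumptions made up for this illustration. The point is only that a reader that knows the size of a section can step over it without parsing its contents.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record layout (NOT the real OCAF layout):
//   [uint32 tag][uint64 payload size][payload bytes]
struct Section {
  std::uint32_t tag;
  std::string   payload;
};

// Writes one section, storing the payload size in front of the payload.
void writeSection(std::ostream& out, const Section& s) {
  const std::uint64_t size = s.payload.size();
  out.write(reinterpret_cast<const char*>(&s.tag), sizeof s.tag);
  out.write(reinterpret_cast<const char*>(&size), sizeof size);
  out.write(s.payload.data(), static_cast<std::streamsize>(size));
}

// Returns the payloads of all sections with the wanted tag.
// Every other section is stepped over using its recorded size,
// so its contents are never read or parsed.
std::vector<std::string> readSections(std::istream& in, std::uint32_t wantedTag) {
  std::vector<std::string> result;
  std::uint32_t tag = 0;
  std::uint64_t size = 0;
  while (in.read(reinterpret_cast<char*>(&tag), sizeof tag) &&
         in.read(reinterpret_cast<char*>(&size), sizeof size)) {
    if (tag == wantedTag) {
      std::string payload(size, '\0');
      in.read(&payload[0], static_cast<std::streamsize>(size));
      result.push_back(payload);
    } else {
      in.seekg(static_cast<std::streamoff>(size), std::ios::cur);
    }
  }
  return result;
}
```

In this sketch a skipped section costs one relative move of the read position, regardless of how large or how deeply structured its payload is.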
bugs\caf\bug31839_1
bugs\caf\bug31839_2
bugs\caf\bug31918_1
bugs\caf\bug31918_2
No tags attached.
parent of 0032479  verified  bugmaster  Open CASCADE  Application Framework - unnecessary API break within TDocStd_Application::Open()
Not all the children of this issue are yet resolved or closed.
xls Compare_New_OCAF.xls (15,872) 2021-02-16 12:01
https://tracker.dev.opencascade.org/
Issue History
2020-11-11 13:18  mpv  New Issue
2020-11-11 13:18  mpv  Assigned To => mpv
2020-11-11 13:20  mpv  Relationship added: child of 0031839
2021-01-01 12:32  git  Note Added: 0098002
2021-01-02 11:26  git  Note Added: 0098008
2021-01-02 11:59  git  Note Added: 0098009
2021-01-10 12:23  git  Note Added: 0098039
2021-01-10 12:53  git  Note Added: 0098040
2021-01-11 11:59  git  Note Added: 0098049
2021-01-11 14:58  git  Note Added: 0098069
2021-02-15 21:01  agv  File Added: Compare_New_OCAF.xls
2021-02-15 21:02  agv  Note Added: 0098883
2021-02-16 12:00  agv  File Deleted: Compare_New_OCAF.xls
2021-02-16 12:01  agv  File Added: Compare_New_OCAF.xls
2021-04-25 15:55  git  Note Added: 0100577
2021-04-26 09:54  git  Note Added: 0100583
2021-04-26 10:14  git  Note Added: 0100584
2021-04-26 12:48  git  Note Added: 0100590
2021-04-29 16:15  git  Note Added: 0100668
2021-05-20 11:44  git  Note Added: 0101199
2021-06-24 17:26  git  Note Added: 0102035
2021-06-25 15:36  git  Note Added: 0102050
2021-07-05 09:16  git  Note Added: 0102272
2021-07-05 09:52  mpv  Note Added: 0102273
2021-07-05 09:52  mpv  Assigned To: mpv => vro
2021-07-05 09:52  mpv  Status: new => resolved
2021-07-05 09:52  mpv  Steps to Reproduce Updated: bug_revision_view_page.php?rev_id=25432#r25432
2021-07-05 10:06  mpv  Note Added: 0102276
2021-07-06 10:13  vro  Note Added: 0102296
2021-07-06 10:13  vro  Assigned To: vro => mpv
2021-07-06 10:13  vro  Status: resolved => assigned
2021-07-06 18:03  git  Note Added: 0102309
2021-07-07 08:09  mpv  Note Added: 0102318
2021-07-07 08:09  mpv  Assigned To: mpv => vro
2021-07-07 08:09  mpv  Status: assigned => resolved
2021-07-07 08:10  mpv  Note Edited: 0102318  bug_revision_view_page.php?bugnote_id=102318#r25445
2021-07-07 08:20  vro  Note Added: 0102319
2021-07-07 08:20  vro  Assigned To: vro => bugmaster
2021-07-07 08:20  vro  Status: resolved => reviewed
2021-07-09 10:20  bugmaster  Note Added: 0102386
2021-07-09 10:20  bugmaster  Assigned To: bugmaster => mpv
2021-07-09 10:20  bugmaster  Status: reviewed => assigned
2021-07-09 10:50  git  Note Added: 0102387
2021-07-09 10:55  mpv  Note Added: 0102388
2021-07-09 10:55  mpv  Assigned To: mpv => bugmaster
2021-07-09 10:55  mpv  Status: assigned => resolved
2021-07-10 12:30  bugmaster  Changeset attached => occt master d5c71e20
2021-07-10 12:30  bugmaster  Status: resolved => verified
2021-07-10 12:30  bugmaster  Resolution: open => fixed
2021-07-10 12:40  git  Note Added: 0102439
2021-07-10 12:40  git  Note Added: 0102440
2021-07-10 12:40  git  Note Added: 0102441
2021-07-10 12:40  git  Note Added: 0102442
2021-07-14 11:21  kgv  Relationship added: parent of 0032479
2021-07-23 20:58  msv  Relationship added: parent of 0032491

Notes
(0098002)
git   
2021-01-01 12:32   
Branch CR31918 has been created by mpv.

SHA-1: 192348a86a29e87ec62ec663ea3688c2c8811f65


Detailed log of new commits:

Author: mpv
Date: Fri Jan 1 12:32:51 2021 +0300

    0031918: Application Framework - New binary format for fast reading part of OCAF document
    
    Initial implementation of a new format for quick reading and writing of parts of documents. It consists in writing shapes and all their contents directly at the place of the TNaming_NamedShape attribute and skipping the shape section.
    
    For now it is implemented as a new version 12 of the binary format. It will be decided later whether to keep it this way and make this version of the format the default, or to set a special flag for reading/writing such a version.
    
    Modifications:
    BinLDrivers and BinDrivers packages - modifications related to the usage of the quick part tree format flag, skipping the writing of the shape section, and adding label sizes into the document so that labels can be passed over quickly during reading.
    BinObjMgt_Persistent and BinObjMgt_Position - added the possibility to write some data directly into the stream just after the attribute. A data size is recorded before this record.
    BinMXCAFDoc package - modifications to write the BinMXCAFDoc_LocationDriver location in the same way as shapes: in this new format, location data is written right after the (empty) attribute data.
    BinTools package - creation of ShapeReader and ShapeWriter classes that share the root class ShapeSetBase with the ShapeSet class. These classes allow writing/reading shapes directly to/from the stream. If some object is already in the stream, a reference is written instead: the relative position of the duplicated object.
    PCDM_ReaderFilter - modified to be able to browse the label tree quickly, without referencing by entry strings.
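
The "reference as relative position" rule described for the BinTools package can be sketched as follows. This is only an illustration under stated assumptions: the DedupWriter class, the readObject function and the 'D'/'R' markers are hypothetical and do not reproduce the actual BinTools classes or their byte layout. The first occurrence of an object is written in full; a later occurrence is written as the distance back to the first copy.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical writer (not a BinTools class). Record layout assumed here:
//   'D' + uint64 size + bytes     full data, first occurrence of an object
//   'R' + uint64 distance         reference: distance back to the 'D' record
class DedupWriter {
public:
  explicit DedupWriter(std::ostream& out) : myOut(out) {}

  void write(const std::string& object) {
    const std::uint64_t here = myPos;  // start position of this record
    const auto it = mySeen.find(object);
    if (it != mySeen.end()) {
      put('R');
      putU64(here - it->second);       // relative position of the earlier copy
      return;
    }
    mySeen.emplace(object, here);
    put('D');
    putU64(object.size());
    myOut.write(object.data(), static_cast<std::streamsize>(object.size()));
    myPos += object.size();
  }

  std::uint64_t bytesWritten() const { return myPos; }

private:
  void put(char c) { myOut.put(c); ++myPos; }
  void putU64(std::uint64_t v) {
    myOut.write(reinterpret_cast<const char*>(&v), sizeof v);
    myPos += sizeof v;
  }

  std::ostream& myOut;
  std::uint64_t myPos = 0;
  std::map<std::string, std::uint64_t> mySeen;  // object -> record position
};

// Reads the record that starts at pos, following a reference if needed.
std::string readObject(std::istream& in, std::uint64_t pos) {
  in.seekg(static_cast<std::streamoff>(pos));
  char marker = 0;
  in.get(marker);
  std::uint64_t value = 0;
  in.read(reinterpret_cast<char*>(&value), sizeof value);
  if (marker == 'R') {
    return readObject(in, pos - value);  // jump back to the full copy
  }
  std::string bytes(value, '\0');
  in.read(&bytes[0], static_cast<std::streamsize>(value));
  return bytes;
}
```

A relative distance, unlike an absolute offset, stays valid when the enclosing block is moved as a whole within the file, which is one plausible reason for this choice; the commit message itself does not state the motivation.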
(0098008)
git   
2021-01-02 11:26   
Branch CR31918 has been updated by mpv.

SHA-1: 681f70b515dfd4d996ee1ec061c06a3ba2e2f3a5


Detailed log of new commits:

Author: mpv
Date: Sat Jan 2 11:27:38 2021 +0300

    # fixes for the compilation errors and warnings

(0098009)
git   
2021-01-02 11:59   
Branch CR31918 has been updated by mpv.

SHA-1: 9913d36a8e6b50a88591847e3772d5309d813db6


Detailed log of new commits:

Author: mpv
Date: Sat Jan 2 12:00:27 2021 +0300

    # fixes for the compilation errors and warnings

(0098039)
git   
2021-01-10 12:23   
Branch CR31918_1 has been created by mpv.

SHA-1: 4af86334b3cecda9b6ee2bed0ebe4a78f2a0ba39


Detailed log of new commits:

Author: mpv
Date: Sat Jan 9 18:01:53 2021 +0300

    0031918: Application Framework - New binary format for fast reading part of OCAF document
    
    Initial implementation of a new format for quick reading and writing of parts of documents. It consists in writing shapes and all their contents directly at the place of the TNaming_NamedShape attribute and skipping the shape section.
    
    For now it is implemented as a new version 12 of the binary format. It will be decided later whether to keep it this way and make this version of the format the default, or to set a special flag for reading/writing such a version.
    
    Modifications:
    BinLDrivers and BinDrivers packages - modifications related to the usage of the quick part tree format flag, skipping the writing of the shape section, and adding label sizes into the document so that labels can be passed over quickly during reading.
    BinObjMgt_Persistent and BinObjMgt_Position - added the possibility to write some data directly into the stream just after the attribute. A data size is recorded before this record.
    BinMXCAFDoc package - modifications to write the BinMXCAFDoc_LocationDriver location in the same way as shapes: in this new format, location data is written right after the (empty) attribute data.
    BinTools package - creation of ShapeReader and ShapeWriter classes that share the root class ShapeSetBase with the ShapeSet class. These classes allow writing/reading shapes directly to/from the stream. If some object is already in the stream, a reference is written instead: the relative position of the duplicated object.
    PCDM_ReaderFilter - modified to be able to browse the label tree quickly, without referencing by entry strings.
(0098040)
git   
2021-01-10 12:53   
Branch CR31918_1 has been updated by mpv.

SHA-1: c94423fbf0ea082625c683ea02ef617089236afd


Detailed log of new commits:

Author: mpv
Date: Sun Jan 10 12:53:55 2021 +0300

    # fixes for the compilation errors and warnings

(0098049)
git   
2021-01-11 11:59   
Branch CR31918_1 has been updated by mpv.

SHA-1: 58ec12d1a828a1797d971f3c59f1a3cd57a1f2ec


Detailed log of new commits:

Author: mpv
Date: Mon Jan 11 12:00:29 2021 +0300

    # fixes for the compilation errors and warnings

(0098069)
git   
2021-01-11 14:58   
Branch CR31918_1 has been updated by mpv.

SHA-1: 0c40b6e394d5f030fbcedcc6cbcef2fce4dabce8


Detailed log of new commits:

Author: mpv
Date: Mon Jan 11 14:58:55 2021 +0300

    # fixes for the compilation errors and warnings

(0098883)
agv   
2021-02-15 21:02   
Results of testing in ASRV XCAF/XBF converters (cloud version): see the attached file "Compare_New_OCAF.xls"
(0100577)
git   
2021-04-25 15:55   
Branch CR31918_2 has been created by mpv.

SHA-1: 2d0d179d0f06dfd14fe1648a134df047bd56e3e7


Detailed log of new commits:

Author: mpv
Date: Sun Apr 25 15:55:33 2021 +0300

    0031918: Application Framework - New binary format for fast reading part of OCAF document
    
    Implementation of a new format for quick reading and writing of parts of documents. It consists in writing shapes and all their contents directly at the place of the TNaming_NamedShape attribute and skipping the shape section.
    For now it is implemented as a new version 11 of the binary format. It will be decided later whether to keep it this way and make this version of the format the default, or to set a special flag for reading/writing such a version.
    
    Modifications:
    BinLDrivers and BinDrivers packages - modifications related to the usage of the quick part tree format flag, skipping the writing of the shape section, and adding label sizes into the document so that labels can be passed over quickly during reading.
    BinObjMgt_Persistent and BinObjMgt_Position - added the possibility to write some data directly into the stream just after the attribute. A data size is recorded before this record.
    BinMXCAFDoc package - modifications to write the BinMXCAFDoc_LocationDriver location in the same way as shapes: in this new format, location data is written right after the (empty) attribute data.
    BinTools package - creation of ShapeReader and ShapeWriter classes that share the root class ShapeSetBase with the ShapeSet class. These classes allow writing/reading shapes directly to/from the stream. If some object is already in the stream, a reference is written instead: the relative position of the duplicated object.
    PCDM_ReaderFilter - modified to be able to browse the label tree quickly, without referencing by entry strings.
(0100583)
git   
2021-04-26 09:54   
Branch CR31918_2 has been updated by mpv.

SHA-1: 8a80a3b410c6553b9bc58febad481a04d78ba273


Detailed log of new commits:

Author: mpv
Date: Mon Apr 26 09:55:12 2021 +0300

    # additional fixes

(0100584)
git   
2021-04-26 10:14   
Branch CR31918_2 has been updated by mpv.

SHA-1: 60299788e7121debe77a3d9c73fcda11ea65e54b


Detailed log of new commits:

Author: mpv
Date: Mon Apr 26 10:15:16 2021 +0300

    # fixes for the compilation errors and warnings

(0100590)
git   
2021-04-26 12:48   
Branch CR31918_2 has been updated by mpv.

SHA-1: 4430644c0e4f35973b01bfef7cb462a70edaa25f


Detailed log of new commits:

Author: mpv
Date: Mon Apr 26 12:48:29 2021 +0300

    # additional fixes

(0100668)
git   
2021-04-29 16:15   
Branch CR31918_2 has been updated by mpv.

SHA-1: 844d46f2bd42285cec2f2d3474771d3c18322ae6


Detailed log of new commits:

Author: mpv
Date: Thu Apr 29 16:16:08 2021 +0300

    # Create a new format version for distinguishing between this format change and the previous one.

(0101199)
git   
2021-05-20 11:44   
Branch CR31918_2 has been updated by mpv.

SHA-1: f2a71473aab68acc2a2ddf894150e0e956c8be74


Detailed log of new commits:

Author: mpv
Date: Thu May 20 11:45:10 2021 +0300

    Optimization of the writing time of the new format, and some minor improvements.

(0102035)
git   
2021-06-24 17:26   
Branch CR31918_3 has been created by mpv.

SHA-1: 21eedb61d93a79d22203ac954de9d3829e39485d


Detailed log of new commits:

Author: mpv
Date: Thu Jun 24 17:26:52 2021 +0300

    0031918: Application Framework - New binary format for fast reading part of OCAF document
    
    Implementation of a new format for quick reading and writing of parts of documents (a sub-set of labels and a sub-set of attributes). It consists in writing shapes and all their contents directly at the place of the TNaming_NamedShape attribute and skipping the shape section. New format version 12 for Binary file types is assigned to this version.
    
    Added the PCDM_ReaderFilter class that can be used in the Open methods of TDocStd_Application. If it is defined, it allows:
    - reading into an already opened document in append mode AppendMode_Protect (do not overwrite existing attributes) or AppendMode_Overwrite
    - reading only specified sub-trees of the document using AddPath (const TCollection_AsciiString& theEntryToRead)
    - reading only specified attributes using AddRead (const TCollection_AsciiString& theRead), where theRead could be "TDataStd_Name", for example
    - skipping specified attributes using AddSkipped (const TCollection_AsciiString& theSkipped), where theSkipped could be "TDF_Reference", for example
    
    The current limitations:
    - only in Bin format
    - if shapes in the document have shared topology and are loaded in "append" mode by different "load" operations, they will no longer share topology
    
    Modifications:
    BinLDrivers and BinDrivers packages - modifications related to the usage of the quick part tree format flag, skipping the writing of the shape section, and adding label sizes into the document so that labels can be passed over quickly during reading.
    BinObjMgt_Persistent and BinObjMgt_Position - added the possibility to write some data directly into the stream just after the attribute. A data size is recorded before this record.
    BinMXCAFDoc package - modifications to write the BinMXCAFDoc_LocationDriver location in the same way as shapes: in this new format, location data is written right after the (empty) attribute data.
    BinTools package - creation of ShapeReader and ShapeWriter classes that share the root class ShapeSetBase with the ShapeSet class. These classes allow writing/reading shapes directly to/from the stream. If some object is already in the stream, a reference is written instead: the relative position of the duplicated object. The old format of documents is still supported by the BinTools_ShapeSet class.
    PCDM_ReaderFilter - allows the user to create a reading filter. It contains an algorithm to browse the label tree quickly, without referencing by entry strings.
    TDocStd, CDF and some other packages are changed to support the reading filters API and options.
    
    Tests, documentation and upgrade information are also added for both issues: 31839 and 31918 related to this commit.
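
The filtering rules listed in this commit message (sub-tree paths, attributes to read, attributes to skip) can be modelled with a small self-contained C++ sketch. It is not the PCDM_ReaderFilter class and does not use OCCT: the ReaderFilterSketch class and its method names are hypothetical stand-ins, label entries are plain strings such as "0:1:2", and the precedence of the "read" list over the "skipped" list is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the filtering rules; NOT the OCCT class itself.
class ReaderFilterSketch {
public:
  void addPath(const std::string& entry) { myPaths.push_back(entry); }
  void addRead(const std::string& attrType) { myRead.insert(attrType); }
  void addSkipped(const std::string& attrType) { mySkipped.insert(attrType); }

  // A label passes if no path is set, or if its entry is one of the
  // requested sub-tree roots or lies below one of them.
  bool isLabelPassed(const std::string& entry) const {
    if (myPaths.empty()) return true;
    for (const std::string& path : myPaths) {
      if (entry == path) return true;
      if (entry.size() > path.size() &&
          entry.compare(0, path.size(), path) == 0 &&
          entry[path.size()] == ':') {
        return true;
      }
    }
    return false;
  }

  // Assumed precedence: a non-empty "read" list is exclusive;
  // otherwise everything passes except the "skipped" types.
  bool isAttributePassed(const std::string& attrType) const {
    if (!myRead.empty()) return myRead.count(attrType) != 0;
    return mySkipped.count(attrType) == 0;
  }

private:
  std::vector<std::string> myPaths;
  std::set<std::string>    myRead;
  std::set<std::string>    mySkipped;
};
```

With the path "0:1:2" set, this model accepts "0:1:2" and "0:1:2:5" but rejects "0:1:20", because the match is made on whole tags of the entry rather than on a raw string prefix.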
(0102050)
git   
2021-06-25 15:36   
Branch CR31918_3 has been updated by mpv.

SHA-1: ac3f87aa7a7cb4e33666654d26a85ab96cf2debd


Detailed log of new commits:

Author: mpv
Date: Fri Jun 25 15:37:29 2021 +0300

    # minor changes in documentation

(0102272)
git   
2021-07-05 09:16   
Branch CR31918_3 has been updated forcibly by mpv.

SHA-1: 77ec699feddebe5c80ee7d3fd4b75e201de1c2dd
(0102273)
mpv   
2021-07-05 09:52   
OCCT branch 31918_3. A single commit contains the fix for two issues: 31839 and 31918.
Please review.
(0102276)
mpv   
2021-07-05 10:06   
http://occt-tests/CR31918_3-master-MPV-OCCT/Windows-64-VC14/summary.html
http://occt-tests/CR31918_3-master-MPV-Products/Windows-64-VC14/summary.html
http://occt-tests/CR31918_3-master-MPV-OCCT/Debian80-64/summary.html
http://occt-tests/CR31918_3-master-MPV-Products/Debian80-64/summary.html
(0102296)
vro   
2021-07-06 10:13   
Dear Mikhail!
Here are a few remarks:
1. Coding rules:
  - BinDrivers_DocumentRetrievalDriver.cxx,
  - BinDrivers_DocumentStorageDriver.cxx,
  - BinLDrivers_DocumentRetrievalDriver.cxx (1 line only: Read(...))
2. An extra word in a comment in BinLDrivers_DocumentRetrievalDriver.hxx:
  - "document retrieved document" (the first "document" should be removed, I suppose)
3. The non-regression tests show very impressive results for the loading time of a part of a document. For example:
  - bug31839_1:
    whole document = 4700ms
    without integers = 3500ms
  - bug31918_1:
    whole document = 3000ms
    1/4 of document = 1500ms
This should probably be put into the documentation, what do you think?
An interesting case is loading a part of an XDE document without shapes. Just the figures and a short description of the case would be enough.
(0102309)
git   
2021-07-06 18:03   
Branch CR31918_3 has been updated by mpv.

SHA-1: c66fd75377bd7d2db8e448ebfe84e58e86ac0644


Detailed log of new commits:

Author: mpv
Date: Tue Jul 6 18:03:30 2021 +0300

    # vro and kgv remarks

(0102318)
mpv   
2021-07-07 08:09   
(edited on: 2021-07-07 08:10)
The coding rule changes are made in a new commit in the same branch CR31918_3.
The comment from KGV is also applied:

+#define ENDSECTION_POS ":"
...
+ else if (!aCurSection.Name().IsEqual ((Standard_CString)ENDSECTION_POS))

why do we need a "cast" here?

The automatic tests pass; you can check them using the same links.

About the timings of reading the document: my tests show that they depend heavily on how the files are accessible for reading from the operating system's point of view, whether they are fully cached in memory (that is, were already opened), how big the files are, etc. In general, the numbers are not so optimistic. So I would not provide exact numbers in the permanent documentation.

However, it is planned to prepare an article about this improvement, to be published probably close to the next OCCT release. We may provide some numbers there, I guess.

(0102319)
vro   
2021-07-07 08:20   
OCCT branch: CR31918_3
Products branch: NOTHING
(0102386)
bugmaster   
2021-07-09 10:20   
Warnings:
http://jenkins-test-11.nnov.opencascade.com/job/warnings_compare/Compare_20Warnings_20Report/
(0102387)
git   
2021-07-09 10:50   
Branch CR31918_3 has been updated by mpv.

SHA-1: 87af859b7c3f796035f88beb0081bd61fd6a0a7e


Detailed log of new commits:

Author: mpv
Date: Fri Jul 9 10:50:23 2021 +0300

    # elimination of old compiler warnings

(0102388)
mpv   
2021-07-09 10:55   
The old compiler warnings should be eliminated by the last commit.
(0102439)
git   
2021-07-10 12:40   
Branch CR31918 has been deleted by mnt.

SHA-1: 9913d36a8e6b50a88591847e3772d5309d813db6
(0102440)
git   
2021-07-10 12:40   
Branch CR31918_1 has been deleted by mnt.

SHA-1: 0c40b6e394d5f030fbcedcc6cbcef2fce4dabce8
(0102441)
git   
2021-07-10 12:40   
Branch CR31918_2 has been deleted by mnt.

SHA-1: f2a71473aab68acc2a2ddf894150e0e956c8be74
(0102442)
git   
2021-07-10 12:40   
Branch CR31918_3 has been deleted by mnt.

SHA-1: 87af859b7c3f796035f88beb0081bd61fd6a0a7e