View Issue Details

ID: 0032193
Project: Open CASCADE
Category: OCCT:Documentation
View Status: public
Last Update: 2023-08-01 15:08
Reporter: btokarev
Assigned To: btokarev
Priority: normal
Severity: minor
Status: new
Resolution: open
Product Version: 7.5.0
Target Version: Unscheduled
Summary: 0032193: SEO-related changes in the documentation
Description: During the SEO audit of the OCCT website (performed for the 7.6.0 release), several SEO-related flaws were reported in the documentation sections of the current version of the website (https://new5dev.opencascade.org/).

These issues hurt the site's ranking in web search engines and should be resolved so that the OCCT website appears higher in web search results.

1. Several pages are doubled

These are located in the /doc/ section of the automatically generated documentation. All the pages should be tagged Canonical and united.

Doubled pages:

https://new5dev.opencascade.org/doc/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-7.5.0/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-6.9.1/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-6.9.0/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-7.0.0/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-7.1.0/refman/html/class_graphic3d___vertex.html

Canonical page:

https://new5dev.opencascade.org/doc/refman/html/class_graphic3d___vertex.html


2. Metadata tweaks

Metadata should be reformed for the listed pages following this pattern:

https://new5dev.opencascade.org/doc/overview/html/occt_contribution__tests.html

Title: [Forum Open Cascade Technology]
Description: [user manuals, examples, Open CASCADE Technology].

Examples:

https://new5dev.opencascade.org/doc/overview/html/occt_user_guides__modeling_algos.html
https://new5dev.opencascade.org/doc/overview/html/occt_user_guides__test_harness.html
https://new5dev.opencascade.org/doc/overview/html/specification__boolean_operations.html

3. H1 titles.

3.1. H1 title absence.
H1 title should be added for several /doc/ pages
http://joxi.ru/D2PMnOZcJ8VBPA

The title tag <h1></h1> should be added

Examples:
https://new5dev.opencascade.org/doc/refman/html/class_viewer_test.html
https://new5dev.opencascade.org/doc/refman/html/class_b_rep_test.html
https://new5dev.opencascade.org/doc/refman/html/class_x_s_d_r_a_w.html
https://new5dev.opencascade.org/doc/refman/html/class_draw.html

3.2. Doubled H1 titles

There are many pages on the site where the H1 title is doubled <h1></h1>

Page list: https://docs.google.com/spreadsheets/d/1lo8M-Xp0mQFMYhIe3ik-y1ty7Y4BTBSB55y9AzmFKKU/edit?usp=sharing [Address column]

Only one <h1></h1> title needs to be present (where the page content is described).
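
Both H1 checks in item 3 (missing and doubled titles) can be audited mechanically. The following is a minimal sketch, not part of the attached scripts; the names `H1Counter` and `classify_h1` are hypothetical, and it assumes each page is available as an HTML string.

```python
from html.parser import HTMLParser


class H1Counter(HTMLParser):
    """Counts opening <h1> tags in a page (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == "h1":
            self.count += 1


def classify_h1(html: str) -> str:
    """Return "missing", "ok" or "doubled" depending on the number of <h1> tags."""
    parser = H1Counter()
    parser.feed(html)
    parser.close()
    if parser.count == 0:
        return "missing"
    if parser.count == 1:
        return "ok"
    return "doubled"
```

Running `classify_h1` over every generated page would give a list comparable to the spreadsheet above, under the assumption that the pages are well-formed enough for the standard parser.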
Tags: No tags attached.
Test case number

Attached Files

  • scripts.7z (71,616 bytes)

Activities

kgv

2021-03-03 15:22

developer   ~0099335

> These are located in the /doc/ section of the automatically generated documentation. All the pages should be tagged Canonical and united.
I don't know what "united" means, but "doc/refman" and "doc/occt-7.5.0/refman" are not the same versions (thus, cannot be united).
Probably marking all versioned references could be filtered out from search engine.

btokarev

2021-03-03 15:41

developer   ~0099336

>"doc/refman" and "doc/occt-7.5.0/refman" are not the same versions (thus, cannot be united)
It seems that the content of all listed pages is identical; the initial proposal was to replace all doubled links with a link to one page and to rename that page so that it is clear that it suits several versions and is "Canonical".
Doubled pages could be removed then.

However, this may not be the best course of action. These issues require further research, especially to verify if the following is possible:
>Probably marking all versioned references could be filtered out from search engine.

I will look into it.

kgv

2021-03-03 15:49

developer   ~0099338

Last edited: 2021-03-03 15:50

> It seems that the content of all listed pages is identical;
These two pages might be identical, but the reference manual is generated for thousands of classes, each changing independently from version to version.

I can imagine an algorithm that detects whether a specific page is the same across several OCCT releases (of course, this should be done for all pages and all versions to de-duplicate, like a global hashed map), but I can barely say what the expected outcome of this knowledge would be. The pages would still be different, as each reference manual shows its own version header, so this information could help only to avoid search engine confusion and potentially to improve content caching; the latter makes little sense, as it is still preferred that the documentation of each OCCT version be placed independently.
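
The "global hashed map" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `group_identical_pages` and the input layout (a dictionary keyed by version and relative path) are hypothetical, and the optional `normalize` hook stands in for whatever would be needed to ignore the per-version header mentioned above.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def group_identical_pages(
    pages: Dict[Tuple[str, str], str],
    normalize: Callable[[str], str] = lambda text: text,
) -> Dict[str, List[List[str]]]:
    """Group versions whose copy of a page has identical (normalized) content.

    `pages` maps (version, relative_path) to the page's HTML text.
    The result maps relative_path to groups of versions sharing the same
    content; groups with a single version are not reported.
    """
    by_digest: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for (version, path), html in pages.items():
        digest = hashlib.sha256(normalize(html).encode("utf-8")).hexdigest()
        by_digest[(path, digest)].append(version)

    duplicates: Dict[str, List[List[str]]] = defaultdict(list)
    for (path, _digest), versions in by_digest.items():
        if len(versions) > 1:
            duplicates[path].append(sorted(versions))
    return dict(duplicates)
```

Pages are compared only against other versions of the same relative path, so two different classes with coincidentally identical text are never grouped together.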

btokarev

2021-03-05 15:38

developer   ~0099435

The new suggested solution for the [1. doubled pages] problem is to automatically (via script) tag all doubled pages as "canonical". It appears to be the only possible solution since robots.txt is ignored by Google and Yandex. Does this suit us?

Additionally, writing a script seems to be the optimal solution for the [3.2 Doubled H1 titles] problem.

kgv

2021-03-05 15:54

developer   ~0099438

> automatically (via script) tag all doubled pages as "canonical"
Could you please elaborate for those unfamiliar with search robots - what actually script does and where tag is placed?

I suppose that proposed script tags all pages in "doc/refman" as canonical and all others "doc/occt-x.x.x/refman" as non-canonical?

btokarev

2021-04-02 12:38

developer   ~0099985

> Could you please elaborate for those unfamiliar with search robots - what actually script does and where tag is placed?

It is a bash shell script that edits the .html documentation files of previous OCCT releases (DONE).

It adds <link rel="canonical" href="[link to dev version]"> to every respective page of previous releases; (DONE)
Adds and reforms existing metadata as suggested; (DONE)
Adds requested <h1> tags (DONE)
Deletes doubled <h1> tags (APPROVAL REQUESTED)
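
The canonical-link step in the list above can be sketched as follows. This is not the attached bash script (its contents are not shown in this issue); it is a minimal Python illustration that assumes the URL layout seen in the description, where versioned pages live under "/doc/occt-x.y.z/" and the dev version lives under "/doc/". The names `canonical_url` and `add_canonical_link` are hypothetical.

```python
import re

# Version segment as it appears in the issue's URLs, e.g. "/doc/occt-7.5.0/".
VERSION_SEGMENT = re.compile(r"/doc/occt-\d+\.\d+\.\d+/")

# Opening <head> tag, with or without attributes (does not match <header>).
HEAD_OPEN = re.compile(r"(<head(?:\s[^>]*)?>)", flags=re.IGNORECASE)

# An already present canonical link.
HAS_CANONICAL = re.compile(r'<link\s+rel="canonical"', flags=re.IGNORECASE)


def canonical_url(url: str) -> str:
    """Map a versioned documentation URL to its unversioned counterpart."""
    return VERSION_SEGMENT.sub("/doc/", url, count=1)


def add_canonical_link(html: str, href: str) -> str:
    """Insert a canonical link right after the opening <head> tag.

    Pages that already declare a canonical link, or that have no <head>,
    are returned unchanged.
    """
    if HAS_CANONICAL.search(html):
        return html
    tag = f'<link rel="canonical" href="{href}">'
    return HEAD_OPEN.sub(lambda m: m.group(1) + "\n" + tag, html, count=1)
```

Skipping pages that already carry the tag keeps the operation safe to repeat, which matters if the script is rerun on each documentation update.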

To finalize the SEO remarks implementation (h1 tag deletion), it is necessary to alter header styles for release documentation. This will slightly affect font sizes, however the documentation structure will remain intact.

@kgv Please review the example at http://occt-tests/750/doc/overview/html/index.html
If approval is granted, the documentation will be uploaded for further SEO testing and will be updated once no more remarks are found.

kgv

2021-04-02 14:40

developer   ~0099993

Last edited: 2021-04-02 14:41

> To finalize the SEO remarks implementation (h1 tag deletion),
> it is necessary to alter header styles for release documentation.
> This will slightly affect font sizes,
> however the documentation structure will remain intact.
I don't quite understand where to look at - there is no patch pushed to git to see the changes.

> Adds and reforms existing metadata as suggested; (DONE)
Where this "metadata" resides?

> It adds <link rel="canonical" href="[link to dev version]">
> to every respective page of previous releases; (DONE)
I don't see any scripts attached to this bug.
How does script handles pages, which no more exist in new version of documentation or have been moved?
How it is supposed to be used on documentation update?

What about an option asking web searching spiders to exclude old documentation?

btokarev

2021-04-02 15:13

developer   ~0099997

>I don't quite understand where to look at - there is no patch pushed to git to see the changes.
For now we do not implement changes for the build mechanism. The scope of changes for SEO remarks is mostly the static documentation of previous releases (7.5.0 and below). Once all changes are approved by SEO we will tweak the current documentation build system to fit these changes and push tweaks to git.

Example of HTML overview with deleted h1 titles is currently deployed at http://occt-tests/750/doc/overview/html/index.html

Additionally, release 7.5.0 is updated with our changes (except the unapproved h1 part) at https://dev.opencascade.org/doc/occt-7.5.0/overview/html/index.html

>Where this "metadata" resides?
In the HTML header, between the <head></head> tags. For example, for the page https://dev.opencascade.org/doc/occt-7.5.0/overview/html/build_upgrade__building_occt.html, lines 7-9 are:

<link rel="canonical" href="https://dev.opencascade.org/doc/overview/html/build_upgrade__building_occt.html">
<title>Build OCCT - Open CASCADE Technology Documentation</title>
<meta name="description" content="Build OCCT - documentation, user manuals, examples, Open CASCADE Technology">
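
The three header lines above follow a fixed pattern, so they could be produced from the page title and the canonical address. The sketch below is an assumption about how that might be done, not a description of the attached scripts; the function name `seo_head_lines` is hypothetical and the wording of the title and description is copied from the example above.

```python
from html import escape
from typing import List

SITE_NAME = "Open CASCADE Technology"


def seo_head_lines(page_title: str, canonical_href: str) -> List[str]:
    """Build the canonical link, <title> and description lines for one page."""
    title = escape(page_title, quote=True)   # safe inside text and attributes
    href = escape(canonical_href, quote=True)
    return [
        f'<link rel="canonical" href="{href}">',
        f"<title>{title} - {SITE_NAME} Documentation</title>",
        f'<meta name="description" content="{title} - documentation, '
        f'user manuals, examples, {SITE_NAME}">',
    ]
```

Called with "Build OCCT" and the dev-version address of the page, this reproduces the three lines quoted above.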

>I don't see any scripts attached to this bug.
Attaching them now.

>How does script handles pages, which no more exist in new version of documentation or have been moved?
It iterates from the latest version to the earliest and marks the latest one that contains the page (mostly it is the "dev" version) as "canonical".
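
One possible reading of this iteration, sketched in Python under stated assumptions: each version is described by the set of relative page paths it contains, versions are supplied newest first, and the newest version that still has a given page becomes the canonical target for that page. The name `pick_canonical_versions` and the input layout are hypothetical; the attached scripts may work differently.

```python
from typing import Dict, List, Set


def pick_canonical_versions(
    newest_first: List[str],
    pages: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Return, for every page path, the newest version that contains it.

    `newest_first` lists versions from latest to earliest (e.g. "dev" first);
    `pages` maps each version to the set of relative page paths it ships.
    """
    canonical: Dict[str, str] = {}
    for version in newest_first:
        for path in pages.get(version, set()):
            # The first (newest) version seen for a path wins.
            canonical.setdefault(path, version)
    return canonical
```

With this approach a page that was removed from the dev documentation points at the last release in which it existed; pages that were moved or renamed would still need separate handling.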

>How it is supposed to be used on documentation update?
Our assumption is that it is to be run once when the OCCT version changes; from that point on the new "dev" version will be marked as "canonical", and the previous version will be tagged and stored.

>What about an option asking web searching spiders to exclude old documentation?
I asked about this, and this seems not to be an option since users can still search for the older versions of the documentation.

btokarev

2021-04-02 15:13

developer  

scripts.7z (71,616 bytes)

kgv

2021-04-02 15:23

developer   ~0099998

Attached scripts have not a single comment in them.
From the current discussion I suppose that these scripts are not for single-time usage and might be used from time to time.

In this context it is essential to put a description of each script inside the script itself (via comments).

kgv

2021-04-02 15:46

developer   ~0100000

> For now we do not implement changes for the build mechanism.
> The scope of changes for SEO remarks
> is mostly the static documentation of previous releases (7.5.0 and below).
I understand that you are focused on older OCCT versions, but diff for Markdown/Doxygen files would look more self-describable to me to show the nature of proposed changes.
Moreover, it is technically possible to regenerate the documentation for older releases from source code as an alternative to patching pre-generated HTML files.

kgv

2021-04-02 15:48

developer   ~0100001

Last edited: 2021-04-02 15:49

> I asked about this, and this seems not to be an option since
> users can still search for the older versions of the documentation.
How this would work with "canonical" tags?
Will google return search results for an old documentation for "gp_Trsf documentation OCCT 7.0.0" query and for new documentation for "gp_Trsf documentation OCCT" query?

btokarev

2021-04-02 16:57

developer   ~0100008

Last edited: 2021-04-02 19:22

>I understand that you are focused on older OCCT versions, but diff for Markdown/Doxygen files would look more self-describable to me to show the nature of proposed changes.
In Doxygen the HTML header is edited separately by default and has no relation to the .md files; it is unclear to me how it can be edited individually for each piece of documentation.

For <h1> part, however, the editing might be done via .md files. Diff equivalent of these changes is:
@section parts are tagged as <h1>[sectionname]</h1>
@subsection parts are tagged as <h2>[subsectionname]</h2>

So, by approving the changes requested by SEO specialists we are rewriting our documentation so only one <h1> tag stays and every other <h1> present will be demoted to <h2>; <h2> to <h3>, etc.; we will use @subsection instead of @section and @subsubsection instead of @subsection to get this result.
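
For the pre-generated HTML of older releases, the same demotion could be applied directly to the pages instead of the .md sources. The following is a minimal regex-based sketch, assuming reasonably regular Doxygen output; `demote_headings` is a hypothetical name and not a function from the attached scripts.

```python
import re

# Opening or closing heading tag, e.g. "<h2", "</h2", '<h1 class="title"'.
HEADING_TAG = re.compile(r"<(/?)h([1-6])\b", flags=re.IGNORECASE)


def demote_headings(html: str) -> str:
    """Keep the first <h1> element and shift every other heading down one level."""
    state = {"first_h1": "unseen"}  # "unseen" -> "open" -> "closed"

    def shift(match):
        closing, level = match.group(1), int(match.group(2))
        if level == 1 and not closing and state["first_h1"] == "unseen":
            state["first_h1"] = "open"
            return match.group(0)
        if level == 1 and closing and state["first_h1"] == "open":
            state["first_h1"] = "closed"
            return match.group(0)
        # Every other heading moves one level down; h6 cannot go lower.
        return f"<{closing}h{min(level + 1, 6)}"

    return HEADING_TAG.sub(shift, html)
```

Because h6 is the lowest level in HTML, h5 and h6 headings both end up as h6 in this sketch; whether that matters depends on how deep the documentation nests its sections.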

>How this would work with "canonical" tags?
These tags prevent pages with the same titles from being tagged as "overspam" and omitted by search engines.

>Will google return search results for an old documentation for "gp_Trsf documentation OCCT 7.0.0" query and for new documentation for "gp_Trsf documentation OCCT" query?
Yes, AFAIK this is the general idea.

btokarev

2021-04-02 19:19

developer   ~0100011

Last edited: 2021-04-02 19:20

>Attached scripts have not a single comment in them.
This is a working draft currently aimed at having sufficient functionality to meet SEO needs; it will be properly polished once all the functions required in the scripts are done.

Issue History

Date Modified Username Field Change
2021-03-03 15:10 btokarev New Issue
2021-03-03 15:10 btokarev Assigned To => btokarev
2021-03-03 15:22 kgv Note Added: 0099335
2021-03-03 15:41 btokarev Note Added: 0099336
2021-03-03 15:42 btokarev Description Updated
2021-03-03 15:49 kgv Note Added: 0099338
2021-03-03 15:50 kgv Note Edited: 0099338
2021-03-05 15:38 btokarev Note Added: 0099435
2021-03-05 15:54 kgv Note Added: 0099438
2021-04-02 12:38 btokarev Note Added: 0099985
2021-04-02 14:40 kgv Note Added: 0099993
2021-04-02 14:41 kgv Note Edited: 0099993
2021-04-02 15:13 btokarev Note Added: 0099997
2021-04-02 15:13 btokarev File Added: scripts.7z
2021-04-02 15:23 kgv Note Added: 0099998
2021-04-02 15:46 kgv Note Added: 0100000
2021-04-02 15:48 kgv Note Added: 0100001
2021-04-02 15:49 kgv Note Edited: 0100001
2021-04-02 16:57 btokarev Note Added: 0100008
2021-04-02 19:19 btokarev Note Added: 0100011
2021-04-02 19:20 btokarev Note Edited: 0100011
2021-04-02 19:22 btokarev Note Edited: 0100008
2021-09-09 21:41 kgv Target Version 7.6.0 => 7.7.0
2022-10-24 10:40 szy Target Version 7.7.0 => 7.8.0
2023-08-01 15:08 dpasukhi Target Version 7.8.0 => Unscheduled