View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0032193 | Open CASCADE | OCCT:Documentation | public | 2021-03-03 15:10 | 2023-08-01 15:08 |
Reporter | btokarev | Assigned To | btokarev | ||
Priority | normal | Severity | minor | ||
Status | new | Resolution | open | ||
Product Version | 7.5.0 | ||||
Target Version | Unscheduled | ||||
Summary | 0032193: SEO-related changes in the documentation | ||||
Description | While performing the SEO audit for the OCCT website (in connection with the 7.6.0 release), several SEO-related flaws were reported in the documentation sections of the current version of the website (https://new5dev.opencascade.org/). These issues hurt the site's ranking in web search engines and should be resolved so that the OCCT website appears at higher positions in the search results.

1. Several pages are doubled
These are located in the /doc/ section of the automatically generated documentation. All the pages should be tagged Canonical and united.
Doubled pages:
https://new5dev.opencascade.org/doc/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-7.5.0/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-6.9.1/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-6.9.0/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-7.0.0/refman/html/class_graphic3d___vertex.html
https://new5dev.opencascade.org/doc/occt-7.1.0/refman/html/class_graphic3d___vertex.html
Canonical page:
https://new5dev.opencascade.org/doc/refman/html/class_graphic3d___vertex.html

2. Metadata tweaks
Metadata should be reformed for the listed pages following this pattern:
https://new5dev.opencascade.org/doc/overview/html/occt_contribution__tests.html
Title: [Forum Open Cascade Technology]
Description: [user manuals, examples, Open CASCADE Technology].
Examples:
https://new5dev.opencascade.org/doc/overview/html/occt_user_guides__modeling_algos.html
https://new5dev.opencascade.org/doc/overview/html/occt_user_guides__test_harness.html
https://new5dev.opencascade.org/doc/overview/html/specification__boolean_operations.html

3. H1 titles
3.1. H1 title absence
An H1 title is missing on several /doc/ pages: http://joxi.ru/D2PMnOZcJ8VBPA
The title tag <h1></h1> should be added.
Examples:
https://new5dev.opencascade.org/doc/refman/html/class_viewer_test.html
https://new5dev.opencascade.org/doc/refman/html/class_b_rep_test.html
https://new5dev.opencascade.org/doc/refman/html/class_x_s_d_r_a_w.html
https://new5dev.opencascade.org/doc/refman/html/class_draw.html
3.2. Doubled H1 titles
There are many pages on the site where the H1 title <h1></h1> is doubled.
Page list:
- https://docs.google.com/spreadsheets/d/1lo8M-Xp0mQFMYhIe3ik-y1ty7Y4BTBSB55y9AzmFKKU/edit?usp=sharing [Address column]
Only one <h1></h1> title should be present (the one that describes the page content). | ||||
Tags | No tags attached. | ||||
Test case number | |||||
|
> These are located in the /doc/ section of the automatically generated documentation. All the pages should be tagged Canonical and united.

I don't know what "united" means, but "doc/refman" and "doc/occt-7.5.0/refman" are not the same versions (thus, cannot be united). Perhaps all versioned references could be marked to be filtered out from search engines. |
|
> "doc/refman" and "doc/occt-7.5.0/refman" are not the same versions (thus, cannot be united)

It seems that the content of all listed pages is identical; the initial proposal is to replace all doubled links with a link to a single page and rename it, so that it is clear that this page suits several versions and is "Canonical". The doubled pages could then be removed. However, this may not be the best course of action. These issues require further research, especially to verify whether the following is possible:

> Perhaps all versioned references could be marked to be filtered out from search engines.

I will look into it. |
|
> It seems that the content of all listed pages is identical;

These two pages might be identical, but the reference manual is generated for thousands of classes, each changing independently from version to version. I can imagine an algorithm detecting whether a specific page is the same across several OCCT releases (of course, this would have to be done for all pages and all versions to de-duplicate, like a global hashed map), but I can barely say what the expected output of this knowledge would be. The pages would still differ, as each reference manual shows its own version header, so this information could only help to avoid search engine confusion and potentially to improve content caching; the latter makes little sense, though, as it is still preferred that the documentation of each OCCT version is placed independently. |
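The "global hashed map" idea from the note above could look roughly like the following. This is only a minimal sketch in Python: the function name `group_identical_pages` and the input layout (a dictionary keyed by version and relative path) are assumptions made for illustration, and it compares raw page text, so the per-version header mentioned in the note would have to be stripped beforehand for real pages to match.

```python
import hashlib
from collections import defaultdict


def group_identical_pages(pages):
    """Group the versions of each page by identical content.

    `pages` maps (version, relative_path) -> HTML text (hypothetical layout).
    Returns {relative_path: [[versions with identical content], ...]}.
    """
    # Key the map by (path, content hash): versions landing in the same
    # bucket have byte-identical content for that page.
    by_digest = defaultdict(list)
    for (version, rel_path), text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        by_digest[(rel_path, digest)].append(version)

    groups = defaultdict(list)
    for (rel_path, _digest), versions in by_digest.items():
        groups[rel_path].append(sorted(versions))
    return {path: sorted(found) for path, found in groups.items()}
```

A page whose group list has a single entry is identical in every release; a page with several entries changed at some point between releases.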
|
The new suggested solution for the [1. doubled pages] problem is to automatically (via script) tag all doubled pages as "canonical". It appears to be the only possible solution, since robots.txt is ignored by Google and Yandex. Does this suit us? Additionally, writing a script seems to be the optimal solution for the [3.2 Doubled H1 titles] problem. |
|
> automatically (via script) tag all doubled pages as "canonical"

Could you please elaborate, for those unfamiliar with search robots, on what the script actually does and where the tag is placed? I suppose that the proposed script tags all pages in "doc/refman" as canonical and all others in "doc/occt-x.x.x/refman" as non-canonical? |
|
> Could you please elaborate, for those unfamiliar with search robots, on what the script actually does and where the tag is placed?

It is a bash shell script that edits the .html documentation files of previous OCCT releases (DONE). It:
- adds <link rel="canonical" href="[link to dev version]"> to every respective page of previous releases; (DONE)
- adds and reforms existing metadata as suggested; (DONE)
- adds the requested <h1> tags; (DONE)
- deletes doubled <h1> tags. (APPROVAL REQUESTED)

To finalize the SEO remarks implementation (h1 tag deletion), it is necessary to alter header styles for release documentation. This will slightly affect font sizes; however, the documentation structure will remain intact.

@kgv Please review the example at http://occt-tests/750/doc/overview/html/index.html
If approval is granted, the documentation will be uploaded for further SEO testing and will be updated once no more remarks are found. |
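To illustrate where the tag ends up, here is a minimal sketch of the canonical-link step. It is written in Python for illustration, whereas the actual script is a bash script that is not shown in this thread, so the function name `add_canonical_link` and the regex-based approach are assumptions and not the real implementation.

```python
import re
from html import escape


def add_canonical_link(page, canonical_url):
    """Insert <link rel="canonical"> right after the opening <head> tag.

    Returns the page unchanged if it already has a canonical link or has
    no <head> element, so the function can be run repeatedly.
    """
    if re.search(r'<link[^>]*rel=["\']canonical["\']', page, flags=re.IGNORECASE):
        return page
    tag = '<link rel="canonical" href="%s">' % escape(canonical_url, quote=True)
    # Match <head> or <head ...attributes> but not <header>.
    return re.sub(
        r"(<head(?:\s[^>]*)?>)",
        lambda match: match.group(1) + "\n" + tag,
        page,
        count=1,
        flags=re.IGNORECASE,
    )
```

Making the function a no-op on already tagged pages matters if the step is re-run on every documentation update, as discussed later in the thread.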
|
> To finalize the SEO remarks implementation (h1 tag deletion),
> it is necessary to alter header styles for release documentation.
> This will slightly affect font sizes,
> however the documentation structure will remain intact.

I don't quite understand where to look: there is no patch pushed to git to see the changes.

> Adds and reforms existing metadata as suggested; (DONE)

Where does this "metadata" reside?

> It adds <link rel="canonical" href="[link to dev version]">
> to every respective page of previous releases; (DONE)

I don't see any scripts attached to this bug. How does the script handle pages that no longer exist in the new version of the documentation or have been moved? How is it supposed to be used on documentation update? What about an option asking web search spiders to exclude old documentation? |
|
> I don't quite understand where to look: there is no patch pushed to git to see the changes.

For now we do not implement changes for the build mechanism. The scope of changes for SEO remarks is mostly the static documentation of previous releases (7.5.0 and below). Once all changes are approved by SEO, we will tweak the current documentation build system to fit these changes and push the tweaks to git. An example of the HTML overview with deleted h1 titles is currently deployed at http://occt-tests/750/doc/overview/html/index.html
Additionally, release 7.5.0 is updated with our changes (except the unapproved h1 part) at https://dev.opencascade.org/doc/occt-7.5.0/overview/html/index.html

> Where does this "metadata" reside?

In the HTML header, between the <head></head> tags. For example, for the following page
https://dev.opencascade.org/doc/occt-7.5.0/overview/html/build_upgrade__building_occt.html
lines 7-9 are:
<link rel="canonical" href="https://dev.opencascade.org/doc/overview/html/build_upgrade__building_occt.html">
<title>Build OCCT - Open CASCADE Technology Documentation</title>
<meta name="description" content="Build OCCT - documentation, user manuals, examples, Open CASCADE Technology">

> I don't see any scripts attached to this bug.

Attaching them now.

> How does the script handle pages that no longer exist in the new version of the documentation or have been moved?

It iterates from the latest version to the earliest and marks the latest one (mostly it is the "dev" version) as "canonical".

> How is it supposed to be used on documentation update?

Our assumption is that it is to be run once when the OCCT version changes; the new "dev" version will be marked as "canonical" from that point on, and the previous version will be tagged and stored.

> What about an option asking web search spiders to exclude old documentation?

I asked about this, and this seems not to be an option since users can still search for the older versions of the documentation. |
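The "latest to earliest" rule for pages that were removed or moved can be sketched as follows. This is a hedged Python illustration, not the attached scripts: `DOC_ROOT`, `page_url`, `canonical_url` and the `available` mapping are names invented for the example. The URL layout follows the links quoted in this note ("dev" pages directly under /doc/, released versions under /doc/occt-x.y.z/).

```python
DOC_ROOT = "https://dev.opencascade.org/doc"


def page_url(version, rel_path):
    """Build the URL of a page for the 'dev' version or a released version."""
    if version == "dev":
        return "%s/%s" % (DOC_ROOT, rel_path)
    return "%s/occt-%s/%s" % (DOC_ROOT, version, rel_path)


def canonical_url(rel_path, versions_newest_first, available):
    """Return the URL of the newest version that still contains the page.

    `available` maps version -> set of relative paths present in it
    (a hypothetical stand-in for scanning the documentation directories).
    Returns None if no version contains the page.
    """
    for version in versions_newest_first:
        if rel_path in available.get(version, set()):
            return page_url(version, rel_path)
    return None
```

With this rule, a page that was dropped after 7.5.0 would point to its 7.5.0 copy as canonical, while pages still present in "dev" would point to the "dev" copy.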
|
scripts.7z (71,616 bytes) |
|
Attached scripts have not a single comment in them. From the current discussion, I suppose that these scripts are not for one-time use and might be run from time to time. In this context, it is essential to put some description of each script inside the script itself (via comments). |
|
> For now we do not implement changes for the build mechanism.
> The scope of changes for SEO remarks
> is mostly the static documentation of previous releases (7.5.0 and below).

I understand that you are focused on older OCCT versions, but a diff for Markdown/Doxygen files would be more self-explanatory to me in showing the nature of the proposed changes. Moreover, it is technically possible to re-generate the documentation for older releases from source code as an alternative to patching pre-generated HTML files. |
|
> I asked about this, and this seems not to be an option since
> users can still search for the older versions of the documentation.

How would this work with "canonical" tags? Will Google return search results for the old documentation for a "gp_Trsf documentation OCCT 7.0.0" query and for the new documentation for a "gp_Trsf documentation OCCT" query? |
|
> I understand that you are focused on older OCCT versions, but a diff for Markdown/Doxygen files would be more self-explanatory to me in showing the nature of the proposed changes.

In Doxygen, the HTML header is edited separately by default and has no relation to the .md files; it is unclear to me how it can be edited individually for each piece of documentation. For the <h1> part, however, the editing might be done via the .md files. The diff equivalent of these changes is:
@section parts are tagged as <h1>[sectionname]</h1>
@subsection parts are tagged as <h2>[subsectionname]</h2>
So, by approving the changes requested by the SEO specialists, we are rewriting our documentation so that only one <h1> tag stays and every other <h1> present is demoted to <h2>, <h2> to <h3>, etc.; we will use @subsection instead of @section and @subsubsection instead of @subsection to get this result.

> How would this work with "canonical" tags?

These tags prevent pages with the same titles from being flagged as "overspam" and omitted by search engines.

> Will Google return search results for the old documentation for a "gp_Trsf documentation OCCT 7.0.0" query and for the new documentation for a "gp_Trsf documentation OCCT" query?

Yes, AFAIK this is the general idea. |
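For the already generated HTML of older releases, the heading demotion described above could be sketched like this. It is a Python illustration under stated assumptions: the function name `demote_headings` is hypothetical, headings are assumed to be well-formed and not nested, and any heading that appears before the first <h1> is demoted as well, which is a simplification.

```python
import re

# Matches the start of an opening or closing heading tag: <h1 ... or </h3 ...
_HEADING = re.compile(r"<(/?)h([1-6])\b", flags=re.IGNORECASE)


def demote_headings(page):
    """Keep the first <h1>...</h1> pair and push every other heading one
    level down (h1 -> h2, h2 -> h3, ...), capping the level at h6."""
    kept = {"open": False, "close": False}

    def shift(match):
        closing, level = match.group(1), int(match.group(2))
        if level == 1:
            # The first opening and the first closing h1 tag are left intact.
            if not closing and not kept["open"]:
                kept["open"] = True
                return match.group(0)
            if closing and not kept["close"]:
                kept["close"] = True
                return match.group(0)
        return "<%sh%d" % (closing, min(level + 1, 6))

    return _HEADING.sub(shift, page)
```

Only the tag names are rewritten; attributes such as anchors or classes on the headings are left untouched, so in-page links should keep working.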
|
> Attached scripts have not a single comment in them.

This is a working draft, currently aimed at having sufficient functionality to fit the SEO needs; it will be properly polished once we are done with all the functions that the scripts are required to have. |
Date Modified | Username | Field | Change |
---|---|---|---|
2021-03-03 15:10 | btokarev | New Issue | |
2021-03-03 15:10 | btokarev | Assigned To | => btokarev |
2021-03-03 15:22 | kgv | Note Added: 0099335 | |
2021-03-03 15:41 | btokarev | Note Added: 0099336 | |
2021-03-03 15:42 | btokarev | Description Updated | |
2021-03-03 15:49 | kgv | Note Added: 0099338 | |
2021-03-03 15:50 | kgv | Note Edited: 0099338 | |
2021-03-05 15:38 | btokarev | Note Added: 0099435 | |
2021-03-05 15:54 | kgv | Note Added: 0099438 | |
2021-04-02 12:38 | btokarev | Note Added: 0099985 | |
2021-04-02 14:40 | kgv | Note Added: 0099993 | |
2021-04-02 14:41 | kgv | Note Edited: 0099993 | |
2021-04-02 15:13 | btokarev | Note Added: 0099997 | |
2021-04-02 15:13 | btokarev | File Added: scripts.7z | |
2021-04-02 15:23 | kgv | Note Added: 0099998 | |
2021-04-02 15:46 | kgv | Note Added: 0100000 | |
2021-04-02 15:48 | kgv | Note Added: 0100001 | |
2021-04-02 15:49 | kgv | Note Edited: 0100001 | |
2021-04-02 16:57 | btokarev | Note Added: 0100008 | |
2021-04-02 19:19 | btokarev | Note Added: 0100011 | |
2021-04-02 19:20 | btokarev | Note Edited: 0100011 | |
2021-04-02 19:22 | btokarev | Note Edited: 0100008 | |
2021-09-09 21:41 | kgv | Target Version | 7.6.0 => 7.7.0 |
2022-10-24 10:40 | | Target Version | 7.7.0 => 7.8.0 |
2023-08-01 15:08 | dpasukhi | Target Version | 7.8.0 => Unscheduled |