doxysphinx.html_parser#

The html_parser module contains the html parser that will load and process the html files.

To allow several writer implementations to pick up and handle the parsing result in a neutral way, the parser changes all relevant rst/sphinx markup elements to <snippet>-elements.

Module Contents#

Classes#

HtmlParseResult

Encapsulates a parsed and processed html tree with meta information.

HtmlParser

Html Parser Protocol for parsing html files into a neutral format (that can then be processed further).

ElementProcessor

An ElementProcessor processes specific html elements, one at a time.

RstInlineProcessor

Element Processor for inline rst elements.

RstBlockProcessor

Element Processor for rst block elements.

PreToDivProcessor

This Element Processor will change <pre>-tags to <div class="fragments"> tags.

MarkdownRstBlockProcessor

Element Processor for doxygen markdown block elements.

DoxygenHtmlParser

Parser for Doxygen HTML output files.

class doxysphinx.html_parser.HtmlParseResult[source]#

Encapsulates a parsed and processed html tree with meta information.

html_input_file: pathlib.Path#

The html file that was parsed.

project: str#

The project this html file belongs to. This can be e.g. a directory name or a component/module name.

meta_title: str#

The html meta title, if present in the original html. Otherwise it is set to the document title.

document_title: str#

The document title. This is the title that is visible e.g. in sphinx menu structure.

used_snippet_formats: Set[str] | None#

The set of snippet formats that are used inside the html tree, if any.

tree: lxml.etree._ElementTree | None#

The html/xml element tree or None if nothing was parsed because the html shouldn’t be handled as mixed mode content.
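As a rough illustration of how these fields fit together, the following sketch mirrors them in a plain dataclass. `ParseResultSketch` and `needs_mixed_mode_handling` are hypothetical names invented for this example and are not part of doxysphinx; the tree is typed as `Any` only to keep the sketch free of the lxml dependency.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional, Set


@dataclass
class ParseResultSketch:
    """Hypothetical stand-in that mirrors the documented HtmlParseResult fields."""

    html_input_file: Path
    project: str
    meta_title: str
    document_title: str
    used_snippet_formats: Optional[Set[str]]
    tree: Optional[Any]  # an lxml element tree in doxysphinx; Any keeps this stdlib-only


def needs_mixed_mode_handling(result: ParseResultSketch) -> bool:
    """Hypothetical helper: True if a tree exists and at least one snippet format was found."""
    return result.tree is not None and bool(result.used_snippet_formats)
```

A consumer could use such a check to decide whether a file needs further processing or can be passed through unchanged.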

class doxysphinx.html_parser.HtmlParser(source_directory: pathlib.Path)[source]#

Bases: Protocol

Inheritance diagram of doxysphinx.html_parser.HtmlParser

Html Parser Protocol for parsing html files into a neutral format (that can then be processed further).

Your own html parser should find/generate all rst-content in <rst>-tags. Further tooling can then work with that.

abstract parse(file: pathlib.Path) -> HtmlParseResult[source]#

Parse a html file.

This method returns an HtmlParseResult. Its tree holds the parsed and normalized html as ElementTree, or None if the html shouldn’t be handled as mixed mode content. It is expected that all rst data in the resulting ElementTree is present in special <rst>-tags.

Parameters:

file – The html file to parse

Returns:

The result of the parsing

class doxysphinx.html_parser.ElementProcessor[source]#

Bases: Protocol

Inheritance diagram of doxysphinx.html_parser.ElementProcessor

An ElementProcessor processes specific html elements, one at a time.

Typically this is used to either clean up or transform the elements into a neutralized format.

elements: List[str] = []#

A list of html element names this processor can process.

This is for pre-filtering html elements (an optimization). This processor’s try_process method is only called on these elements.

is_final: bool = True#

Whether other processors should be called after this one.

With a “final processor” (is_final == True), processing of an element stops (no other processors are considered) once the try_process method returns True.

format: str = 'None'#

The format this element processor processes, e.g. ‘rst’ or ‘md’.

try_process(element: lxml.etree._Element) -> bool[source]#

Try to process an element.

Parameters:

element – The element to check and process

Returns:

Whether the processor “did its thing”, i.e. processing was applied (True), or not (False)
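The interplay of elements, is_final and try_process can be sketched as a small dispatch loop. This is only an illustration under assumptions: the loop doxysphinx actually uses is not shown on this page, `run_processors` and `UppercaseCodeProcessor` are hypothetical names, and the standard library's `xml.etree.ElementTree` stands in for lxml so the example has no third-party dependency.

```python
import xml.etree.ElementTree as ET
from typing import List, Protocol, Set


class Processor(Protocol):
    """Local copy of the documented protocol shape (stdlib Element instead of lxml)."""

    elements: List[str]
    is_final: bool
    format: str

    def try_process(self, element: ET.Element) -> bool: ...


class UppercaseCodeProcessor:
    """Hypothetical processor: upper-cases the text of <code> elements."""

    elements = ["code"]
    is_final = True
    format = "demo"

    def try_process(self, element: ET.Element) -> bool:
        if not element.text:
            return False  # nothing to do, let other processors try
        element.text = element.text.upper()
        return True


def run_processors(root: ET.Element, processors: List[Processor]) -> Set[str]:
    """Apply processors to every element and collect the formats that were used."""
    used_formats: Set[str] = set()
    for element in root.iter():
        for processor in processors:
            if element.tag not in processor.elements:
                continue  # pre-filter: this processor does not handle the tag
            if processor.try_process(element):
                used_formats.add(processor.format)
                if processor.is_final:
                    break  # a final processor ends processing of this element
    return used_formats
```

Collecting the formats in a set is one way such a loop could feed a field like used_snippet_formats; whether doxysphinx does it this way is an assumption.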

class doxysphinx.html_parser.RstInlineProcessor[source]#

Element Processor for inline rst elements.

elements = ['code']#
format = 'rst'#
is_final = True#
rst_role_regex#
try_process(element: lxml.etree._Element) -> bool[source]#

Try to process an rst inline element into a neutralized format.

Parameters:

element – The html element to process

Returns:

True if the element was processed else False
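The rst_role_regex attribute is listed above without its value. To give an idea of what detecting an inline rst role in the text of a <code> element could look like, here is a minimal sketch. The pattern and the function name `split_rst_role` are assumptions made for this example and are not taken from doxysphinx.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (not the one doxysphinx uses): matches text that consists
# solely of one rst role, such as :doc:`Title <path>` or :py:func:`name`.
ROLE_PATTERN = re.compile(r"^:(?P<role>[A-Za-z][\w:+.-]*):`(?P<content>[^`]+)`$")


def split_rst_role(text: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (role, content) if text is a single rst role, otherwise None."""
    if not text:
        return None
    match = ROLE_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group("role"), match.group("content")
```

A processor built on such a check would return False from try_process whenever the function yields None, leaving ordinary code spans untouched.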

class doxysphinx.html_parser.RstBlockProcessor[source]#

Element Processor for rst block elements.

elements = ['code', 'pre']#
format = 'rst'#
is_final = True#
try_process(element: lxml.etree._Element) -> bool[source]#

Try to process an rst block element into a neutralized format.

Parameters:

element – The html element to process

Returns:

True if the element was processed else False

class doxysphinx.html_parser.PreToDivProcessor[source]#

This Element Processor will change <pre>-tags to <div class="fragments"> tags.

We do this because doxysphinx will linearize html output in the writer to have it in one line in the raw html directive. However, this will destroy the newlines in pre tags. To overcome that, we change the pre output here to a div with inner line divs (which is also supported by doxygen).

This processor is special because it should only run when any other processor has done something.

elements = ['pre']#
format = ''#
is_final = True#
try_process(element: lxml.etree._Element) -> bool[source]#

Transform a pre element into a div element.

Parameters:

element – The html element to process

Returns:

True if the element was processed else False
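The transformation described above can be sketched roughly as follows. This is not the doxysphinx implementation: `pre_to_fragment_div` is a hypothetical function, the standard library's `xml.etree.ElementTree` is used in place of lxml, and the class names "fragment" and "line" are borrowed from the Doxygen markup quoted for MarkdownRstBlockProcessor.

```python
import xml.etree.ElementTree as ET


def pre_to_fragment_div(pre: ET.Element) -> bool:
    """Turn a <pre> element into a div with one inner div per text line."""
    if pre.tag != "pre":
        return False
    # Gather all text (including text of nested elements) and split it into lines.
    lines = "".join(pre.itertext()).split("\n")
    tail = pre.tail
    pre.clear()  # removes children, attributes, text and tail
    pre.tag = "div"
    pre.set("class", "fragment")
    pre.tail = tail  # restore the text that followed the original element
    for line in lines:
        line_div = ET.SubElement(pre, "div", {"class": "line"})
        line_div.text = line
    return True
```

Because every line now lives in its own element, serializing the tree onto a single line no longer loses the line structure. Note that this sketch flattens any markup nested inside the pre element into plain text.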

class doxysphinx.html_parser.MarkdownRstBlockProcessor[source]#

Element Processor for doxygen markdown block elements.

This processor will check if the first line in the markdown block is either a supported marker or a directive (auto detection feature).

Markdown block elements in doxygen are rendered differently from verbatim content. Each Markdown block (delimited with 3 backticks) will look something like this in html:

<div class="fragment">
  <div class="line">{rst}</div>
  <div class="line">This is rst content</div>
  <div class="line"> </div>
  <div class="line">anything can be used here...</div>
  <div class="line"> </div>
  <div class="line">like an admonition:</div>
  <div class="line"> </div>
  <div class="line">..admonition::</div>
  <div class="line">  </div>
  <div class="line">  test</div>
</div>
elements = ['div']#
format = 'rst'#
is_final = True#
try_process(element: lxml.etree._Element) -> bool[source]#

Try to process an rst block element into a neutralized format.

Parameters:

element – The html element to process

Returns:

True if the element was processed else False
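The first-line check described for this processor can be sketched as follows, using the fragment markup quoted above. The function name `extract_rst`, the directive pattern, and the restriction to the `{rst}` marker (the only marker this page shows) are assumptions for illustration; the standard library's `xml.etree.ElementTree` stands in for lxml.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

MARKERS = ("{rst}",)  # only the marker shown on this page; other markers are not assumed
# Hypothetical directive check, tolerant of a missing space after the two dots.
DIRECTIVE = re.compile(r"^\.\.\s*[\w:+.-]+::")


def extract_rst(fragment: ET.Element) -> Optional[str]:
    """Return the rst text of a fragment div, or None if it is not an rst block."""
    if fragment.tag != "div" or fragment.get("class") != "fragment":
        return None
    lines = ["".join(line.itertext()) for line in fragment.findall("div[@class='line']")]
    if not lines:
        return None
    first = lines[0].strip()
    if first in MARKERS:
        return "\n".join(lines[1:])  # drop the marker line itself
    if DIRECTIVE.match(first):
        return "\n".join(lines)  # auto detection: keep the directive line
    return None
```

A processor along these lines would return False from try_process when the function yields None, so ordinary code fragments stay as they are.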

class doxysphinx.html_parser.DoxygenHtmlParser(source_directory: pathlib.Path)[source]#

Parser for Doxygen HTML output files.

parse(file: pathlib.Path) -> HtmlParseResult[source]#

Parse a doxygen HTML file into an ElementTree and normalize its inner data to contain <rst>-tags.

Parameters:

file (Path) – The html file to parse

Returns:

The result of the parsing

Return type:

HtmlParseResult