Finding position of dom node in the document source

Context

I’m building a set of ‘extractor’ functions whose purpose is to extract what look like components from a page (using jsdom and Node.js). The final result should be these ‘component’ objects, ordered by where they originally appeared in the page.

Problem

The last part of this process is a bit problematic. As far as I can see, there’s no easy way to tell where a given element sits in the source code of a given DOM document.

The numeric depth or css/xpath-like path also doesn’t feel helpful in this case.

Example

With the given extractors…

JavaScript

…and the given document (I know, it’s an ugly and unsemantic example):

JavaScript

I need something like:

JavaScript

(which can be later ordered by item.position)
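
As a minimal sketch of that ordering step, assuming each extracted component is a plain object with a numeric `position` property (the function name `orderByPosition` and the object shape are hypothetical, not taken from the original code):

```javascript
// Hypothetical shape: each extracted component carries a numeric `position`.
function orderByPosition(components) {
  // Copy first so the caller's array is not mutated, then sort ascending.
  return [...components].sort((a, b) => a.position - b.position);
}
```

Any value that increases in document order would work as `position` here; it does not have to be a character offset.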

For example, 45 is the position/offset of the <button within the example HTML string.

Answer

You could just iterate over all the elements in the DOM and assign each one an index, provided your DOM doesn’t change afterwards:

JavaScript
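
A minimal sketch of that indexing pass might look like the following. The helper name `indexElements` is made up for illustration, and `document` is assumed to be a DOM Document such as `new JSDOM(html).window.document`:

```javascript
// Build a lookup from each element to its document-order index.
// `document` is assumed to be a DOM Document (e.g. from jsdom).
function indexElements(document) {
  const positions = new Map();
  // querySelectorAll('*') yields every element in document (tree) order.
  document.querySelectorAll('*').forEach((element, index) => {
    positions.set(element, index);
  });
  return positions;
}
```

The index is only meaningful relative to other elements in the same snapshot, so it should be rebuilt if nodes are added or removed.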

Then your extractor can just use that:

JavaScript
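
For instance, an extractor could look its elements up in such an index. This is only a sketch: `extractButtons`, the `type`/`text` fields, and the `positions` Map (element to document-order index) are assumptions for illustration, not the original extractors:

```javascript
// Hypothetical extractor: turns <button> elements into "component" objects.
// `positions` is assumed to be a Map from element to document-order index.
function extractButtons(document, positions) {
  return Array.from(document.querySelectorAll('button'), (element) => ({
    type: 'button',
    text: element.textContent,
    // Look up where this element appeared in the document.
    position: positions.get(element),
  }));
}
```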

Alternatively, jsdom can attach to every node its source position in the parsed HTML text; see the includeNodeLocations option. The startOffset will be in document order as well. So if you parse the input with that option enabled, you can use each node’s startOffset as its position:

JavaScript
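
A sketch of reading that offset, assuming `dom` is a JSDOM instance created with `new JSDOM(html, { includeNodeLocations: true })`. The helper name `sourceOffset` is hypothetical; `dom.nodeLocation()` is the jsdom method that returns the location info:

```javascript
// `dom` is assumed to be a JSDOM instance created with
//   new JSDOM(html, { includeNodeLocations: true })
function sourceOffset(dom, element) {
  const location = dom.nodeLocation(element);
  // Location info can be missing for nodes that do not appear in the
  // source text (for example an implied <body>), so guard against that.
  return location ? location.startOffset : undefined;
}
```

An extractor could then set `position: sourceOffset(dom, element)` on each component, which gives a character offset into the original HTML string, not an element index.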
User contributions licensed under: CC BY-SA