
How to get textContent including childNodes?

I have some plain text content in paragraphs inside a <main> HTML element.

The paragraphs are separated by newlines (\n), not wrapped in <p> tags, and I would like to wrap them in <p> tags automatically using JavaScript.

Example content:

<main>
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
  
  in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit,
  
  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
  
  eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

  <img src="img/testimg.jpg"> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
  
  consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</main>

Inside of <main> there may be <img> elements.

I want the script to watch for these and leave them untouched. It breaks the HTML (no image is rendered, and dev tools show overflow) if the script tries to wrap them like: <p><img src="img/testimg.jpg"></p> (assuming that is what it is doing).

My script so far:

var maintext = document.getElementsByTagName('main')[0]; // get the first (0th) <main> element

var arr = maintext.textContent.split(/[\r?\n]+/); // split the text inside into an array by newline
// regex matches one OR MORE consecutive newlines, which prevents empty <p></p> being captured

arr.forEach(function(part, index) {
  if (!this[index].includes("<img ")) {
    this[index] = "<p>" + this[index] + "</p>"; // wrap each paragraph with <p> tags
  }
}, arr);

var rejoined = arr.join("\r\n"); // join the array to remove commas, with newlines as separators
maintext.innerHTML = rejoined; // replace contents of our <main> element with the new text

I believe the problem may be that <img> is not captured as text along with the textContent of <main>, but instead remains recognized as a child node and it’s messing up the array.

You can see my attempt to only wrap in <p> if that array element does not (!) contain "<img " …however, this is not working. It seems the enclosed HTML elements do not get matched as string data by includes.

What’s the best way to go about this?


Answer

To retain non-text content like images, you’ll need to process the text nodes of the main element rather than using textContent, since that’s just the text content of the element.
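
To see the difference, it can help to list what childNodes actually contains. The following is only a minimal sketch: describeChildNodes is a hypothetical helper, and fakeMain is a plain-object stand-in (an assumption made so the sketch also runs outside a browser) with the same shape as the <main> element in the question.

```javascript
// Numeric node type for text nodes; same value as Node.TEXT_NODE in browsers.
const TEXT_NODE = 3;

// Hypothetical helper: summarize each direct child of an element.
function describeChildNodes(element) {
    return Array.from(element.childNodes, (node) =>
        node.nodeType === TEXT_NODE
            ? "text: " + JSON.stringify(node.nodeValue)
            : "element: " + node.nodeName
    );
}

// In a browser you would pass a real element instead:
// console.log(describeChildNodes(document.querySelector("main")));

// Plain-object stand-in (assumption) shaped like the question's <main>.
const fakeMain = {
    childNodes: [
        { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: "\n  First paragraph\n  \n  Second paragraph\n\n  " },
        { nodeType: 1, nodeName: "IMG", nodeValue: null },
        { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: " Third paragraph\n" },
    ],
};

console.log(describeChildNodes(fakeMain));
```

The image shows up as its own element node between two text nodes, which is why the string "<img " never appears in textContent.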

Assuming you only want to do this with the text nodes, you can loop through the element’s nodes, split text nodes on line breaks, and if you get more than one segment, insert paragraphs for them. Something like this (see inline comments):

function convertLineBreaksToParagraphs(element) {
    // Get a snapshot of the child nodes of the element; we want
    // a snapshot because we may change the element's contents
    const nodes = [...element.childNodes];
    // Loop through the snapshot
    for (const node of nodes) {
        // Is this a text node?
        if (node.nodeType === Node.TEXT_NODE) {
            // Yes, split it on line breaks
            const parts = node.nodeValue.split(/\r\n|\r|\n/);
            // Did we find any?
            if (parts.length > 1) {
                // Yes, loop through the "paragraphs"
                for (const part of parts) {
                    // Create an actual paragraph for it
                    const p = document.createElement("p");
                    p.textContent = part;
                    // Insert it in front of the text node it came from
                    element.insertBefore(p, node);
                }
                // Remove the text node we've replaced with paragraphs
                element.removeChild(node);
            }
        }
    }
}

convertLineBreaksToParagraphs(document.querySelector("main"));

You may need to tweak that a bit, depending on how you want to handle that image just before the fifth paragraph. The above leaves the image outside the paragraph. But if you wanted it to be inside the paragraph, you could add some logic to do that.

function convertLineBreaksToParagraphs(element) {
    let img = null;
    // Get a snapshot of the child nodes of the element; we want
    // a snapshot because we may change the element's contents
    const nodes = [...element.childNodes];
    // Loop through the snapshot
    for (const node of nodes) {
        // Is this a text node?
        if (node.nodeType === Node.TEXT_NODE) {
            // Yes, split it on line breaks
            const parts = node.nodeValue.split(/\r\n|\r|\n/);
            // Did we find any?
            if (parts.length > 1) {
                // Yes, loop through the "paragraphs"
                for (const part of parts) {
                    // Create an actual paragraph for it
                    const p = document.createElement("p");
                    p.textContent = part;
                    // If we *just* saw an image before this text node,
                    // move it to the start of the first paragraph
                    if (img) {
                        p.insertBefore(img, p.firstChild);
                        img = null;
                    }
                    // Insert it in front of the text node it came from
                    element.insertBefore(p, node);
                }
                // Remove the text node we've replaced with paragraphs
                element.removeChild(node);
            }
        } else if (node.nodeName === "IMG") {
            img = node;
        } else {
            img = null;
        }
    }
}

convertLineBreaksToParagraphs(document.querySelector("main"));
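
As written, both versions create a <p> for every segment, so the blank lines between paragraphs in the example markup produce empty or whitespace-only paragraphs. If that is not what you want, one option is to filter the segments first. A minimal sketch, where splitIntoParagraphs is a hypothetical helper and not part of the functions above:

```javascript
// Hypothetical helper: split a text node's value into trimmed, non-empty paragraphs.
function splitIntoParagraphs(text) {
    return text
        .split(/\r\n|\r|\n/)             // split on CRLF, CR, or LF
        .map((part) => part.trim())      // drop indentation and trailing spaces
        .filter((part) => part !== "");  // skip blank segments
}

const sample = "\n  First paragraph\n  \n  Second paragraph\r\n\r\n  Third paragraph\n";
console.log(splitIntoParagraphs(sample));
```

You could then loop over splitIntoParagraphs(node.nodeValue) instead of the raw split result. Whether to still require more than one segment before replacing the text node is a design choice that depends on your content.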

You may have people telling you to do this by manipulating the HTML from innerHTML, but the problem with doing that is you run the risk of introducing tags in the middle of a tag (and setting innerHTML on main replaces its descendant elements, so any event handlers attached to them are lost). For instance, if you have:

<img
   src="/path/to/something">

you’d end up with

<img
<p>   src="/path/to/something"></p>

…which is obviously not good.
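
To make the risk concrete, here is a minimal sketch of a line-based approach applied to an HTML string. naiveWrapLines is a hypothetical helper, not code from the question. It wraps every line, so its output differs slightly from the illustration above, but the tag is broken in the same way.

```javascript
// Hypothetical helper that mimics wrapping each line of an HTML string in <p> tags.
function naiveWrapLines(html) {
    return html
        .split(/\r\n|\r|\n/)                  // split the markup on line breaks
        .map((line) => "<p>" + line + "</p>") // wrap every line, tag or not
        .join("\n");
}

// An <img> tag whose attributes continue on a second line.
const source = '<img\n   src="/path/to/something">';
const result = naiveWrapLines(source);

console.log(result);
```

The line break inside the tag is treated as a paragraph boundary, so the <img> tag is split across two <p> elements and is no longer valid markup.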

User contributions licensed under: CC BY-SA