Strip noscript tags when parsing pages to remove JS-disabled messages

Lemmy and other JS-heavy sites include noscript fallback text like "Javascript is disabled" that pollutes the stored body text and generated snippets/summaries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 14:18:54 -07:00
commit 570d876b8e
parent fd20454fa4
1 changed file with 1 addition and 1 deletion
--- a/db.py
+++ b/db.py
@@ -328,7 +328,7 @@ def fetch_page(url):
        if og_tag and og_tag.get("content"):
            meta_desc = og_tag["content"].strip()

-    for tag in soup(["script", "style", "nav", "footer", "header"]):
+    for tag in soup(["script", "style", "nav", "footer", "header", "noscript"]):
        tag.decompose()
    title = soup.title.string.strip() if soup.title and soup.title.string else url
    body = soup.get_text(separator=" ", strip=True)
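
For reviewers who want to check the effect without fetching a live page, here is a minimal, dependency-free sketch of the same idea. It uses the standard library's `html.parser` instead of BeautifulSoup, so it is not the code in `db.py`; the names `SKIPPED_TAGS`, `VisibleTextExtractor`, `extract_body_text`, and the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Tags whose text is dropped; mirrors the tag list in the diff above.
SKIPPED_TAGS = {"script", "style", "nav", "footer", "header", "noscript"}


class VisibleTextExtractor(HTMLParser):
    """Collect text that is not nested inside any skipped tag (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        # Join with single spaces, roughly like get_text(separator=" ", strip=True).
        return " ".join(self._chunks)


def extract_body_text(html):
    """Return the visible text of an HTML string, minus the skipped tags."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical page fragment resembling a JS-heavy site with a noscript fallback.
sample = (
    "<html><body>"
    "<noscript>Javascript is disabled</noscript>"
    "<p>Actual post content</p>"
    "<script>console.log('hi')</script>"
    "</body></html>"
)
print(extract_body_text(sample))  # prints "Actual post content"
```

Removing `"noscript"` from `SKIPPED_TAGS` in this sketch lets the fallback message back into the output, which is the pollution this commit is meant to prevent.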