Clean title come from meta tags by cikay · Pull Request #824 · adbar/trafilatura

cikay · 2026-02-13T18:11:35Z

This PR applies the `HTMLTITLE_REGEX` cleanup to titles extracted from `og:title`, `twitter:title`, `<meta name="title">`, and `itemprop="headline"`. Previously, `extract()` with `with_metadata=True` returned meta-tag titles that still carried a site-name suffix, while `extract_title()` correctly returned clean titles taken from `<h1>` tags.

- Add a `clean_title()` helper that removes a site-name suffix or prefix
- Apply `clean_title()` in `extract_opengraph()` for `og:title`
- Apply `clean_title()` in `examine_meta()` for meta `name` titles and `itemprop` headlines
- Add tests for `clean_title()` and for title cleaning during metadata extraction
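To illustrate the kind of cleanup described above, here is a minimal sketch of a title-cleaning helper. The separator pattern `TITLE_SEPARATOR` and the "keep the longer part" heuristic are assumptions made for illustration; this is not trafilatura's `HTMLTITLE_REGEX`, nor the `clean_title()` implementation in this PR.

```python
import re

# Hypothetical pattern (an assumption, not trafilatura's HTMLTITLE_REGEX):
# "<part> <separator> <part>", where the separator is surrounded by whitespace.
# The greedy first group makes the split happen at the LAST separator.
TITLE_SEPARATOR = re.compile(r"^(.+)\s+[-–—|·•»«]\s+(.+)$")


def clean_title(title: str) -> str:
    """Strip a site-name suffix or prefix from a title (illustrative sketch)."""
    title = title.strip()
    match = TITLE_SEPARATOR.match(title)
    if match is None:
        # No whitespace-delimited separator: nothing to remove
        return title
    first, second = match.group(1).strip(), match.group(2).strip()
    # Heuristic assumption: the site name is the shorter of the two parts
    return first if len(first) >= len(second) else second


if __name__ == "__main__":
    print(clean_title("Gelo Jîyan dê li Sala 2050yan Çawa Be? - Nûhev Co. %100 Kurdî"))
```

The length heuristic handles both "Title - Site" and "Site | Title" layouts, but it picks the wrong part when the article title is shorter than the site name, so a real implementation would likely need a stronger signal.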

The issue occurs on the following website: https://www.nuhev.com

Example URL: https://www.nuhev.com/gelo-jiyan-de-li-sala-2050yan-cawa-be/

Title from the meta tag: `<meta property="og:title" content="Gelo Jîyan dê li Sala 2050yan Çawa Be? - Nûhev Co. %100 Kurdî">`
Title from the h1 tag: `<h1 class="jeg_post_title">Gelo Jîyan dê li Sala 2050yan Çawa Be?</h1>`
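The mismatch on the example page can be reproduced with the standard library alone. The sketch below assumes a trimmed-down copy of the page markup (`SAMPLE_HTML`, built from the two tags quoted above), collects title candidates from the meta-tag sources named in this PR plus the first `<h1>`, and then applies a hypothetical suffix-stripping cleanup. `TitleCollector`, `SAMPLE_HTML`, `SUFFIX`, and `strip_site_suffix` are illustrative names, not trafilatura code.

```python
import re
from html.parser import HTMLParser

# Hypothetical cleanup (not trafilatura's HTMLTITLE_REGEX): keep everything
# before the last whitespace-delimited separator, dropping the site name.
SUFFIX = re.compile(r"^(.+)\s+[-–—|·•»«]\s+.+$")


def strip_site_suffix(title: str) -> str:
    """Remove a trailing ' - Site Name' style segment (suffix only, sketch)."""
    return SUFFIX.sub(r"\1", title.strip())


class TitleCollector(HTMLParser):
    """Collect title candidates from meta tags and the text of the first <h1>."""

    # (attribute, value) pairs that mark a <meta> tag as a title source;
    # twitter:title appears with either "name" or "property" in the wild
    META_SOURCES = {
        ("property", "og:title"),
        ("name", "twitter:title"),
        ("property", "twitter:title"),
        ("name", "title"),
        ("itemprop", "headline"),
    }

    def __init__(self):
        super().__init__()
        self.meta_titles = []
        self.h1_title = None
        self._in_h1 = False
        self._h1_text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            content = attributes.get("content")
            if content and any(attributes.get(a) == v for a, v in self.META_SOURCES):
                self.meta_titles.append(content.strip())
        elif tag == "h1" and self.h1_title is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.h1_title = "".join(self._h1_text).strip()

    def handle_data(self, data):
        if self._in_h1:
            self._h1_text.append(data)


SAMPLE_HTML = """<html><head>
<meta property="og:title" content="Gelo Jîyan dê li Sala 2050yan Çawa Be? - Nûhev Co. %100 Kurdî">
</head><body>
<h1 class="jeg_post_title">Gelo Jîyan dê li Sala 2050yan Çawa Be?</h1>
</body></html>"""

if __name__ == "__main__":
    collector = TitleCollector()
    collector.feed(SAMPLE_HTML)
    collector.close()
    for raw in collector.meta_titles:
        print(f"raw:     {raw}")
        print(f"cleaned: {strip_site_suffix(raw)}")
    print(f"h1:      {collector.h1_title}")
```

After cleanup, the `og:title` value matches the `<h1>` text, which is the behavior this PR aims for in `extract()` with `with_metadata=True`.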

Commit 307f5ea: Clean title come from meta tags


cikay wants to merge 1 commit into adbar:master from cikay:title-extraction
