Add Infection + PHPStan CI check on PHP 8.3 and fix PHP 7.x template serialization#145
Add Infection + PHPStan CI check on PHP 8.3 and fix PHP 7.x template serialization#145Copilot wants to merge 13 commits into
Conversation
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/857899c7-72b3-490d-ba54-0657260e3982 Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/857899c7-72b3-490d-ba54-0657260e3982 Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/857899c7-72b3-490d-ba54-0657260e3982 Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/857899c7-72b3-490d-ba54-0657260e3982 Co-authored-by: voku <264695+voku@users.noreply.github.com>
|
@copilot fix at least this : 39s .................................F............................. 63 / 410 ( 15%) Time: 00:38.946, Memory: 30.00 MB There were 7 failures:
/home/runner/work/simple_html_dom/simple_html_dom/tests/CoverageBoostTest.php:487
/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1370
/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1453
/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1867
/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1921
/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1940
/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlSerializationRegressionTest.php:72 FAILURES! Generating code coverage report in Clover XML format ... done [00:00.027] |
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda Co-authored-by: voku <264695+voku@users.noreply.github.com>
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda Co-authored-by: voku <264695+voku@users.noreply.github.com>
Fixed in efbfd5e. The PHP < 8 detached-element serialization helper now strips the synthetic trailing newline that was reintroducing the CI failures, and I added regression coverage for the helper behavior. Local targeted regressions and the full PHPUnit suite are green. No UI changes. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a configuration for the Infection mutation testing tool and refactors HTML node serialization to improve compatibility with PHP versions older than 8.0. The refactoring includes a new method, serializeElementNodeForPhpLt8, which handles element serialization by stripping unnecessary wrapper tags and trailing newlines. Feedback suggests reordering the removal of tags in this new method to ensure that empty body fragments are correctly removed before individual tags are stripped, preventing potential dangling tags.
| $content = (string) \preg_replace('/^<body[^>]*>/i', '', $content); | ||
| $content = (string) \preg_replace('/<\/body>\s*$/i', '', $content); | ||
| $content = \str_replace('<body></body>', '', $content); |
There was a problem hiding this comment.
The str_replace for the empty <body> tag should be performed before the regex-based stripping of <body> and </body> tags.
When serializing a <head> node on PHP < 8, libxml often appends an empty <body></body> at the end of the document. If preg_replace('/<\/body>\s*$/i', '', $content) runs first, it will strip the closing </body> tag, leaving a dangling <body> tag that str_replace('<body></body>', ...) will then fail to match and remove. Moving the literal replacement to the top of this block ensures it correctly catches the empty body fragment before the individual tags are stripped.
$content = \str_replace('<body></body>', '', $content);
$content = (string) \preg_replace('/^<body[^>]*>/i', '', $content);
$content = (string) \preg_replace('/<\/body>\s*$/i', '', $content);|
@copilot Check if this feedback from another llm is valid or just AI slop: Code Review src/voku/helper/HtmlDomParser.php The str_replace for the empty tag should be performed before the regex-based stripping of and tags. When serializing a node on PHP < 8, libxml often appends an empty at the end of the document. If preg_replace('/</body>\s*$/i', '', $content) runs first, it will strip the closing tag, leaving a dangling tag that str_replace('', ...) will then fail to match and remove. Moving the literal replacement to the top of this block ensures it correctly catches the empty body fragment before the individual tags are stripped. @voku Reply... |
I checked it, and that feedback is not valid. The current order in |
There was a problem hiding this comment.
Pull request overview
This PR strengthens CI quality gates by adding mutation testing (Infection) + PHPStan checks on PHP 8.3, and adjusts PHP < 8 HTML serialization to avoid libxml-injected formatting/newline regressions (including template round-trip behavior).
Changes:
- Add a new serialization regression test covering trailing-newline behavior in PHP < 8 element serialization.
- Refactor PHP < 8 element node serialization to use whole-document
saveHTML()+ wrapper stripping, plus trailing newline removal. - Add Infection configuration and wire Infection + PHPStan into CI on PHP 8.3 (with an increased workflow timeout).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
tests/HtmlSerializationRegressionTest.php |
Adds a regression test for PHP < 8 element serialization output stability (no trailing newline artifacts). |
src/voku/helper/HtmlDomParser.php |
Updates node serialization flow for PHP < 8 by introducing serializeElementNodeForPhpLt8() and routing element serialization through it. |
infection.json5 |
Introduces Infection configuration (phpstan integration, logging). |
.github/workflows/ci.yml |
Adds PHPStan + Infection runs on PHP 8.3 and increases job timeout to accommodate the additional workload. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * On PHP < 8.0, saveHTML($node) injects formatting newlines for detached | ||
| * block-level elements, so serialize a temporary whole document instead. | ||
| * | ||
| * @param \DOMElement $node | ||
| * | ||
| * @return string | ||
| */ | ||
| private function serializeElementNodeForPhpLt8(\DOMElement $node): string | ||
| { | ||
| $document = new \DOMDocument('1.0', $this->getEncoding()); | ||
| $document->preserveWhiteSpace = true; | ||
| $document->formatOutput = false; | ||
|
|
||
| $importedNode = $document->importNode($node, true); | ||
| // @phpstan-ignore instanceof.alwaysTrue (importNode() returns DOMNode here) | ||
| if (!$importedNode instanceof \DOMElement) { | ||
| return ''; | ||
| } | ||
|
|
||
| $document->appendChild($importedNode); | ||
|
|
||
| $content = $document->saveHTML(); | ||
| if ($content === false) { | ||
| return ''; | ||
| } | ||
|
|
||
| $content = (string) \preg_replace('/^<!DOCTYPE[^>]+>\s*/i', '', $content); | ||
|
|
||
| $tagName = \strtolower($importedNode->tagName); | ||
| if ($tagName !== 'html') { | ||
| $content = (string) \preg_replace('/^<html[^>]*>/i', '', $content); | ||
| $content = (string) \preg_replace('/<\/html>\s*$/i', '', $content); | ||
|
|
||
| if ($tagName !== 'body') { | ||
| $content = (string) \preg_replace('/^<body[^>]*>/i', '', $content); | ||
| $content = (string) \preg_replace('/<\/body>\s*$/i', '', $content); | ||
| $content = \str_replace('<body></body>', '', $content); | ||
| } | ||
| } | ||
|
|
||
| if (\substr($content, -1) === "\n") { | ||
| $content = \substr($content, 0, -1); | ||
| } | ||
|
|
||
| return $content; | ||
| } |
| public function testSerializeElementNodeDoesNotAppendTrailingNewline() | ||
| { | ||
| $document = HtmlDomParser::str_get_html( | ||
| '<div><span>one</span><br><p>two</p><template id="card"><section><h2>Title</h2><p>Body</p></section></template></div>' | ||
| ); | ||
|
|
||
| $serializeElementNodeForPhpLt8 = new \ReflectionMethod(HtmlDomParser::class, 'serializeElementNodeForPhpLt8'); | ||
| if (\PHP_VERSION_ID < 80100) { | ||
| // This version check is only for Reflection behavior: private method | ||
| // access still needs setAccessible() when PHP_VERSION_ID < 80100 | ||
| // (PHP 8.0 and earlier). | ||
| $serializeElementNodeForPhpLt8->setAccessible(true); | ||
| } | ||
|
|
||
| $spanHtml = $serializeElementNodeForPhpLt8->invoke($document, $document->getElementByTagName('span')->getNode()); | ||
| $brHtml = $serializeElementNodeForPhpLt8->invoke($document, $document->getElementByTagName('br')->getNode()); | ||
| $pHtml = $serializeElementNodeForPhpLt8->invoke($document, $document->getElementByTagName('p')->getNode()); | ||
|
|
||
| static::assertSame( | ||
| '<span>one</span><br><p>two</p>', | ||
| $spanHtml . $brHtml . $pHtml | ||
| ); | ||
| static::assertSame( | ||
| '<template id="card"><section><h2>Title</h2><p>Body</p></section></template>', | ||
| $serializeElementNodeForPhpLt8->invoke($document, $document->findOne('template')->getNode()) | ||
| ); | ||
| } |
| private function serializeNode(\DOMNode $node): string | ||
| { | ||
| // For script/style on PHP < 8.0 use ownerDocument to avoid fresh-doc | ||
| // libxml injecting "\n" inside raw-text content. | ||
| $useOwnerDoc = \PHP_VERSION_ID < 80000 | ||
| && $node instanceof \DOMElement | ||
| && \in_array(\strtolower($node->tagName), ['script', 'style'], true); | ||
|
|
||
| if (!$useOwnerDoc) { | ||
| $document = new \DOMDocument('1.0', $this->getEncoding()); | ||
| $document->preserveWhiteSpace = true; | ||
| $document->formatOutput = false; | ||
|
|
||
| $importedNode = $document->importNode($node, true); | ||
| // @phpstan-ignore instanceof.alwaysTrue (importNode() returns DOMNode here) | ||
| if (!$importedNode instanceof \DOMNode) { | ||
| return ''; | ||
| } | ||
| if (\PHP_VERSION_ID < 80000 && $node instanceof \DOMElement) { | ||
| return $this->serializeElementNodeForPhpLt8($node); | ||
| } |
|
@copilot apply changes based on the comments in this thread |
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/a9ccc569-86f4-480e-8a03-6b167d81437a Co-authored-by: voku <264695+voku@users.noreply.github.com>
Applied in 2acb658. I extracted the shared PHP < 8 wrapper-stripping logic, updated the |
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/3e747992-7bae-42bb-a7c5-fe304664b5fb Co-authored-by: voku <264695+voku@users.noreply.github.com>
|



This change is