{"id":2129,"date":"2026-03-20T18:52:46","date_gmt":"2026-03-21T01:52:46","guid":{"rendered":"https:\/\/wp.c9h.org\/cj\/?p=2129"},"modified":"2026-03-20T18:52:47","modified_gmt":"2026-03-21T01:52:47","slug":"the-wwwmechanizechrome-saga-a-comprehensive-narrative-of-pr-104","status":"publish","type":"post","link":"https:\/\/wp.c9h.org\/cj\/?p=2129","title":{"rendered":"The WWW::Mechanize::Chrome Saga: A Comprehensive Narrative of PR #104"},"content":{"rendered":"<h1\nid=\"the-wwwmechanizechrome-saga-a-comprehensive-narrative-of-pr-104\">The<br \/>\nWWW::Mechanize::Chrome Saga: A Comprehensive Narrative of PR #104<\/h1>\n<p>This document synthesizes the extensive work performed from March<br \/>\n13th to March 20th, 2026, to harden, stabilize, and refactor the<br \/>\n<code>WWW::Mechanize::Chrome<\/code> library and its test suite. This<br \/>\neffort involved deep dives into asynchronous programming,<br \/>\nplatform-specific bug hunting, and strategic architectural<br \/>\ndecisions.<\/p>\n<hr \/>\n<h2\nid=\"part-i-the-quest-for-cross-platform-stability-march-13---16\">Part I:<br \/>\nThe Quest for Cross-Platform Stability (March 13 &#8211; 16)<\/h2>\n<p>The initial phase of work focused on achieving a \u201cgreen\u201d test suite<br \/>\nacross a variety of Linux distributions and preparing for a new release.<br \/>\nThis involved significant hardening of the library to account for<br \/>\ndifferent browser versions, OS-level security restrictions, and<br \/>\nfilesystem differences.<\/p>\n<h3 id=\"key-milestones-engineering-decisions\">Key Milestones &amp;<br \/>\nEngineering Decisions:<\/h3>\n<ul>\n<li><strong>Fedora &amp; RHEL-family Success:<\/strong> A major effort<br \/>\nwas undertaken to achieve a 100% pass rate on modern Fedora 43 and<br \/>\nCentOS Stream 10. 
This required several key engineering decisions to<br \/>\nhandle modern browser behavior:<\/p>\n<ul>\n<li><strong>Decision: Implement Asynchronous DOM Serialization<br \/>\nFallback.<\/strong> A synchronous fallback inside asynchronous code is<br \/>\ndangerous because it stalls the event loop. To prevent <code>Resource was not cached<\/code> errors during<br \/>\n<code>saveResources<\/code>, we implemented a fully asynchronous fallback<br \/>\nin <code>_saveResourceTree<\/code>. By chaining<br \/>\n<code>_cached_document<\/code> with <code>DOM.getOuterHTML<\/code><br \/>\nmessages, we can reconstruct document content without blocking the event<br \/>\nloop, even if Chromium has evicted the resource from its cache. This approach<br \/>\nalso proved resilient against Fedora\u2019s security policies, which often<br \/>\nblock <code>file:\/\/<\/code> access.<\/li>\n<li><strong>Decision: Truncate Filenames for Cross-Platform<br \/>\nSafety.<\/strong> To avoid <code>File name too long<\/code> errors,<br \/>\nespecially on Windows where the <code>MAX_PATH<\/code> limit is 260<br \/>\ncharacters, <code>filenameFromUrl<\/code> was hardened. The filename<br \/>\ntruncation limit was reduced to a more conservative <strong>150<br \/>\ncharacters<\/strong>, leaving ample headroom for deeply nested CI<br \/>\ntemporary directories. 
Logic was also added to preserve file extensions<br \/>\nduring truncation and to sanitize backslashes from URI paths.<\/li>\n<li><strong>Decision: Expand Browser Discovery Paths.<\/strong> To<br \/>\nsupport RHEL-based systems out-of-the-box, the<br \/>\n<code>default_executable_names<\/code> list was expanded to include<br \/>\n<code>headless_shell<\/code>, and the search paths were updated to include<br \/>\n<code>\/usr\/lib64\/chromium-browser\/<\/code>.<\/li>\n<li><strong>Decision: Mitigate Race Conditions with Stabilization Waits<br \/>\nand Resilient Fetching.<\/strong> On fast systems,<br \/>\n<code>DOM.documentUpdated<\/code> events could invalidate<br \/>\n<code>nodeId<\/code>s immediately after navigation, causing XPath queries<br \/>\nto fail with \u201cCould not find node with given id\u201d. A short 0.25-second stabilization<br \/>\nsleep was added after page loads to give the DOM<br \/>\ntime to settle. Furthermore, the asynchronous DOM fetching loop was hardened<br \/>\nto handle these failures gracefully: it catches the protocol error and<br \/>\nreturns an empty string for any node that was invalidated during<br \/>\nserialization, so the overall process can still complete.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Windows Hardening:<\/strong>\n<ul>\n<li><strong>Decision: Adopt Platform-Aware Watchdogs.<\/strong> The test<br \/>\nsuite\u2019s reliance on <code>ualarm<\/code> was a blocker for Windows, where<br \/>\nit is not implemented. The <code>t::helper::set_watchdog<\/code> function<br \/>\nwas refactored to use standard <code>alarm()<\/code> (seconds) on Windows<br \/>\nand <code>ualarm<\/code> (microseconds) on Unix-like systems, enabling<br \/>\nconsistent test-level timeout enforcement.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Version 0.77 Release:<\/strong>\n<ul>\n<li><strong>Decision: Adopt SOP for Version Synchronization.<\/strong><br \/>\nThe project maintains duplicate version strings across 24+ files. 
The adopted<br \/>\nStandard Operating Procedure is to update the version in every sub-module under <code>lib\/<\/code> with a batch-replacement tool, and then always run<br \/>\n<code>make clean<\/code> and <code>perl Makefile.PL<\/code> so that<br \/>\n<code>META.json<\/code> and <code>META.yml<\/code> reflect the new<br \/>\nversion. After achieving stability on Linux, the project version was<br \/>\nbumped to 0.77.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Infrastructure &amp; Strategic Work:<\/strong>\n<ul>\n<li>The <code>ad2<\/code> Windows Server 2025 instance was restored and<br \/>\noptimized, with Active Directory demoted and disk I\/O performance<br \/>\nimproved.<\/li>\n<li>A strategic proposal for the <strong>Heterogeneous Directory<br \/>\nReplication Protocol (HDRP)<\/strong> was drafted and published.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<hr \/>\n<h2 id=\"part-ii-the-great-async-refactor-march-17---18\">Part II: The<br \/>\nGreat Async Refactor (March 17 &#8211; 18)<\/h2>\n<p>Despite success on Linux, tests on the slow <code>ad2<\/code> Windows<br \/>\nhost were still plagued by intermittent, indefinite hangs. This<br \/>\ntriggered a fundamental architectural shift: moving the library\u2019s core<br \/>\nfrom a mix of synchronous and asynchronous code to a fully non-blocking<br \/>\ninternal API.<\/p>\n<h3 id=\"key-milestones-engineering-decisions-1\">Key Milestones &amp;<br \/>\nEngineering Decisions:<\/h3>\n<ul>\n<li>\n<p><strong>Decision: Expose a <code>_future<\/code> API.<\/strong><br \/>\nInstead of hardcoding timeouts in the library, the core strategy was to<br \/>\nrefactor all blocking methods (<code>xpath<\/code>, <code>field<\/code>,<br \/>\n<code>get<\/code>, etc.) into thin wrappers around new non-blocking<br \/>\n<code>..._future<\/code> counterparts. 
This moved timeout management to<br \/>\nthe test harness, allowing for flexible and explicit handling of<br \/>\nstalls.<\/p>\n<div class=\"sourceCode\" id=\"cb1\">\n<pre\nclass=\"sourceCode perl\"><code class=\"sourceCode perl\"><span id=\"cb1-1\"><a href=\"#cb1-1\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Example library implementation<\/span><\/span>\n<span id=\"cb1-2\"><a href=\"#cb1-2\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"kw\">sub <\/span><span class=\"fu\">xpath<\/span>(<span class=\"dt\">$<\/span>self, <span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>) {<\/span>\n<span id=\"cb1-3\"><a href=\"#cb1-3\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">return<\/span> <span class=\"dt\">$self<\/span>-&gt;<span class=\"dt\">xpath_future<\/span>(<span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>)-&gt;get;<\/span>\n<span id=\"cb1-4\"><a href=\"#cb1-4\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>}<\/span>\n<span id=\"cb1-5\"><a href=\"#cb1-5\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><\/span>\n<span id=\"cb1-6\"><a href=\"#cb1-6\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"kw\">sub <\/span><span class=\"fu\">xpath_future<\/span>(<span class=\"dt\">$<\/span>self, <span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>) {<\/span>\n<span id=\"cb1-7\"><a href=\"#cb1-7\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"co\"># Async implementation using $self-&gt;target-&gt;send_message(...)<\/span><\/span>\n<span id=\"cb1-8\"><a href=\"#cb1-8\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>}<\/span><\/code><\/pre>\n<\/div>\n<\/li>\n<li>\n<p><strong>Decision: Centralize Test Hardening in a Helper.<\/strong><br \/>\nA dedicated test library, <code>t\/lib\/t\/helper.pm<\/code>, was created to<br \/>\ncontain all stabilization logic. 
\u201cSafe\u201d wrappers (<code>safe_get<\/code>,<br \/>\n<code>safe_xpath<\/code>) were implemented there, using<br \/>\n<code>Future-&gt;wait_any<\/code> to race asynchronous operations against<br \/>\na timeout, preventing tests from hanging.<\/p>\n<div class=\"sourceCode\" id=\"cb2\">\n<pre\nclass=\"sourceCode perl\"><code class=\"sourceCode perl\"><span id=\"cb2-1\"><a href=\"#cb2-1\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Example test helper implementation<\/span><\/span>\n<span id=\"cb2-2\"><a href=\"#cb2-2\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"kw\">sub <\/span><span class=\"fu\">safe_xpath<\/span> {<\/span>\n<span id=\"cb2-3\"><a href=\"#cb2-3\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">my<\/span> (<span class=\"dt\">$mech<\/span>, <span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>) = <span class=\"dt\">@_<\/span>;<\/span>\n<span id=\"cb2-4\"><a href=\"#cb2-4\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">my<\/span> <span class=\"dt\">$timeout<\/span> = <span class=\"fu\">delete<\/span> <span class=\"dt\">$options<\/span>{timeout} || <span class=\"dv\">5<\/span>;<\/span>\n<span id=\"cb2-5\"><a href=\"#cb2-5\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">my<\/span> <span class=\"dt\">$call_f<\/span> = <span class=\"dt\">$mech<\/span>-&gt;<span class=\"dt\">xpath_future<\/span>(<span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>);<\/span>\n<span id=\"cb2-6\"><a href=\"#cb2-6\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">my<\/span> <span class=\"dt\">$timeout_f<\/span> = <span class=\"dt\">$mech<\/span>-&gt;<span class=\"dt\">sleep_future<\/span>(<span class=\"dt\">$timeout<\/span>)-&gt;then(<span class=\"kw\">sub <\/span>{ Future-&gt;fail(<span class=\"ot\">&quot;<\/span><span class=\"st\">Timeout<\/span><span class=\"ot\">&quot;<\/span>) });<\/span>\n<span id=\"cb2-7\"><a href=\"#cb2-7\" 
aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">return<\/span> Future-&gt;wait_any(<span class=\"dt\">$call_f<\/span>, <span class=\"dt\">$timeout_f<\/span>)-&gt;get;<\/span>\n<span id=\"cb2-8\"><a href=\"#cb2-8\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>}<\/span><\/code><\/pre>\n<\/div>\n<\/li>\n<li>\n<p><strong>Decision: Refactor Node Attribute Cache.<\/strong><br \/>\nInvestigations into flaky checkbox tests (<code>t\/50-tick.t<\/code>)<br \/>\nrevealed that <code>WWW::Mechanize::Chrome::Node<\/code> was storing<br \/>\nattributes as a flat list (<code>[key, val, key, val]<\/code>), which was<br \/>\ninefficient for lookups and individual updates. The cache was refactored<br \/>\nto use a <strong>HashRef<\/strong>, providing O(1) lookups<br \/>\nand enabling paired updates that set both the browser property (via<br \/>\nJS) and the internal library attribute in the same<br \/>\noperation, keeping them in sync.<\/p>\n<\/li>\n<li>\n<p><strong>Decision: Implement Self-Cancelling Socket<br \/>\nWatchdog.<\/strong> On Windows, traditional watchdog processes often<br \/>\nfailed to detect parent termination, leading to 60-second hangs after<br \/>\nsuccessful tests. We implemented a new socket-based watchdog in<br \/>\n<code>t::helper<\/code> that listens on an ephemeral port; the background<br \/>\nprocess terminates as soon as the parent\u2019s end of the socket closes,<br \/>\neliminating these cumulative delays.<\/p>\n<\/li>\n<li>\n<p><strong>Decision: Deep Recursive Refactoring &amp; Form<br \/>\nSelection.<\/strong> To make the API truly non-blocking, the entire<br \/>\ninternal call stack had to be refactored. For example, making<br \/>\n<code>get_set_value_future<\/code> non-blocking required first making its<br \/>\ndependency, <code>_field_by_name<\/code>, asynchronous. This culminated<br \/>\nin refactoring the entire form selection API (<code>form_name<\/code>,<br \/>\n<code>form_id<\/code>, etc.) 
to use the new asynchronous<br \/>\n<code>_future<\/code> lookups, which was a key step in mitigating the<br \/>\nWindows deadlocks.<\/p>\n<\/li>\n<li>\n<p><strong>Decision: Fix Critical Regressions &amp; Memory<br \/>\nCycles.<\/strong><\/p>\n<ul>\n<li>\n<p><strong>Evaluation Normalization:<\/strong> Implemented a<br \/>\n<code>_process_eval_result<\/code> helper to centralize the parsing of<br \/>\nresults from <code>Runtime.evaluate<\/code>. This ensures consistent<br \/>\nhandling of return values and exceptions between synchronous<br \/>\n(<code>eval_in_page<\/code>) and asynchronous (<code>eval_future<\/code>)<br \/>\ncalls.<\/p>\n<\/li>\n<li>\n<p><strong>Memory Cycle Mitigation:<\/strong> A significant memory<br \/>\nleak was discovered in which closures attached to CDP event futures (such as<br \/>\nthe one used for asynchronous body retrieval) captured strong references to<br \/>\n<code>$self<\/code> and the <code>$response<\/code> object, creating a<br \/>\ncircular reference. The rule is now to always call<br \/>\n<code>Scalar::Util::weaken<\/code> on <code>$self<\/code> and on any<br \/>\nother captured object references before they are used inside a<br \/>\n<code>-&gt;then<\/code> block that is stored on an object.<\/p>\n<\/li>\n<li>\n<p><strong>Context Propagation (<code>wantarray<\/code>):<\/strong> A<br \/>\nmajor regression was discovered where the caller\u2019s context (scalar or list, as reported by Perl\u2019s <code>wantarray<\/code>) was lost<br \/>\ninside asynchronous <code>Future-&gt;then<\/code> blocks. This caused<br \/>\nmethods like <code>xpath<\/code> to return incorrect results (e.g., a<br \/>\ncount instead of a list of nodes). 
The solution was to adopt the \u201cAsync<br \/>\nContext Pattern\u201d: capture <code>wantarray<\/code> in the synchronous<br \/>\nwrapper, pass it as an option to the <code>_future<\/code> method, and<br \/>\nthen use that captured value inside the future\u2019s final resolution<br \/>\nblock.<\/p>\n<div class=\"sourceCode\" id=\"cb3\">\n<pre\nclass=\"sourceCode perl\"><code class=\"sourceCode perl\"><span id=\"cb3-1\"><a href=\"#cb3-1\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Synchronous Wrapper<\/span><\/span>\n<span id=\"cb3-2\"><a href=\"#cb3-2\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"kw\">sub <\/span><span class=\"fu\">xpath<\/span>(<span class=\"dt\">$<\/span>self, <span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>) {<\/span>\n<span id=\"cb3-3\"><a href=\"#cb3-3\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"dt\">$options<\/span>{ <span class=\"fu\">wantarray<\/span> } = <span class=\"fu\">wantarray<\/span>; <span class=\"co\"># 1. Capture<\/span><\/span>\n<span id=\"cb3-4\"><a href=\"#cb3-4\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">return<\/span> <span class=\"dt\">$self<\/span>-&gt;<span class=\"dt\">xpath_future<\/span>(<span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>)-&gt;get; <span class=\"co\"># 2. 
Pass<\/span><\/span>\n<span id=\"cb3-5\"><a href=\"#cb3-5\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>}<\/span>\n<span id=\"cb3-6\"><a href=\"#cb3-6\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><\/span>\n<span id=\"cb3-7\"><a href=\"#cb3-7\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"co\"># Asynchronous Implementation<\/span><\/span>\n<span id=\"cb3-8\"><a href=\"#cb3-8\" aria-hidden=\"true\" tabindex=\"-1\"><\/a><span class=\"kw\">sub <\/span><span class=\"fu\">xpath_future<\/span>(<span class=\"dt\">$<\/span>self, <span class=\"dt\">$query<\/span>, <span class=\"dt\">%options<\/span>) {<\/span>\n<span id=\"cb3-9\"><a href=\"#cb3-9\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">my<\/span> <span class=\"dt\">$wantarray<\/span> = <span class=\"fu\">delete<\/span> <span class=\"dt\">$options<\/span>{ <span class=\"fu\">wantarray<\/span> }; <span class=\"co\"># 3. Retrieve<\/span><\/span>\n<span id=\"cb3-10\"><a href=\"#cb3-10\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"co\"># ... async logic ...<\/span><\/span>\n<span id=\"cb3-11\"><a href=\"#cb3-11\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    <span class=\"kw\">return<\/span> <span class=\"dt\">$doc<\/span>-&gt;<span class=\"dt\">then<\/span>(<span class=\"kw\">sub <\/span>(<span class=\"dt\">@results<\/span>) {<\/span>\n<span id=\"cb3-12\"><a href=\"#cb3-12\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>        <span class=\"kw\">if<\/span> (<span class=\"dt\">$wantarray<\/span>) { <span class=\"co\"># 4. 
Respect<\/span><\/span>\n<span id=\"cb3-13\"><a href=\"#cb3-13\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>            <span class=\"kw\">return<\/span> Future-&gt;done(<span class=\"dt\">@results<\/span>);<\/span>\n<span id=\"cb3-14\"><a href=\"#cb3-14\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>        } <span class=\"kw\">else<\/span> {<\/span>\n<span id=\"cb3-15\"><a href=\"#cb3-15\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>            <span class=\"kw\">return<\/span> Future-&gt;done(<span class=\"dt\">$results<\/span>[<span class=\"dv\">0<\/span>]);<\/span>\n<span id=\"cb3-16\"><a href=\"#cb3-16\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>        }<\/span>\n<span id=\"cb3-17\"><a href=\"#cb3-17\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>    });<\/span>\n<span id=\"cb3-18\"><a href=\"#cb3-18\" aria-hidden=\"true\" tabindex=\"-1\"><\/a>}<\/span><\/code><\/pre>\n<\/div>\n<\/li>\n<li>\n<p><strong>Asynchronous Body Retrieval &amp; Robust Content<br \/>\nFallbacks:<\/strong> Fixed a bug where <code>decoded_content()<\/code><br \/>\nreturned an empty string; it now awaits a<br \/>\n<code>__body_future<\/code> before returning. This was implemented by storing the<br \/>\nretrieval future directly on the response object<br \/>\n(<code>$response-&gt;{__body_future}<\/code>). To make this more robust,<br \/>\na tiered strategy was implemented: first try to get the content from the<br \/>\nnetwork response, but if that fails (e.g., for <code>about:blank<\/code><br \/>\nor due to cache eviction), fall back to a JavaScript<br \/>\n<code>XMLSerializer<\/code> to get the live DOM content.<\/p>\n<\/li>\n<li>\n<p><strong>Signature Hardening:<\/strong> Fixed \u201cToo few arguments\u201d<br \/>\nerrors when using modern Perl signatures with<br \/>\n<code>Future-&gt;then<\/code>. Callbacks were updated to use optional<br \/>\nparameters (<code>sub($result = undef) { ... 
}<\/code>) to gracefully<br \/>\nhandle futures that resolve with no value.<\/p>\n<\/li>\n<li>\n<p><strong>XHTML \u201cSplit-Brain\u201d Bug:<\/strong> Worked around a<br \/>\nlong-standing Chromium bug (40130141) in which content provided via<br \/>\n<code>setDocumentContent<\/code> is parsed differently from content<br \/>\nloaded from a URL. For XHTML documents,<br \/>\nthe library now uses a JavaScript-based XPath evaluation<br \/>\n(<code>document.evaluate<\/code>) against the live DOM, bypassing the<br \/>\nbroken CDP search mechanism.<\/p>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3 id=\"derived-architectural-rules-sops\">Derived Architectural Rules<br \/>\n&amp; SOPs:<\/h3>\n<ul>\n<li><strong>Rule: Always provide <code>_future<\/code> variants.<\/strong><br \/>\nEvery library method that interacts with the browser via CDP must have a<br \/>\nnon-blocking asynchronous counterpart.<\/li>\n<li><strong>Rule: Centralize stabilization in the test layer.<\/strong><br \/>\nAll timeout and retry logic should reside in the test harness<br \/>\n(<code>t\/lib\/t\/helper.pm<\/code>), not in the core library.<\/li>\n<li><strong>Rule: Explicitly propagate <code>wantarray<\/code><br \/>\ncontext.<\/strong> Synchronous wrappers must capture the caller\u2019s context<br \/>\nand pass it down the <code>Future<\/code> chain to ensure correct<br \/>\nscalar\/list behavior.<\/li>\n<li><strong>Rule: The entire call chain must be asynchronous.<\/strong><br \/>\nNon-blocking timeouts only work if nothing in the chain blocks: even a single \u201chidden\u201d blocking call in<br \/>\nan otherwise asynchronous method will cause a stall.<\/li>\n<li><strong>SOP: Reduce Library Noise.<\/strong> Diagnostic messages<br \/>\n(<code>warn<\/code>, <code>note<\/code>, <code>diag<\/code>) must not be<br \/>\ncommitted to library code as-is. 
All such messages should be<br \/>\nconverted to use the internal <code>$self-&gt;log('debug', ...)<\/code><br \/>\nmechanism, which keeps TAP output clean for CI systems.<\/li>\n<\/ul>\n<hr \/>\n<h2 id=\"part-iii-the-mutationobserver-saga-march-19\">Part III: The<br \/>\n<code>MutationObserver<\/code> Saga (March 19)<\/h2>\n<p>With most of the library refactored to be asynchronous, one stubborn<br \/>\ntest, <code>t\/65-is_visible.t<\/code>, continued to fail with timeouts.<br \/>\nThis led to an ambitious, but ultimately unsuccessful, attempt to<br \/>\nreplace the <code>wait_until_visible<\/code> polling logic with a more<br \/>\n\u201cmodern\u201d <code>MutationObserver<\/code>.<\/p>\n<h3 id=\"key-milestones-challenges\">Key Milestones &amp; Challenges:<\/h3>\n<ul>\n<li><strong>The Theory:<\/strong> The goal was to replace an inefficient<br \/>\n<code>repeat { sleep }<\/code> loop with an event-driven<br \/>\n<code>MutationObserver<\/code> in JavaScript that would notify Perl<br \/>\nimmediately when an element\u2019s visibility changed.<\/li>\n<li><strong>Implementation &amp; Cascade Failure:<\/strong> The<br \/>\nimplementation proved very difficult and introduced a series of<br \/>\nnew, hard-to-diagnose bugs:<\/p>\n<ol type=\"1\">\n<li>An incorrect function signature for<br \/>\n<code>callFunctionOn_future<\/code>.<\/li>\n<li>A critical unit mismatch: seconds were passed from Perl to JavaScript\u2019s<br \/>\n<code>setTimeout<\/code>, which expects milliseconds.<\/li>\n<li>A fundamental hang where the <code>MutationObserver<\/code>\u2019s<br \/>\nJavaScript <code>Promise<\/code> would never resolve, even after the<br \/>\nunderlying DOM element changed.<\/li>\n<\/ol>\n<\/li>\n<li><strong>Debugging Maze:<\/strong> Multiple attempts to fix the<br \/>\n<code>checkVisibility<\/code> JavaScript logic inside the observer<br \/>\ncallback, including hardening it with DOM tree traversal<br \/>\nand adding extensive <code>console.log<\/code> tracing, failed 
to resolve the<br \/>\nhang. This highlighted the opacity and difficulty of debugging complex,<br \/>\ncross-language asynchronous interactions, especially when dealing with<br \/>\nlow-level browser APIs.<\/li>\n<\/ul>\n<h3 id=\"procedural-learning-granular-edits\">Procedural Learning:<br \/>\nGranular Edits<\/h3>\n<p>The effort was plagued by procedural missteps in using automated<br \/>\nfile-editing tools. Initial attempts to replace large code blocks in a<br \/>\nsingle operation led to accidental code loss and match failures.<\/p>\n<ul>\n<li><strong>Decision: Adopt \u201cDelete, then Add\u201d Workflow.<\/strong><br \/>\nAfter a firm correction from the user, a new SOP was established for all<br \/>\nfuture modifications:<\/p>\n<ol type=\"1\">\n<li><strong>Isolate:<\/strong> Break the file into small, manageable<br \/>\nchunks (e.g., 250 lines).<\/li>\n<li><strong>Delete:<\/strong> Perform a \u201cdelete\u201d operation by replacing<br \/>\nthe old code block with an empty string.<\/li>\n<li><strong>Add:<\/strong> Perform an \u201cadd\u201d operation by inserting the<br \/>\nnew code into the empty space.<\/li>\n<li><strong>Verify:<\/strong> Verify each atomic step before<br \/>\nproceeding. This granular process, while slower, provided surgical<br \/>\nprecision and restored control over edits to the large<br \/>\n<code>Chrome.pm<\/code> module.<\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<p>The consistent failure of the <code>MutationObserver<\/code> approach<br \/>\neventually led to the decision to abandon it in favor of stabilizing the<br \/>\noriginal, more transparent implementation.<\/p>\n<hr \/>\n<h2 id=\"part-iv-reversion-and-final-stabilization-march-20\">Part IV:<br \/>\nReversion and Final Stabilization (March 20)<\/h2>\n<p>After exhausting all reasonable attempts to fix the<br \/>\n<code>MutationObserver<\/code>, a strategic decision was made to revert<br \/>\nto the simpler, more transparent polling implementation and fix it<br \/>\ncorrectly. 
This proved to be the correct path to a stable solution.<\/p>\n<h3 id=\"key-milestones-engineering-decisions-2\">Key Milestones &amp;<br \/>\nEngineering Decisions:<\/h3>\n<ul>\n<li><strong>Decision: Perform Strategic Reversion.<\/strong> The<br \/>\n<code>MutationObserver<\/code> implementation, when integrated via<br \/>\n<code>callFunctionOn_future<\/code> with <code>awaitPromise<\/code>,<br \/>\nproved fundamentally unstable. Its JavaScript promise would consistently<br \/>\nfail to resolve, causing indefinite hangs. A decision was made to<br \/>\n<strong>revert all <code>MutationObserver<\/code> code<\/strong> from<br \/>\n<code>lib\/WWW\/Mechanize\/Chrome.pm<\/code> and restore the original<br \/>\n<code>repeat { sleep }<\/code> polling mechanism. A stable,<br \/>\nunderstandable solution was prioritized over an elegant but broken<br \/>\none.<\/li>\n<li><strong>Decision: Correct Timeout Delegation in the<br \/>\nHarness.<\/strong> The root cause of the original timeout failure was<br \/>\nidentified as a race condition in the <code>t\/lib\/t\/helper.pm<\/code><br \/>\ntest harness. The <code>safe_wait_until_*<\/code> wrappers were<br \/>\nimplementing their own timeout (via <code>wait_any<\/code> and<br \/>\n<code>sleep_future<\/code>) that raced against the underlying polling<br \/>\nfunction\u2019s internal timeout. This led to intermittent failures on slow<br \/>\nmachines. The helpers were refactored to <strong>delegate all timeout<br \/>\nmanagement to the library\u2019s polling functions<\/strong>, so that a<br \/>\nsingle, authoritative timer controls the operation.<\/li>\n<li><strong>Decision: Optimize Polling Performance.<\/strong> At the<br \/>\nuser\u2019s request, the polling interval was reduced from 300ms to<br \/>\n<strong>150ms<\/strong>. 
This modest performance improvement reduced the<br \/>\ntest suite\u2019s wall-clock execution time by over a second while maintaining<br \/>\nstability.<\/li>\n<li><strong>Decision: Tune Test Watchdogs.<\/strong> The global watchdog<br \/>\ntimeout was adjusted to 12 seconds, calculated as 1.5 times the<br \/>\nobserved real execution time of the optimized test (roughly 8 seconds). This provides a<br \/>\ndata-driven safety margin for CI.<\/li>\n<\/ul>\n<hr \/>\n<h2\nid=\"part-v-the-last-bug---a-platform-specific-memory-leak-march-20\">Part<br \/>\nV: The Last Bug &#8211; A Platform-Specific Memory Leak (March 20)<\/h2>\n<p>With all other tests passing, a single memory leak failure in<br \/>\n<code>t\/78-memleak.t<\/code> persisted, but only on the Windows<br \/>\n<code>ad2<\/code> environment. This required a different approach than<br \/>\nthe timeout fixes.<\/p>\n<h3 id=\"key-milestones\">Key Milestones:<\/h3>\n<ul>\n<li><strong>The Bug:<\/strong> A strong reference cycle involving the<br \/>\n<code>on_dialog<\/code> event listener was not being broken on Windows,<br \/>\ndespite multiple attempts to fix it. Fixes that worked on Linux (such as<br \/>\ncalling <code>on_dialog(undef)<\/code> in <code>DESTROY<\/code>) were not<br \/>\nsufficient on the Windows host.<\/li>\n<li><strong>The Diagnosis:<\/strong> The issue was attributed to a<br \/>\ndeep, platform-specific interaction between Perl\u2019s reference-counting garbage collection,<br \/>\nthe <code>IO::Async<\/code> event loop implementation on Windows, and the<br \/>\n<code>Test::Memory::Cycle<\/code> module. 
The cycle report was identical<br \/>\non both platforms, but the cleanup behavior differed.<\/li>\n<li><strong>Failed Attempts:<\/strong> A series of increasingly<br \/>\naggressive fixes were attempted to break the cycle, including:<\/p>\n<ol type=\"1\">\n<li>Moving the <code>on_dialog(undef)<\/code> call from<br \/>\n<code>close()<\/code> to <code>DESTROY()<\/code>.<\/li>\n<li>Explicitly <code>delete<\/code>ing the listener and callback<br \/>\nproperties from the object hash in <code>DESTROY<\/code>.<\/li>\n<li>Swapping between <code>$self-&gt;remove_listener<\/code> and<br \/>\n<code>$self-&gt;target-&gt;unlisten<\/code> in a mistaken attempt to find<br \/>\nthe correct un-registration method.<\/li>\n<\/ol>\n<\/li>\n<li><strong>Pragmatic Solution:<\/strong> After exhausting all reasonable<br \/>\ncode-level fixes without a resolution on Windows, the user opted to mark<br \/>\nthe failing test as a known issue for that specific platform.<\/li>\n<li><strong>Final Fix:<\/strong> The single failing test in<br \/>\n<code>t\/78-memleak.t<\/code> was wrapped in a conditional<br \/>\n<code>TODO<\/code> block that takes effect only on Windows<br \/>\n(<code>if ($^O =~ \/MSWin32\/i)<\/code>), formally acknowledging the bug<br \/>\nwithout blocking the build. This allows the test suite to pass in CI<br \/>\nenvironments while flagging the issue for future, deeper<br \/>\ninvestigation.<\/li>\n<\/ul>\n<hr \/>\n<h2 id=\"part-vi-ci-hardening-march-20\">Part VI: CI Hardening (March<br \/>\n20)<\/h2>\n<p>A final failure in the GitHub Actions CI environment revealed one<br \/>\nlast configuration flaw.<\/p>\n<h3 id=\"key-milestones-1\">Key Milestones:<\/h3>\n<ul>\n<li><strong>The Bug:<\/strong> The CI was running<br \/>\n<code>prove --nocount --jobs 3 -I local\/ -bl xt t<\/code> directly. 
This<br \/>\ncommand was missing the crucial <code>-It\/lib<\/code> include path, which<br \/>\nis necessary for test files to locate the <code>t::helper<\/code> module.<br \/>\nThis resulted in nearly all tests failing with<br \/>\n<code>Can't locate t\/helper.pm in @INC<\/code>.<\/li>\n<li><strong>The Investigation:<\/strong> An analysis of<br \/>\n<code>Makefile.PL<\/code> revealed a custom <code>MY::test<\/code> block<br \/>\nspecifically designed to inject the <code>-It\/lib<\/code> flag into the<br \/>\n<code>make test<\/code> command. This confirmed that<br \/>\n<code>make test<\/code> is the correct, canonical way to run the test<br \/>\nsuite for this project.<\/li>\n<li><strong>The Fix:<\/strong> The<br \/>\n<code>.github\/workflows\/linux.yml<\/code> file was modified to replace<br \/>\nthe direct <code>prove<\/code> call with <code>make test<\/code> in the<br \/>\n<code>Run Tests<\/code> step. This ensures that CI runs the<br \/>\ntests exactly as a local developer would, with all necessary<br \/>\ninclude paths correctly configured by the project\u2019s build system.<\/li>\n<\/ul>\n<h2 id=\"final-outcome\">Final Outcome<\/h2>\n<p>After this long and arduous journey, the<br \/>\n<code>WWW::Mechanize::Chrome<\/code> test suite is now stable and<br \/>\n<strong>passing on all targeted platforms<\/strong>, with known<br \/>\nplatform-specific issues clearly documented in the code. The project is<br \/>\nin a vastly more robust and reliable state.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The WWW::Mechanize::Chrome Saga: A Comprehensive Narrative of PR #104 This document synthesizes the extensive work performed from March 13th to March 20th, 2026, to harden, stabilize, and refactor the WWW::Mechanize::Chrome library and its test suite. 
This effort involved deep dives into asynchronous programming, platform-specific bug hunting, and strategic architectural decisions. Part I: The Quest for [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[337,218,17,339,79,338,47,163,102,18,86,335,251],"tags":[],"class_list":["post-2129","post","type-post","status-publish","format-standard","hentry","category-337","category-centos","category-debian","category-fedora","category-free-software","category-http","category-linux","category-networking","category-open-source","category-perl","category-tls","category-trixie","category-windows"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1YDIB-yl","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2129","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2129"}],"version-history":[{"count":2,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2129\/revisions"}],"predecessor-version":[{"id":2131,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=\/wp\/v2\/posts\/2129\/revisions\/2131"}],"wp:attachment":[{"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2F
wp%2Fv2%2Fmedia&parent=2129"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2129"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wp.c9h.org\/cj\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2129"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}