Add comprehensive edge case tests for external resources and mass publish pipeline #2800

Draft
Copilot wants to merge 4 commits into master from copilot/improve-tests-in-ocw-studio

Copilot AI commented Nov 19, 2025

Recent production issues stemmed from a data migration that converted Hugo shortcode subscripts/superscripts in external resource links (e.g., H{{< sub "2" >}}O) into nested markdown, which broke the mass publish pipeline. Existing test coverage was insufficient to catch these edge cases.

Changes

External Resource API Edge Cases (external_resources/api_edge_cases_test.py, 174 lines)

  • Network errors: timeouts, SSL failures, connection errors, redirect chains
  • URL edge cases: empty, malformed, internationalized domains, very long paths
  • Server errors: 500/502/503/504 status codes
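
A minimal sketch of how the network-error and server-error cases above can be exercised without real network access. `is_url_broken` and its injected `fetch` callable are hypothetical stand-ins for illustration, not the actual ocw-studio API; the real tests patch the project's own HTTP calls.

```python
import ssl
from unittest import mock


def is_url_broken(url, fetch):
    """Hypothetical checker: return (is_broken, status_code) for `url`.

    `fetch` is an injected callable returning an HTTP status code, so
    tests can simulate failures without touching the network.
    """
    if not url or not url.strip():
        return True, None  # empty URLs are treated as broken (assumption)
    try:
        status = fetch(url)
    except (TimeoutError, ConnectionError, ssl.SSLError):
        return True, None  # network-level failure, no status available
    return status >= 400, status


NETWORK_ERRORS = [
    TimeoutError("timed out"),
    ssl.SSLError("bad certificate"),
    ConnectionError("connection refused"),
]
SERVER_ERRORS = [500, 502, 503, 504]


def run_edge_cases():
    """Run every simulated failure through the checker and collect results."""
    results = []
    for error in NETWORK_ERRORS:
        fetch = mock.Mock(side_effect=error)  # raises when called
        results.append(is_url_broken("https://example.com", fetch))
    for status in SERVER_ERRORS:
        fetch = mock.Mock(return_value=status)
        results.append(is_url_broken("https://example.com", fetch))
    return results
```

Injecting the fetcher keeps each case deterministic; in a pytest suite the same lists would typically feed `pytest.mark.parametrize`.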

Mass Build/Publish Edge Cases (content_sync/mass_build_edge_cases_test.py, 276 lines)

  • Empty websites, mixed publish statuses, large-scale scenarios (50+ sites)
  • Content with null metadata, special characters, nested markdown structures
  • All version/offline parameter combinations
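
As a rough illustration of covering every version/offline combination, the sketch below enumerates the parameter grid with `itertools.product`. The version names (`"draft"`, `"live"`) and the `pipeline_name` helper are assumptions made for this example, not names taken from the tests in this PR.

```python
from itertools import product

VERSIONS = ("draft", "live")  # assumed version names
OFFLINE_FLAGS = (False, True)


def pipeline_name(version, offline):
    """Hypothetical helper: derive a pipeline name from the two parameters."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    suffix = "-offline" if offline else ""
    return f"mass-build-{version}{suffix}"


def all_combinations():
    """Map every (version, offline) pair to its pipeline name."""
    return {
        (version, offline): pipeline_name(version, offline)
        for version, offline in product(VERSIONS, OFFLINE_FLAGS)
    }
```

Generating the grid rather than listing cases by hand means a newly added version or flag is picked up by the tests automatically.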

External Resource Tasks Edge Cases (external_resources/tasks_edge_cases_test.py, 450 lines)

  • Concurrent updates and race conditions
  • Wayback Machine API failures, missing jobs, partial responses
  • Submission interval boundary conditions
  • Deleted content handling
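
The submission interval boundary cases can be pictured with a small sketch like the one below. The 30-day interval, the `is_due_for_submission` name, and the choice to treat "exactly at the interval" as due are all assumptions for illustration; the real task logic may differ.

```python
from datetime import datetime, timedelta, timezone

SUBMISSION_INTERVAL = timedelta(days=30)  # assumed value, for illustration only


def is_due_for_submission(last_submitted, now, interval=SUBMISSION_INTERVAL):
    """Hypothetical check: has enough time passed to resubmit a resource?

    A resource that was never submitted is always due. The boundary
    (exactly `interval` elapsed) is treated as due in this sketch.
    """
    if last_submitted is None:
        return True
    return now - last_submitted >= interval


# Boundary cases: one second before, exactly at, and one second after.
now = datetime(2025, 11, 19, tzinfo=timezone.utc)
just_before = now - SUBMISSION_INTERVAL + timedelta(seconds=1)
exactly_at = now - SUBMISSION_INTERVAL
just_after = now - SUBMISSION_INTERVAL - timedelta(seconds=1)
```

Passing `now` in explicitly, rather than reading the clock inside the function, is what makes the boundary testable to the second.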

Subscript/Superscript Integration Tests (websites/management/commands/markdown_cleaning/external_resource_subsup_integration_test.py, 484 lines)

  • Regression tests for Hugo shortcode conversion issue
  • Chemical formulas: H{{< sub "2" >}}O, CO{{< sub "2" >}}, Fe{{< sub "2" >}}O{{< sub "3" >}}
  • Trademark symbols: Product{{< sup "®" >}}
  • Validates converted markdown doesn't create nested structures
  • Mass publish compatibility, idempotency verification
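
The idempotency and no-nesting checks can be sketched in isolation as follows. `convert_subsup` is a simplified, hypothetical converter written for this example; it only rewrites `sub`/`sup` shortcodes to HTML and does not reproduce the project's markdown cleaner, which also rewrites the link itself.

```python
import re

# Matches Hugo shortcodes such as {{< sub "2" >}} or {{< sup "®" >}}.
SUBSUP_SHORTCODE = re.compile(r'\{\{<\s*(sub|sup)\s+"([^"]*)"\s*>\}\}')


def convert_subsup(markdown):
    """Hypothetical converter: replace sub/sup shortcodes with HTML tags."""
    return SUBSUP_SHORTCODE.sub(r"<\1>\2</\1>", markdown)


SAMPLES = [
    '[H{{< sub "2" >}}O](http://example.com/water)',
    'Fe{{< sub "2" >}}O{{< sub "3" >}}',
    'Product{{< sup "®" >}}',
]


def check_samples():
    """Return the converted samples after verifying the invariants."""
    converted = []
    for sample in SAMPLES:
        once = convert_subsup(sample)
        twice = convert_subsup(once)
        assert once == twice          # idempotent: a second pass is a no-op
        assert "{{<" not in once      # no shortcode left behind
        assert "[[" not in once       # no nested link brackets introduced
        converted.append(once)
    return converted
```

Running the conversion twice and comparing results is a cheap guard against a migration that corrupts content when re-applied.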

Example

def test_external_resource_conversion_with_subscript_in_link_text(settings):
    """Validates H{{< sub "2" >}}O converts without breaking publish pipeline"""
    # NOTE: `website` and `cleaner` are assumed to be set up elsewhere in the
    # test module; their construction is not shown in this excerpt.
    markdown = '[H{{< sub "2" >}}O](http://example.com/water)'
    content = WebsiteContentFactory.create(markdown=markdown, website=website)

    cleaner.update_website_content(content)

    # Verifies: no nested brackets, Hugo shortcodes converted to HTML
    assert "resource_link" in content.markdown
    assert "<sub>2</sub>" in content.markdown
    assert "[[" not in content.markdown  # No invalid nesting

Impact: 1,384 lines of test code, 50+ test functions preventing regressions in external resource handling and mass publish operations.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve

<issue_title>Improve Tests in OCW Studio</issue_title>
<issue_description>### Description/Context

Recently, we have been experiencing issues on QA and Production alike related to mass publish and mass build. The issue stemmed from the data migration of legacy external links to External Resources. This converted the subscripts and superscripts in markdown (in Hugo partials syntax) to a nested markdown syntax for an external resource, which broke the mass publish pipeline with invalid data errors.

Apart from this, it was observed that a few tests became flaky after a Posthog version upgrade. The tests, however, were failing locally only.

We need to come up with a plan to improve testing in OCW locally or in QA.

Plan/Design

TBA</issue_description>

<agent_instructions>work on improving tests in general and increasing coverage overall. Review our test suite to identify any improper tests that need to be fixed.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

<comment_new>@pdpinch
@umar8hassan please share a plan for this before you start working on it.</comment_new>
<comment_new>@umar8hassan
> [@umar8hassan](https://github.com/umar8hassan) please share a plan for this before you start working on it.

Since both the mentioned issues have been resolved in separate pull requests, we can work on improving tests in general and increasing coverage overall.

Off the top of my head, I can work on this in parallel and ask Copilot to review our test suite and increase coverage. I'd then review/update the generated tests.</comment_new>
<comment_new>@zawan-ila
@pdpinch @umar8hassan One important improvement we can make here is to add e2e tests for the offline theme. Over time, I have fixed a number of issues with the offline theme and feel that many of the problems that arose were regressions that would have been caught earlier if we had tests for the offline theme (similar to the ones that we have for the online theme)

While working on these issues, I have wanted to add tests but have been held back by the absence of a test infrastructure for offline theme. If we do set up the test infra for the offline theme, it will be easier to test any offline theme changes going forward and give us more confidence in the correctness of the modifications that we make. </comment_new>
<comment_new>@pdpinch
Ok. Please open an issue for e2e tests for the offline theme and make a plan for how you would approach it. I presume it would be similar to how we handle e2e tests for the courses-v2 theme. </comment_new>
<comment_new>@zawan-ila
@pdpinch I have created https://github.com/mitodl/hq/issues/9186 for e2e tests for the offline theme.</comment_new>

  • Fixes mitodl/hq#8286

Copilot AI and others added 3 commits November 19, 2025 12:04
…ld functionality

Co-authored-by: umar8hassan <71461724+umar8hassan@users.noreply.github.com>
…subscripts/superscripts

These tests specifically address the regression mentioned in the issue where
data migration converted Hugo shortcode subscripts/superscripts in external
resource links to nested markdown, breaking the mass publish pipeline.

Co-authored-by: umar8hassan <71461724+umar8hassan@users.noreply.github.com>
Co-authored-by: umar8hassan <71461724+umar8hassan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve tests in OCW Studio for better coverage" to "Add comprehensive edge case tests for external resources and mass publish pipeline" Nov 19, 2025
Copilot AI requested a review from umar8hassan November 19, 2025 12:10

2 participants