Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
147 changes: 147 additions & 0 deletions .github/workflows/report_404_packages.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
name: Report unreachable packages

on:
workflow_dispatch:
schedule:
- cron: "31 7 * * *"

permissions:
contents: write
pull-requests: write

concurrency:
group: report-404-packages
cancel-in-progress: false

jobs:
report_404_packages:
runs-on: ubuntu-latest
env:
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@v5
with:
fetch-depth: 0

# Always roll the cache, GitHub will evict it after 7 days of inactivity.
- name: Restore reported URLs cache
id: reported_urls_cache
uses: actions/cache@v5
with:
path: ./reported_urls.txt
key: reported-urls-cache-${{ github.run_id }}
restore-keys: |
reported-urls-cache-

- name: Require cache for scheduled runs
run: |
# cache-hit semantics:
# true => exact key match
# false => restore-key match
# "" => true miss (nothing restored)
if [ "${{ github.event_name }}" != "workflow_dispatch" ] && [ "${{ steps.reported_urls_cache.outputs.cache-hit }}" = "" ]; then
echo "::error::No reported_urls cache found. Run workflow_dispatch once to bootstrap."
exit 1
fi

- name: Ensure reported_urls.txt exists
run: touch ./reported_urls.txt

- name: Decide run cadence
id: cadence
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "run_report=true" >> "$GITHUB_OUTPUT"
exit 0
fi

# Daily schedule, but only report on first Saturday of the month.
if [ "$(date -u +%u)" -eq 6 ] && [ "$(date -u +%d)" -le 7 ]; then
echo "run_report=true" >> "$GITHUB_OUTPUT"
else
echo "run_report=false" >> "$GITHUB_OUTPUT"
echo "::notice::Skipping report run: not the first Saturday of the month."
fi

- name: Set up Python
if: steps.cadence.outputs.run_report == 'true'
uses: actions/setup-python@v5
with:
python-version: "3.13"

- name: Set up uv
if: steps.cadence.outputs.run_report == 'true'
uses: astral-sh/setup-uv@v5

- name: Configure git
if: steps.cadence.outputs.run_report == 'true'
run: |
git config user.name "thecrawl bot"
git config user.email "noreply@packagecontrol.io"

- name: Run 404 package report
id: report
if: steps.cadence.outputs.run_report == 'true'
run: |
uv run -m tools.report_404_packages \
--commit \
--build-pr-message \
-z \
--ignore-file ./reported_urls.txt > ./reported_records.txt

if [ -s ./reported_records.txt ]; then
echo "has_results=true" >> "$GITHUB_OUTPUT"
else
echo "has_results=false" >> "$GITHUB_OUTPUT"
fi

- name: No packages to report
if: steps.cadence.outputs.run_report == 'true' && steps.report.outputs.has_results != 'true'
run: echo "No unreachable packages to report."

- name: Prepare branch
id: branch
if: steps.cadence.outputs.run_report == 'true' && steps.report.outputs.has_results == 'true'
run: |
report_hash="$(sha256sum ./reported_records.txt | awk '{print substr($1,1,12)}')"
branch_name="bot/report-404-${report_hash}-${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT:-1}"

git switch -c "$branch_name"
git push --set-upstream origin "$branch_name"
echo "name=$branch_name" >> "$GITHUB_OUTPUT"

- name: Open pull request
if: steps.cadence.outputs.run_report == 'true' && steps.report.outputs.has_results == 'true'
run: |
gh pr create \
--base "${{ github.ref_name }}" \
--head "${{ steps.branch.outputs.name }}" \
--title "$(cat ./pr_title.txt)" \
--body-file ./pr_body.md

- name: Update reported URL list for cache
if: steps.cadence.outputs.run_report == 'true'
run: |
# Append URLs from this run (name\0details\0timestamp records).
awk -v RS='\n' -v FS='\0' 'NF >= 2 && $2 != "" { print $2 }' \
./reported_records.txt >> ./reported_urls.txt

# Keep only URLs still present in workspace.json.
if [ ! -f ./workspace.json ]; then
echo "::error::workspace.json missing; cannot prune reported URLs."
exit 1
fi

tmp_file="$(mktemp)"
while IFS= read -r url; do
[ -z "$url" ] && continue
if grep -Fq "\"$url\"" ./workspace.json; then
echo "$url" >> "$tmp_file"
fi
done < ./reported_urls.txt

sort -u "$tmp_file" > ./reported_urls.txt
rm -f "$tmp_file"

echo "Reported URLs:"
cat ./reported_urls.txt
17 changes: 17 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
[project]
name = "package-control-channel"
version = "0.1.0"
description = "Utilities for maintaining package_control_channel"
readme = "readme.md"
requires-python = ">=3.13"
dependencies = []

[dependency-groups]
dev = [
"pytest>=9.0.0",
"mockito>=2.0.4",
"pytest-mockito>=0.0.6.post1",
]

[tool.pytest.ini_options]
testpaths = ["tools"]
61 changes: 60 additions & 1 deletion tools/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,67 @@ Check-only mode (non-zero exit code if any file would be changed):
uv run -m tools.format_package_control_channel ./repository --check
```

## `report_404_packages.py`

Finds packages in a crawler `workspace.json` that fail with `fatal: 404`
for at least --min-age days (default: 21) and report them.

If --commit is set, it actually removes the packages from the repository and
creates a commit.

If you don't specify a --workspace, it will download one for you from
`packagecontrol/thecrawl`. (Requires `gh`.)

The default sources are derived from `git origin`; use `--allowed-source`
to override.

Use `--ignore` (name or details URL) and/or `--ignore-file` to skip already
known packages so recurring scheduled runs don't re-report them.

### Usage

Report only (`-z` for machine friendly output):

```bash
uv run -m tools.report_404_packages
uv run -m tools.report_404_packages -z
```

Use a specific workspace file:

```bash
uv run -m tools.report_404_packages --workspace ./workspace.json
```

Ignore specific package details URLs (or names):

```bash
uv run -m tools.report_404_packages \
--ignore "https://github.com/axsuul/sublime-0x0" \
--ignore "SublimeLinter,AnotherPackage"
```

Ignore via file:

```bash
uv run -m tools.report_404_packages --ignore-file ./tools/known-404s.txt
```

Apply removals and commit:

```bash
uv run -m tools.report_404_packages --commit
```

Build PR message files (`pr_title.txt`, `pr_body.md`) from the report. This
is for the CI.

```bash
uv run -m tools.report_404_packages --build-pr-message
```

### Tests

```bash
uvx pytest tools/test_channel_json_format.py
uvx pytest
```
Loading
Loading