DST timezone identifiers are not currently being handled properly in RFC 822 formatted dates.
Take the CBC RSS feeds as an example: Mon, 21 Apr 2025 06:00:00 EDT is wrongly parsed as being in UTC, which puts item timestamps off by 4 hours.
RFC 822 does allow the use of defined offsets as timezone identifiers:
    zone = "UT" / "GMT"              ; Universal Time
                                     ; North American : UT
         / "EST" / "EDT"             ;  Eastern:  - 5/ - 4
         / "CST" / "CDT"             ;  Central:  - 6/ - 5
         / "MST" / "MDT"             ;  Mountain: - 7/ - 6
         / "PST" / "PDT"             ;  Pacific:  - 8/ - 7
         / 1ALPHA                    ; Military: Z = UT;
                                     ;  A:-1; (J not used)
                                     ;  M:-12; N:+1; Y:+12
         / ( ("+" / "-") 4DIGIT )    ; Local differential
                                     ;  hours+min. (HHMM)
Now, one could argue that using RFC 822 in 2025 is a bit stupid, and I'd agree, but I don't work at CBC unfortunately. The issue has been reported to them and their reply was that their RSS feeds are not a priority.
Would you be open to adding a fix to dateparser.go to handle this edge case? I can contribute a PR.