Releases: Open-Security-Mapping-Project/ice_detention_scraper
v. 1.1.1 - ICE Detention Facilities Data Scraper and Enricher
We have a new ICE Detention Facilities Data Scraper and Enricher release for you. It's a Python script managed by the Open Security Mapping Project.
In short, this will help identify the online profile of each ICE detention facility, along with a lot of additional data about each facility. Please see the project home page for more about mapping these facilities and other detailed info sources.
This script scrapes ICE detention facility data from ICE.gov and enriches it with information from Wikipedia, Wikidata, and OpenStreetMap.
The original purpose is to identify whether the detention facilities have data on Wikipedia, Wikidata, and OpenStreetMap, which will help with documenting the facilities appropriately. As these entries get fixed up, you should be able to see your search results change almost immediately.
This release fixes some bugs that turned up - exports are cleaner now!
What's Changed Since 1.1.0
- Fix flatten, dep tracking, and other small bugs by @johnseekins in #103
Details
- Fix the existing dictionary-flattening function to properly flatten lists
- Drop "empty" lines from spreadsheet collection
- Group Dependabot updates to reduce PR churn somewhat
- Move ruff out of standard dependencies (again, to hopefully reduce PR churn)
- Re-order inspections so we hopefully see the newest one first
- Default export to CSV (as it used to be)
- Try to get a more consistent string representation of ZIP codes
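The flattening fix addresses nested lists inside facility records. As a rough illustration only (the function and key names here are hypothetical, not the project's actual code), a flattener that handles both nested dicts and lists might look like this; note how ZIP codes stay strings so leading zeros survive:

```python
def flatten(d: dict, parent: str = "", sep: str = "_") -> dict:
    """Recursively flatten nested dicts and lists into a single-level dict."""
    items: dict = {}
    for key, value in d.items():
        new_key = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        elif isinstance(value, list):
            # Index list elements so they survive flattening into columns
            for i, elem in enumerate(value):
                if isinstance(elem, dict):
                    items.update(flatten(elem, f"{new_key}{sep}{i}", sep))
                else:
                    items[f"{new_key}{sep}{i}"] = elem
        else:
            items[new_key] = value
    return items

record = {"name": "X", "inspections": [{"date": "2025-01-01"}], "zip": "00501"}
print(flatten(record))
```

A flat dict like this maps cleanly onto one spreadsheet row, which is why un-flattened lists were breaking exports.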
Full Changelog: 1.1.0...1.1.1
v. 1.1.0 - ICE Detention Facilities Data Scraper and Enricher
We have a new ICE Detention Facilities Data Scraper and Enricher release for you. A great many enhancements have been added since the v1.0.0 release. It's a Python script managed by the Open Security Mapping Project.
In short, this will help identify the online profile of each ICE detention facility, along with a lot of additional data about each facility. Please see the project home page for more about mapping these facilities and other detailed info sources.
This script scrapes ICE detention facility data from ICE.gov and enriches it with information from Wikipedia, Wikidata, and OpenStreetMap.
The original purpose is to identify whether the detention facilities have data on Wikipedia, Wikidata, and OpenStreetMap, which will help with documenting the facilities appropriately. As these entries get fixed up, you should be able to see your search results change almost immediately.
Other items since v1.0.0:
- XLSX export! JSON export! Parquet export!
- Lots of hand-tuned and matched detention facility records, plus handling of spreadsheet sources as they get changed and reformatted by ICE
- Generating statistics on detention facilities.
- Multi-threaded enrichment of detention sites. (Note: if it gets stuck on Ctrl-C, you may have to kill the threads using ps or close the terminal window.)
- OpenStreetMap longitude fix and other improvements.
- Guantanamo Bay detention data.
- Stemming for searches in Wikipedia.
- Vera.org data (available, but not enabled by default) and other data sources added and enhanced.
- 287(g) state and local agencies data is gathered.
- Facility inspection data checks, including last inspection date and PDF extraction.
- Detailed field office contact information.
Credit for these excellent features is largely due to volunteer @johnseekins, who sorted everything out.
Important environment notes
There have been updates to mise and uv since previous releases, and the environment settings can be pesky to deal with. You may need a fresh mise install if you get errors like ReleaseError: No asset found for target: mise-v2026.2.11-linux-x64.tar.gz
```shell
curl https://mise.run | sh
```

Then, in a new window:

```shell
mise install uv@latest
mise use uv
```

And in a new window:

```shell
mise install --verbose
```
What's Changed
- Improve project structure some by @johnseekins in #14
- Bump ruff from 0.12.12 to 0.13.0 by @dependabot[bot] in #21
- Bump polars from 1.33.0 to 1.33.1 by @dependabot[bot] in #20
- fix default data so --load-existing works without enrich by @johnseekins in #22
- update sheet from ice.gov (as it gets updated weekly) by @johnseekins in #25
- shrink default data set to a random subset of all facilities by @johnseekins in #26
- find pages to scrape rather than hard-coding by @johnseekins in #28
- Bump mypy from 1.17.1 to 1.18.1 by @dependabot[bot] in #30
- Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913 by @dependabot[bot] in #29
- more matching fixes by @johnseekins in #33
- support xlsx writing by @johnseekins in #23
- improve format of enrichment data by @johnseekins in #31
- collect and match all field offices by @johnseekins in #32
- Faster enrichment step by @johnseekins in #34
- Scraper function break up by @johnseekins in #38
- Some additional statistics in the data model by @johnseekins in #41
- Bump ruff from 0.13.0 to 0.13.1 by @dependabot[bot] in #40
- Bump mypy from 1.18.1 to 1.18.2 by @dependabot[bot] in #39
- Bump lxml from 6.0.1 to 6.0.2 by @dependabot[bot] in #45
- Bump fastexcel from 0.15.1 to 0.16.0 by @dependabot[bot] in #44
- Bump actions/checkout from 4 to 5 in the actions group by @dependabot[bot] in #43
- Add custom facilities and some small schema improvements by @johnseekins in #48
- match two JTF records by @johnseekins in #53
- Bump beautifulsoup4 from 4.13.5 to 4.14.0 by @dependabot[bot] in #51
- Bump ruff from 0.13.1 to 0.13.2 by @dependabot[bot] in #52
- Vera.org Facility data and additional facility types/groupings by @johnseekins in #49
- small fixes and additional name matching (and dep updates) by @johnseekins in #58
- Bump ruff from 0.13.2 to 0.14.1 by @dependabot[bot] in #59
- Bump beautifulsoup4 from 4.14.0 to 4.14.2 by @dependabot[bot] in #56
- Bump polars from 1.33.1 to 1.34.0 by @dependabot[bot] in #54
- Improve local linting and a few bug fixes by @johnseekins in #66
- start adding participating agency data by @johnseekins in #68
- Bump ruff from 0.14.1 to 0.14.4 by @dependabot[bot] in #72
- Bump polars from 1.34.0 to 1.35.2 by @dependabot[bot] in #71
- OSM way fix by @johnseekins in #77
- Bump mypy from 1.19.0 to 1.19.1 by @dependabot[bot] in #90
- Bump pyarrow from 22.0.0 to 23.0.0 by @dependabot[bot] in #96
- Bump ruff from 0.14.7 to 0.14.13 by @dependabot[bot] in #95
- Bump polars from 1.35.2 to 1.37.1 by @dependabot[bot] in #94
- Bump types-requests from 2.32.4.20250913 to 2.32.4.20260107 by @dependabot[bot] in #93
- add very basic inspections collection by @johnseekins in #83
Contributors
- @johnseekins made their first contribution in #14
- @dependabot[bot] made their first contribution in #21
Full Changelog: 1.0.0...1.1.0
1.1.0-alpha3
This latest release of the ICE detention scraper improves the OpenStreetMap parsing. The scraper monitors ICE detention facilities data in more detail than any other program available. We are still evolving, but this release is more likely to give correct, or nearly correct, results for ICE detention facilities in OpenStreetMap. A few dependencies were also updated! Thanks to @johnseekins for the OSM patching.
What's Changed
- OSM way fix by @johnseekins in #77
Full Changelog: 1.1.0-alpha2...1.1.0-alpha3
1.1.0-alpha2
The 1.1.0-alpha2 release brings a lot of improvements to the ice_detention_scraper, which monitors ICE detention facilities data in more detail than any other program available.
Updates include: XLSX file output, field office collection, multi-threaded enrichment, code reorganization, better statistics organization, a ton of hand-matched records, new 287(g) agency data gathering, Guantanamo Bay data, and optional scraping of thousands of non-ICE facilities via Vera.org. Outputs are now placed in a more convenient output directory as well. The detention center code is now updated to track "FY26" headers so it will not parse the older "FY25" versions of the files (#74).
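The FY26-header tracking amounts to rejecting stale spreadsheet versions by fiscal year. A minimal sketch of that idea, assuming a simple "FYnn" marker in the sheet header (the function name and regex here are hypothetical, not the project's actual code):

```python
import re

# The fiscal year we expect the current ICE spreadsheet to carry
CURRENT_FY = "FY26"

def is_current_sheet(header: str) -> bool:
    """Return True only if the sheet header mentions the expected fiscal year."""
    match = re.search(r"FY\s?(\d{2})", header)
    return bool(match) and f"FY{match.group(1)}" == CURRENT_FY

print(is_current_sheet("Detention FY26 Detained Population"))  # True
print(is_current_sheet("Detention FY25 Detained Population"))  # False
```

Skipping non-matching headers means an accidentally cached or re-published older file fails fast instead of silently producing stale data.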
Please note: we recommend using uv to manage your local Python dependencies. Thanks to @johnseekins for doing almost all of the diligent work to bring you this release!
What's Changed
- support xlsx writing by @johnseekins in #23
- improve format of enrichment data by @johnseekins in #31
- collect and match all field offices by @johnseekins in #32
- Faster enrichment step by @johnseekins in #34
- Scraper function break up by @johnseekins in #38
- Some additional statistics in the data model by @johnseekins in #41
- Bump ruff from 0.13.0 to 0.13.1 by @dependabot[bot] in #40
- Bump mypy from 1.18.1 to 1.18.2 by @dependabot[bot] in #39
- Bump lxml from 6.0.1 to 6.0.2 by @dependabot[bot] in #45
- Bump fastexcel from 0.15.1 to 0.16.0 by @dependabot[bot] in #44
- Bump actions/checkout from 4 to 5 in the actions group by @dependabot[bot] in #43
- Add custom facilities and some small schema improvements by @johnseekins in #48
- match two JTF records by @johnseekins in #53
- Bump beautifulsoup4 from 4.13.5 to 4.14.0 by @dependabot[bot] in #51
- Bump ruff from 0.13.1 to 0.13.2 by @dependabot[bot] in #52
- Vera.org Facility data and additional facility types/groupings by @johnseekins in #49
- small fixes and additional name matching (and dep updates) by @johnseekins in #58
- Bump ruff from 0.13.2 to 0.14.1 by @dependabot[bot] in #59
- Bump beautifulsoup4 from 4.14.0 to 4.14.2 by @dependabot[bot] in #56
- Bump polars from 1.33.1 to 1.34.0 by @dependabot[bot] in #54
- Improve local linting and a few bug fixes by @johnseekins in #66
- start adding participating agency data (287g network) by @johnseekins in #68
- Bump ruff from 0.14.1 to 0.14.4 by @dependabot[bot] in #72
- Bump polars from 1.34.0 to 1.35.2 by @dependabot[bot] in #71
Full Changelog: 1.1.0-alpha1...1.1.0-alpha2
1.1.0-alpha1
Large rewrite of the ice_detention_scraper; many thanks to @johnseekins and a couple of volunteers who tested the many new refinements and additions. This brings us up to more than 190 facilities, many of them not managed by ICE directly but still part of its network.
Generally this release should be okay for use. It has been tested in various modes.
The scraper can now obtain and present data including how many detainees are in each facility. To address the poor and incorrect postal addresses provided by ICE, there are a ton of manually matched addresses. The project now uses uv and mise to manage the Python environment. An additional ICE detention facility spreadsheet, updated about twice a month, is now downloaded and parsed to get this more detailed facility data.
Export formats are now expanded! JSON is an option, and more are in the works.
The --load-existing option is mainly intended for developers and has been trimmed to 20 facilities.
The enrichment process now has much-improved searching around facility names, 'stemming' them to look for alternatives in OpenStreetMap and Wikipedia, which should make matches easier to find.
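One way to picture the name 'stemming': trim common facility-type suffixes off a name and also try substituted forms, since ICE renames facility types. This is an illustrative sketch under that assumption; the suffix list and function name are hypothetical, not the project's actual code:

```python
# Common facility-type suffixes to strip when generating search variants
SUFFIXES = (
    "detention center",
    "processing center",
    "correctional facility",
    "detention facility",
    "county jail",
)

def name_variants(name: str) -> list[str]:
    """Generate alternative search strings by trimming common suffixes."""
    lowered = name.lower().strip()
    variants = [name.strip()]
    for suffix in SUFFIXES:
        if lowered.endswith(suffix):
            stem = name.strip()[: -len(suffix)].strip()
            if stem:
                variants.append(stem)
                # Also try a renamed form, since ICE renames facility types
                variants.append(f"{stem} Detention Center")
    return variants

print(name_variants("Eloy Processing Center"))
```

Searching Wikipedia or OSM with each variant in turn raises the chance of hitting an entry filed under the bare place name or an older facility name.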
For developers and overall maintainability, there are better type-safety tools, git commit hooks for checking code quality, and Dependabot integration. Please see the README.md for additional details on all this.
If you want to help, check out the issue queue and the pull requests.
Known issue
- The flags --debug-wikipedia, --debug-osm, and --debug-wikidata are not currently working. They will just kick a notice at you. (#19)
What's Changed
- Improve project structure some by @johnseekins in #14
- Bump ruff from 0.12.12 to 0.13.0 by @dependabot[bot] in #21
- Bump polars from 1.33.0 to 1.33.1 by @dependabot[bot] in #20
- fix default data so --load-existing works without enrich by @johnseekins in #22
- update sheet from ice.gov (as it gets updated weekly) by @johnseekins in #25
- shrink default data set to a random subset of all facilities by @johnseekins in #26
- find pages to scrape rather than hard-coding by @johnseekins in #28
- Bump mypy from 1.17.1 to 1.18.1 by @dependabot[bot] in #30
- Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913 by @dependabot[bot] in #29
- more matching fixes by @johnseekins in #33
New Contributors
- @johnseekins made their first contribution in #14
- @dependabot[bot] made their first contribution in #21
Full Changelog: 1.0.0...1.1.0-alpha1
v1.0.0
ICE Detention Facilities Scraper - Initial Release 1.0.0.
ICE Detention Facilities Data Scraper and Enricher, a Python script managed by the Open Security Mapping Project.
In short this will help identify the online profile of each ICE detention facility. Please see the project home page for more about mapping these facilities and other detailed info sources.
This script scrapes ICE detention facility data from ICE.gov and enriches it with information from Wikipedia, Wikidata, and OpenStreetMap.
The main purpose right now is to identify whether the detention facilities have data on Wikipedia, Wikidata, and OpenStreetMap, which will help with documenting the facilities appropriately. As these entries get fixed up, you should be able to see your CSV results change almost immediately.
You can also use --load-existing to leverage an existing scrape of the data from ICE.gov. This is stored in data_loader.py and includes the official current addresses of facilities. (Note: ICE has been renaming known "detention center" sites to "processing center", and so on.)
Please see the README.md for additional information.