1- # Instagram Profile Scraper
1+ <p align="center">
2+ <img src="assets/banner.svg" alt="Instagram Data Scraper" width="900">
3+ </p>
4+
5+ <p align="center">
6+ <strong>Scrape Instagram profile URLs at scale: automated scrolling, smart filtering, clean CSV export.</strong>
7+ </p>
28
3- > ** Collect Instagram profile URLs at scale — automated scrolling, smart filtering, clean CSV export.**
4- > Open-source tool by ** [ SoClose Society] ( https://soclose.co ) ** — Digital solutions & software development studio.
9+ <p align="center">
10+ <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-575ECF?style=flat-square" alt="License: MIT"></a>
11+ <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python-3.10%2B-575ECF?style=flat-square&logo=python&logoColor=white" alt="Python 3.10+"></a>
12+ <img src="https://img.shields.io/badge/Platform-Windows%20%7C%20macOS%20%7C%20Linux-575ECF?style=flat-square" alt="Platform">
13+ <a href="https://www.selenium.dev/"><img src="https://img.shields.io/badge/Selenium-4.x-575ECF?style=flat-square&logo=selenium&logoColor=white" alt="Selenium"></a>
14+ <a href="https://github.com/SoCloseSociety/InstagramDataScraper/stargazers"><img src="https://img.shields.io/github/stars/SoCloseSociety/InstagramDataScraper?style=flat-square&color=575ECF" alt="GitHub Stars"></a>
15+ <a href="https://github.com/SoCloseSociety/InstagramDataScraper/issues"><img src="https://img.shields.io/github/issues/SoCloseSociety/InstagramDataScraper?style=flat-square&color=575ECF" alt="Issues"></a>
16+ <a href="https://github.com/SoCloseSociety/InstagramDataScraper/network/members"><img src="https://img.shields.io/github/forks/SoCloseSociety/InstagramDataScraper?style=flat-square&color=575ECF" alt="Forks"></a>
17+ </p>
518
6- [ ![ MIT License] ( https://img.shields.io/badge/License-MIT-blue.svg )] ( LICENSE )
7- [ ![ Python 3.10+] ( https://img.shields.io/badge/Python-3.10%2B-yellow.svg )] ( https://python.org )
8- [ ![ Selenium] ( https://img.shields.io/badge/Selenium-4.x-green.svg )] ( https://selenium.dev )
9- [ ![ SoClose Society] ( https://img.shields.io/badge/SoClose-Society-purple.svg )] ( https://soclose.co )
19+ <p align="center">
20+ <a href="#quick-start">Quick Start</a> &bull;
21+ <a href="#key-features">Features</a> &bull;
22+ <a href="#configuration">Configuration</a> &bull;
23+ <a href="#faq">FAQ</a> &bull;
24+ <a href="#contributing">Contributing</a>
25+ </p>
1026
1127---
1228
13- ## Why This Tool ?
29+ ## What is Instagram Data Scraper?
1430
15- Need to build a prospect list, analyze followers, or study engagement patterns? Manually copying Instagram profiles is slow and tedious. This scraper automates the entire process — login, scroll, extract, deduplicate, export — in one command .
31+ **Instagram Data Scraper** is a free, open-source **Instagram profile extraction tool** built with Python and Selenium. It collects Instagram profile URLs from the page you navigate to (feed, hashtag, explore, or followers list), keeps only links that match the `/username/` pattern, and removes duplicates.
1632
17- ** Built for:**
18- - Growth hackers & digital marketers building lead lists
19- - Data analysts studying social media patterns
20- - Researchers collecting public profile datasets
21- - Developers learning Selenium browser automation
33+ Need to build a prospect list, analyze followers, or study engagement patterns? Manually copying profiles is slow and tedious. This scraper handles login, scrolling, extraction, deduplication, and CSV export in one command.
2234
23- ---
35+ ### Who is this for?
2436
25- ## Features
26-
27- | Feature | Description |
28- | ---| ---|
29- | ** One-command setup** | Clone, install, run — scraping in under 2 minutes |
30- | ** Smart login** | Automated Instagram authentication via Selenium |
31- | ** Infinite scroll** | Continuous feed scrolling with auto-stop detection |
32- | ** Profile filtering** | Extracts only profile URLs, skips /explore/, /reels/, /settings/ etc. |
33- | ** Deduplication** | Built-in ` set() ` ensures zero duplicate profiles |
34- | ** Human-like delays** | Randomized scroll timing (0.8s–2.0s) to mimic real behavior |
35- | ** Auto-save** | Progress saved every 50 iterations — never lose data |
36- | ** Graceful stop** | Press ` Ctrl+C ` anytime — all collected data is saved |
37- | ** Secure credentials** | ` .env ` file support — credentials never in code |
38- | ** Clean CSV output** | Full Instagram URLs, sorted alphabetically, UTF-8 encoded |
39- | ** Detailed logging** | Real-time progress with iteration count and stale detection |
37+ - **Growth Hackers** building lead lists for outreach campaigns
38+ - **Digital Marketers** studying competitors' follower bases
39+ - **Data Analysts** collecting social media datasets
40+ - **Researchers** studying engagement patterns and influencer networks
41+ - **Startup Founders** identifying potential customers or partners
42+ - **Developers** learning Selenium browser automation
43+
44+ ### Key Features
45+
46+ - **One-Command Setup** - Clone, install, run: scraping in under 2 minutes
47+ - **Smart Login** - Automated Instagram authentication via Selenium
48+ - **Infinite Scroll** - Continuous feed scrolling with auto-stop detection
49+ - **Profile Filtering** - Extracts only profile URLs; skips `/explore/`, `/reels/`, `/settings/`, and similar paths
50+ - **Deduplication** - A built-in `set()` keeps each profile only once
51+ - **Human-Like Delays** - Randomized scroll timing (0.8s-2.0s) to mimic real behavior
52+ - **Auto-Save** - Progress saved every 50 iterations, so collected data is not lost
53+ - **Graceful Stop** - Press `Ctrl+C` anytime; all collected data is saved
54+ - **Secure Credentials** - `.env` file support keeps credentials out of the code
55+ - **Clean CSV Output** - Full Instagram URLs, sorted alphabetically, UTF-8 encoded
56+ - **Free & Open Source** - MIT license, no API key required
4057
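The profile filtering and deduplication features can be illustrated with a minimal sketch. This is not the code from `main.py`: the function name `extract_profile_urls` and the `RESERVED_SEGMENTS` set are hypothetical, and only `/explore/`, `/reels/`, and `/settings/` come from the feature list. The other reserved segments are assumptions. The sketch takes `href` values that have already been pulled out of the page.

```python
from urllib.parse import urlparse

# Hypothetical set of first path segments that are not profiles.
# The README names explore, reels and settings; the rest are assumptions.
RESERVED_SEGMENTS = {"explore", "reels", "settings", "p", "accounts", "direct", "stories"}

INSTAGRAM_HOSTS = {"instagram.com", "www.instagram.com"}
BASE_URL = "https://www.instagram.com"


def extract_profile_urls(hrefs):
    """Return the set of full profile URLs found in an iterable of href values."""
    profiles = set()  # a set keeps each profile only once
    for href in hrefs:
        parsed = urlparse(href)
        # Skip absolute links that point to another site.
        if parsed.netloc and parsed.netloc not in INSTAGRAM_HOSTS:
            continue
        segments = [s for s in parsed.path.split("/") if s]
        # A profile link has exactly one path segment: /username/
        if len(segments) != 1:
            continue
        username = segments[0]
        if username.lower() in RESERVED_SEGMENTS:
            continue
        profiles.add(f"{BASE_URL}/{username}/")
    return profiles
```

Relative links such as `/alice/` and absolute links such as `https://www.instagram.com/alice/` normalize to the same URL, so they count as one profile.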
4158---
4259
4360## Quick Start
4461
4562### Prerequisites
4663
47- - ** Python 3.10+** — [ Download] ( https://python.org/downloads/ )
48- - ** Google Chrome** — Latest stable version
49- - ** Git** — [ Download] ( https://git-scm.com/ )
64+ | Requirement | Details |
65+ |-------------|---------|
66+ | **Python** | Version 3.10 or higher ([Download](https://www.python.org/downloads/)) |
67+ | **Google Chrome** | Latest version ([Download](https://www.google.com/chrome/)) |
68+ | **Instagram Account** | A valid Instagram account |
5069
51- ### Install
70+ ### Installation
5271
5372``` bash
54- git clone https://github.com/soclosesociety/InstagramDataScraper.git
73+ # 1. Clone the repository
74+ git clone https://github.com/SoCloseSociety/InstagramDataScraper.git
5575cd InstagramDataScraper
76+
77+ # 2. (Recommended) Create a virtual environment
5678python -m venv venv
57- source venv/bin/activate # macOS/Linux
58- # venv\Scripts\activate # Windows
79+
80+ # Activate it:
81+ # Windows:
82+ venv\Scripts\activate
83+ # macOS / Linux:
84+ source venv/bin/activate
85+
86+ # 3. Install dependencies
5987pip install -r requirements.txt
6088```
6189
62- ### Configure
90+ ### Configure Credentials
6391
6492``` bash
6593cp .env.example .env
@@ -74,7 +102,7 @@ INSTA_PASSWORD=your_password
74102
75103> Skip the ` .env ` file to enter credentials at runtime instead.
76104
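A minimal sketch of the "`.env` first, prompt at runtime otherwise" behavior might look like the following. It is an assumption about the approach, not the code in `main.py`: `load_credentials` is a hypothetical helper, and the variable name `INSTA_USERNAME` is assumed by analogy with `INSTA_PASSWORD`. In the real tool, `python-dotenv`'s `load_dotenv()` would typically be called first to copy the `.env` values into the process environment.

```python
import os
from getpass import getpass


def load_credentials(env=os.environ, ask=input, ask_secret=getpass):
    """Return (username, password) from the environment, prompting for anything missing."""
    # INSTA_USERNAME is an assumed name; INSTA_PASSWORD appears in the .env template.
    username = env.get("INSTA_USERNAME") or ask("Instagram username: ")
    # getpass hides the password while it is typed.
    password = env.get("INSTA_PASSWORD") or ask_secret("Instagram password: ")
    return username, password
```

Passing `env`, `ask`, and `ask_secret` as parameters keeps the helper easy to test without touching real environment variables or the terminal.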
77- ### Run
105+ ### Usage
78106
79107``` bash
80108python main.py
@@ -91,17 +119,6 @@ Press **Ctrl+C** at any time to stop and save.
91119
92120---
93121
94- ## Output Format
95-
96- ``` csv
97- ProfileLink
98- https://www.instagram.com/alice/
99- https://www.instagram.com/bob/
100- https://www.instagram.com/charlie/
101- ```
102-
103- ---
104-
105122## How It Works
106123
107124```
@@ -116,26 +133,43 @@ https://www.instagram.com/charlie/
116133 └─────────────┘
117134```
118135
119- 1 . ** Selenium** opens Chrome and handles authentication
120- 2 . ** BeautifulSoup** parses the page HTML and extracts ` <a> ` tags
121- 3 . Profile URLs are filtered (only ` /username/ ` patterns, no ` /explore/ ` etc.)
122- 4 . A ` set ` ensures each profile appears only once
123- 5 . Randomized delays between scrolls avoid detection
124- 6 . Auto-stops after 500 stale iterations (no new profiles found)
136+ ---
137+
138+ ## Output Format
139+
140+ ``` csv
141+ ProfileLink
142+ https://www.instagram.com/alice/
143+ https://www.instagram.com/bob/
144+ https://www.instagram.com/charlie/
145+ ```
125146
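As an illustration of how a file in this format could be produced, here is a short sketch using Python's standard `csv` module. The function name `save_profiles` is hypothetical; only the `ProfileLink` header, the alphabetical ordering, and the UTF-8 encoding come from the README.

```python
import csv


def save_profiles(profiles, path):
    """Write profile URLs to a one-column CSV, sorted alphabetically, UTF-8 encoded."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["ProfileLink"])  # header row
        for url in sorted(profiles):
            writer.writerow([url])
```

Because the whole file is rewritten on each call, the same function can serve both periodic auto-saves and the final save on exit.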
126147---
127148
128149## Configuration
129150
130- Edit the constants at the top of [ main.py] ( main.py ) :
151+ Edit the constants at the top of `main.py`:
131152
132153| Variable | Default | Description |
133- | ---| ---| ---|
134- | ` MAX_STALE_ITERATIONS ` | 500 | Stop after N iterations with no new links |
135- | ` SCROLL_PAUSE_MIN ` | 0.8s | Minimum delay between scrolls |
136- | ` SCROLL_PAUSE_MAX ` | 2.0s | Maximum delay between scrolls |
137- | ` SCROLL_AMOUNT ` | 600 | Pixels to scroll down per iteration |
138- | ` SAVE_INTERVAL ` | 50 | Save to CSV every N iterations |
154+ | ----------| ---------| -------------|
155+ | `MAX_STALE_ITERATIONS` | `500` | Stop after N consecutive iterations with no new links |
156+ | `SCROLL_PAUSE_MIN` | `0.8` | Minimum delay between scrolls, in seconds |
157+ | `SCROLL_PAUSE_MAX` | `2.0` | Maximum delay between scrolls, in seconds |
158+ | `SCROLL_AMOUNT` | `600` | Pixels to scroll down per iteration |
159+ | `SAVE_INTERVAL` | `50` | Save to CSV every N iterations |
160+
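To show how these constants could interact, here is a browser-independent sketch of a scroll loop. It is an assumption about the control flow, not the actual `main.py`: `run_scroll_loop` and its `scroll`, `collect`, and `save` callbacks are hypothetical names, and the delays are assumed to be in seconds.

```python
import random
import time

MAX_STALE_ITERATIONS = 500
SCROLL_PAUSE_MIN = 0.8   # seconds (assumed unit)
SCROLL_PAUSE_MAX = 2.0   # seconds (assumed unit)
SCROLL_AMOUNT = 600      # pixels
SAVE_INTERVAL = 50       # iterations


def run_scroll_loop(scroll, collect, save, sleep=time.sleep,
                    max_stale=MAX_STALE_ITERATIONS):
    """Scroll until `max_stale` consecutive iterations yield no new profiles."""
    profiles = set()
    stale = 0
    iteration = 0
    try:
        while stale < max_stale:
            iteration += 1
            scroll(SCROLL_AMOUNT)
            before = len(profiles)
            profiles.update(collect())
            # Reset the stale counter whenever something new was found.
            stale = 0 if len(profiles) > before else stale + 1
            if iteration % SAVE_INTERVAL == 0:
                save(profiles)  # periodic auto-save
            # Randomized pause between scrolls.
            sleep(random.uniform(SCROLL_PAUSE_MIN, SCROLL_PAUSE_MAX))
    except KeyboardInterrupt:
        pass  # Ctrl+C: fall through to the final save
    finally:
        save(profiles)
    return profiles
```

With Selenium, `scroll` could be something like `lambda px: driver.execute_script("window.scrollBy(0, arguments[0]);", px)`, and `collect` would parse the current page source for profile links.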
161+ ---
162+
163+ ## Tech Stack
164+
165+ | Technology | Purpose |
166+ |------------|---------|
167+ | [Python 3.10+](https://python.org) | Core language |
168+ | [Selenium 4.x](https://selenium.dev) | Browser automation |
169+ | [BeautifulSoup4](https://beautiful-soup-4.readthedocs.io) | HTML parsing |
170+ | [lxml](https://lxml.de) | Fast HTML parser backend |
171+ | [python-dotenv](https://pypi.org/project/python-dotenv/) | Environment variable management |
172+ | [webdriver-manager](https://pypi.org/project/webdriver-manager/) | Automatic ChromeDriver setup |
139173
140174---
141175
@@ -146,79 +180,109 @@ InstagramDataScraper/
146180├── main.py # Core scraper script
147181├── requirements.txt # Python dependencies
148182├── .env.example # Credential template
149- ├── .gitignore # Git ignore rules
183+ ├── assets/
184+ │ └── banner.svg # Project banner
185+ ├── pyproject.toml # Python project metadata
150186├── CONTRIBUTING.md # Contribution guidelines
151187├── LICENSE # MIT License
152- ├── pyproject.toml # Python project metadata
153- └── README.md # Documentation
188+ ├── README.md # This file
189+ └── .gitignore # Git ignore rules
154190```
155191
156192---
157193
158- ## Tech Stack
194+ ## Troubleshooting
159195
160- | Technology | Purpose |
161- | ---| ---|
162- | [ Python 3.10+] ( https://python.org ) | Core language |
163- | [ Selenium 4.x] ( https://selenium.dev ) | Browser automation |
164- | [ BeautifulSoup4] ( https://beautiful-soup-4.readthedocs.io ) | HTML parsing |
165- | [ lxml] ( https://lxml.de ) | Fast HTML parser backend |
166- | [ python-dotenv] ( https://pypi.org/project/python-dotenv/ ) | Environment variable management |
167- | [ webdriver-manager] ( https://pypi.org/project/webdriver-manager/ ) | Automatic ChromeDriver setup |
196+ ### Chrome driver issues
197+
198+ ``` bash
199+ pip install --upgrade webdriver-manager
200+ ```
201+
202+ ### Login fails
203+
204+ If the automated login doesn't work:
205+ 1. Check your credentials in `.env`
206+ 2. Instagram may require 2FA; if so, complete it manually in the browser window
207+ 3. Try logging in manually first, then press ENTER to start scraping
208+
209+ ### No profiles found
210+
211+ If the scraper scrolls but doesn't find profiles:
212+ 1. Make sure you navigated to a page with profile links (feed, hashtag page, followers list)
213+ 2. Instagram may have changed its HTML structure; if so, open an issue
168214
169215---
170216
171- ## More Open-Source Tools by SoClose Society
217+ ## FAQ
218+
219+ **Q: Is this free?**
220+ A: Yes. Instagram Data Scraper is 100% free and open source under the MIT license.
221+
222+ **Q: Do I need an Instagram API key?**
223+ A: No. This tool uses browser automation (Selenium), so no API key is needed.
224+
225+ **Q: How many profiles can I scrape?**
226+ A: There is no hard limit. By default, the scraper runs until 500 consecutive iterations find no new profiles (`MAX_STALE_ITERATIONS`). Be mindful of Instagram's usage policies.
172227
173- We build and share automation tools for the community. Explore our other projects:
228+ **Q: Are my credentials safe?**
229+ A: Credentials are stored in a local `.env` file that is gitignored. They are never uploaded or shared.
174230
175- | Project | Description | Stars |
176- | ---| ---| ---|
177- | [ PinterestBulkPostBot] ( https://github.com/soclosesociety/PinterestBulkPostBot ) | Automated Pinterest posting tool | 11 |
178- | [ LinkedinDataScraper] ( https://github.com/soclosesociety/LinkedinDataScraper ) | LinkedIn contact data extraction | 2 |
179- | [ BOT_GoogleMap_Scrapping] ( https://github.com/soclosesociety/BOT_GoogleMap_Scrapping ) | Google Maps data scraper | 3 |
180- | [ BOT-Facebook_Bulk_Invite] ( https://github.com/soclosesociety/BOT-Facebook_Bulk_Invite_Friend_To_FB_Group ) | Facebook group invitation automation | 4 |
181- | [ FreeWorkDataScraper] ( https://github.com/soclosesociety/FreeWorkDataScraper ) | Freelance job posting scraper | 1 |
231+ **Q: Can I scrape hashtag pages?**
232+ A: Yes. After login, navigate to any hashtag page, press ENTER, and the scraper will collect profile links.
182233
183- ** [ View all 15+ repositories] ( https://github.com/soclosesociety ) **
234+ **Q: Does it work on Mac / Linux?**
235+ A: Yes. Fully cross-platform on Windows, macOS, and Linux.
184236
185237---
186238
187- ## Disclaimer
239+ ## Alternatives Comparison
188240
189- > This tool is provided ** for educational and research purposes only** .
190- > Scraping Instagram may violate their [ Terms of Service] ( https://help.instagram.com/581066165581870 ) .
191- > The authors are not responsible for any misuse or consequences resulting from the use of this tool.
192- > Always respect platform policies and applicable laws in your jurisdiction.
241+ | Feature | Instagram Data Scraper | Manual Copy-Paste | Instagram API | Paid Tools |
242+ | ---------| ----------------------| -------------------| --------------| -----------|
243+ | Price | **Free** | Free | Free (limited) | $30-100/mo |
244+ | Bulk extraction | Yes | No | Rate limited | Yes |
245+ | Profile filtering | Yes | Manual | N/A | Varies |
246+ | Open source | Yes | N/A | No | No |
247+ | API key required | No | No | Yes | Yes |
248+ | Cross-platform | Yes | Yes | Any | Web only |
193249
194250---
195251
196252## Contributing
197253
198- We welcome contributions! See [ CONTRIBUTING.md ] ( CONTRIBUTING.md ) for guidelines .
254+ Contributions are welcome! Please read the [Contributing Guide](CONTRIBUTING.md) before submitting a pull request.
199255
200256---
201257
202- ## About SoClose Society
203-
204- ** [ SoClose Society] ( https://soclose.co ) ** is a digital solutions & software development studio. We build open-source automation tools and share them with the developer community.
258+ ## License
205259
206- - ** Website:** [ soclose.co] ( https://soclose.co )
207- - ** GitHub:** [ github.com/soclosesociety] ( https://github.com/soclosesociety )
208- - ** Contact:** [ contact@soclose.co ] ( mailto:contact@soclose.co )
209- - ** LinkedIn:** [ SoClose Agency] ( https://linkedin.com/company/soclose-agency )
210- - ** Twitter/X:** [ @SoCloseAgency ] ( https://twitter.com/SoCloseAgency )
260+ This project is licensed under the [MIT License](LICENSE).
211261
212262---
213263
214- ## License
264+ ## Disclaimer
215265
216- This project is licensed under the ** MIT License ** — see the [ LICENSE ] ( LICENSE ) file for details .
266+ This tool is provided for **educational and research purposes only**. Scraping Instagram may violate its [Terms of Service](https://help.instagram.com/581066165581870). The authors are not responsible for any misuse or consequences resulting from the use of this software. Always respect platform policies and applicable laws.
217267
218268---
219269
220270<p align="center">
221- <strong><a href="https://soclose.co">SoClose Society</a></strong><br />
222- Digital solutions & software development studio<br /><br />
223- <a href="https://github.com/soclosesociety/InstagramDataScraper">Star this repo</a> if you find it useful!
271+ <strong>If this project helps you, please give it a star!</strong><br>
272+ It helps others discover this tool.<br><br>
273+ <a href="https://github.com/SoCloseSociety/InstagramDataScraper">
274+ <img src="https://img.shields.io/github/stars/SoCloseSociety/InstagramDataScraper?style=for-the-badge&logo=github&color=575ECF" alt="Star this repo">
275+ </a>
276+ </p>
277+
278+ <br>
279+
280+ <p align="center">
281+ <sub>Built with purpose by <a href="https://soclose.co"><strong>SoClose</strong></a> &mdash; Digital Innovation Through Automation & AI</sub><br>
282+ <sub>
283+ <a href="https://soclose.co">Website</a> •
284+ <a href="https://linkedin.com/company/soclose-agency">LinkedIn</a> •
285+ <a href="https://twitter.com/SoCloseAgency">Twitter</a> •
286+ <a href="mailto:hello@soclose.co">Contact</a>
287+ </sub>
224288</p>