Commit d6722c6

Author: N30
Rebrand README and add project banner
Apply SoClose design system: custom SVG banner, flat-square badges, standardized marketing structure with FAQ, alternatives comparison, and branded footer.
1 parent 65a8027 commit d6722c6

2 files changed

Lines changed: 237 additions & 104 deletions

README.md

Lines changed: 168 additions & 104 deletions
@@ -1,65 +1,93 @@
1-
# Instagram Profile Scraper
1+
<p align="center">
2+
<img src="assets/banner.svg" alt="Instagram Data Scraper" width="900">
3+
</p>
4+
5+
<p align="center">
6+
<strong>Scrape Instagram profile URLs at scale — automated scrolling, smart filtering, clean CSV export.</strong>
7+
</p>
28

3-
> **Collect Instagram profile URLs at scale — automated scrolling, smart filtering, clean CSV export.**
4-
> Open-source tool by **[SoClose Society](https://soclose.co)** — Digital solutions & software development studio.
9+
<p align="center">
10+
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-575ECF?style=flat-square" alt="License: MIT"></a>
11+
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python-3.10%2B-575ECF?style=flat-square&logo=python&logoColor=white" alt="Python 3.10+"></a>
12+
<img src="https://img.shields.io/badge/Platform-Windows%20%7C%20macOS%20%7C%20Linux-575ECF?style=flat-square" alt="Platform">
13+
<a href="https://www.selenium.dev/"><img src="https://img.shields.io/badge/Selenium-4.x-575ECF?style=flat-square&logo=selenium&logoColor=white" alt="Selenium"></a>
14+
<a href="https://github.com/SoCloseSociety/InstagramDataScraper/stargazers"><img src="https://img.shields.io/github/stars/SoCloseSociety/InstagramDataScraper?style=flat-square&color=575ECF" alt="GitHub Stars"></a>
15+
<a href="https://github.com/SoCloseSociety/InstagramDataScraper/issues"><img src="https://img.shields.io/github/issues/SoCloseSociety/InstagramDataScraper?style=flat-square&color=575ECF" alt="Issues"></a>
16+
<a href="https://github.com/SoCloseSociety/InstagramDataScraper/network/members"><img src="https://img.shields.io/github/forks/SoCloseSociety/InstagramDataScraper?style=flat-square&color=575ECF" alt="Forks"></a>
17+
</p>
518

6-
[![MIT License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
7-
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-yellow.svg)](https://python.org)
8-
[![Selenium](https://img.shields.io/badge/Selenium-4.x-green.svg)](https://selenium.dev)
9-
[![SoClose Society](https://img.shields.io/badge/SoClose-Society-purple.svg)](https://soclose.co)
19+
<p align="center">
20+
<a href="#quick-start">Quick Start</a> &bull;
21+
<a href="#key-features">Features</a> &bull;
22+
<a href="#configuration">Configuration</a> &bull;
23+
<a href="#faq">FAQ</a> &bull;
24+
<a href="#contributing">Contributing</a>
25+
</p>
1026

1127
---
1228

13-
## Why This Tool?
29+
## What is Instagram Data Scraper?
1430

15-
Need to build a prospect list, analyze followers, or study engagement patterns? Manually copying Instagram profiles is slow and tedious. This scraper automates the entire process — login, scroll, extract, deduplicate, export — in one command.
31+
**Instagram Data Scraper** is a free, open-source **Instagram profile extraction tool** built with Python and Selenium. It automates the collection of Instagram profile URLs from any page — feed, hashtag, explore, followers — with smart filtering and deduplication.
1632

17-
**Built for:**
18-
- Growth hackers & digital marketers building lead lists
19-
- Data analysts studying social media patterns
20-
- Researchers collecting public profile datasets
21-
- Developers learning Selenium browser automation
33+
Need to build a prospect list, analyze followers, or study engagement patterns? Manually copying profiles is slow and tedious. This scraper handles login, scrolling, extraction, deduplication, and CSV export in one command.
2234

23-
---
35+
### Who is this for?
2436

25-
## Features
26-
27-
| Feature | Description |
28-
|---|---|
29-
| **One-command setup** | Clone, install, run — scraping in under 2 minutes |
30-
| **Smart login** | Automated Instagram authentication via Selenium |
31-
| **Infinite scroll** | Continuous feed scrolling with auto-stop detection |
32-
| **Profile filtering** | Extracts only profile URLs, skips /explore/, /reels/, /settings/ etc. |
33-
| **Deduplication** | Built-in `set()` ensures zero duplicate profiles |
34-
| **Human-like delays** | Randomized scroll timing (0.8s–2.0s) to mimic real behavior |
35-
| **Auto-save** | Progress saved every 50 iterations — never lose data |
36-
| **Graceful stop** | Press `Ctrl+C` anytime — all collected data is saved |
37-
| **Secure credentials** | `.env` file support — credentials never in code |
38-
| **Clean CSV output** | Full Instagram URLs, sorted alphabetically, UTF-8 encoded |
39-
| **Detailed logging** | Real-time progress with iteration count and stale detection |
37+
- **Growth Hackers** building lead lists for outreach campaigns
38+
- **Digital Marketers** studying competitors' follower bases
39+
- **Data Analysts** collecting social media datasets
40+
- **Researchers** studying engagement patterns and influencer networks
41+
- **Startup Founders** identifying potential customers or partners
42+
- **Developers** learning Selenium browser automation
43+
44+
### Key Features
45+
46+
- **One-Command Setup** - Clone, install, run — scraping in under 2 minutes
47+
- **Smart Login** - Automated Instagram authentication via Selenium
48+
- **Infinite Scroll** - Continuous feed scrolling with auto-stop detection
49+
- **Profile Filtering** - Extracts only profile URLs, skips `/explore/`, `/reels/`, `/settings/`
50+
- **Deduplication** - Built-in `set()` ensures zero duplicate profiles
51+
- **Human-Like Delays** - Randomized scroll timing (0.8s-2.0s) to mimic real behavior
52+
- **Auto-Save** - Progress saved every 50 iterations — never lose data
53+
- **Graceful Stop** - Press Ctrl+C anytime — all collected data is saved
54+
- **Secure Credentials** - .env file support — credentials never in code
55+
- **Clean CSV Output** - Full Instagram URLs, sorted alphabetically, UTF-8 encoded
56+
- **Free & Open Source** - MIT license, no API key required
4057

4158
---
4259

4360
## Quick Start
4461

4562
### Prerequisites
4663

47-
- **Python 3.10+** - [Download](https://python.org/downloads/)
48-
- **Google Chrome** — Latest stable version
49-
- **Git** - [Download](https://git-scm.com/)
64+
| Requirement | Details |
65+
|-------------|---------|
66+
| **Python** | Version 3.10 or higher ([Download](https://www.python.org/downloads/)) |
67+
| **Google Chrome** | Latest version ([Download](https://www.google.com/chrome/)) |
68+
| **Instagram Account** | A valid Instagram account |
5069

51-
### Install
70+
### Installation
5271

5372
```bash
54-
git clone https://github.com/soclosesociety/InstagramDataScraper.git
73+
# 1. Clone the repository
74+
git clone https://github.com/SoCloseSociety/InstagramDataScraper.git
5575
cd InstagramDataScraper
76+
77+
# 2. (Recommended) Create a virtual environment
5678
python -m venv venv
57-
source venv/bin/activate # macOS/Linux
58-
# venv\Scripts\activate # Windows
79+
80+
# Activate it:
81+
# Windows:
82+
venv\Scripts\activate
83+
# macOS / Linux:
84+
source venv/bin/activate
85+
86+
# 3. Install dependencies
5987
pip install -r requirements.txt
6088
```
6189

62-
### Configure
90+
### Configure Credentials
6391

6492
```bash
6593
cp .env.example .env
@@ -74,7 +102,7 @@ INSTA_PASSWORD=your_password
74102

75103
> Skip the `.env` file to enter credentials at runtime instead.
76104
77-
### Run
105+
### Usage
78106

79107
```bash
80108
python main.py
@@ -91,17 +119,6 @@ Press **Ctrl+C** at any time to stop and save.
91119

92120
---
93121

94-
## Output Format
95-
96-
```csv
97-
ProfileLink
98-
https://www.instagram.com/alice/
99-
https://www.instagram.com/bob/
100-
https://www.instagram.com/charlie/
101-
```
102-
103-
---
104-
105122
## How It Works
106123

107124
```
@@ -116,26 +133,43 @@ https://www.instagram.com/charlie/
116133
└─────────────┘
117134
```
118135

119-
1. **Selenium** opens Chrome and handles authentication
120-
2. **BeautifulSoup** parses the page HTML and extracts `<a>` tags
121-
3. Profile URLs are filtered (only `/username/` patterns, no `/explore/` etc.)
122-
4. A `set` ensures each profile appears only once
123-
5. Randomized delays between scrolls avoid detection
124-
6. Auto-stops after 500 stale iterations (no new profiles found)
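
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the code in `main.py`: the names `to_profile_url`, `collect_profiles`, and `NON_PROFILE_SEGMENTS` are hypothetical, the skip list only covers the three paths the README names, and the input is assumed to be a list of `href` strings already extracted from the page's `<a>` tags.

```python
from urllib.parse import urlparse

# Hypothetical skip list: the README names /explore/, /reels/, /settings/ "etc.",
# so the real list is probably longer.
NON_PROFILE_SEGMENTS = {"explore", "reels", "settings"}
INSTAGRAM_HOSTS = {"instagram.com", "www.instagram.com"}


def to_profile_url(href):
    """Return a full profile URL for href, or None if it is not a /username/ link."""
    parsed = urlparse(href)
    # Relative links have no host; absolute links must point at Instagram.
    if parsed.netloc and parsed.netloc not in INSTAGRAM_HOSTS:
        return None
    segments = [s for s in parsed.path.split("/") if s]
    # A profile path has exactly one segment, e.g. /alice/
    if len(segments) != 1 or segments[0] in NON_PROFILE_SEGMENTS:
        return None
    return f"https://www.instagram.com/{segments[0]}/"


def collect_profiles(hrefs, seen):
    """Add profile URLs found in hrefs to the set `seen`; return how many were new."""
    before = len(seen)
    for href in hrefs:
        url = to_profile_url(href)
        if url is not None:
            seen.add(url)  # the set silently drops duplicates
    return len(seen) - before
```

Returning the number of new profiles lets a caller decide whether an iteration was stale.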
136+
---
137+
138+
## Output Format
139+
140+
```csv
141+
ProfileLink
142+
https://www.instagram.com/alice/
143+
https://www.instagram.com/bob/
144+
https://www.instagram.com/charlie/
145+
```
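
A file in this format could be produced along the following lines. This is a sketch under assumptions: the function name `save_profiles` and its parameters are hypothetical, and only the `ProfileLink` header, alphabetical sorting, and UTF-8 encoding come from the README.

```python
import csv


def save_profiles(profiles, path):
    """Write profile URLs to a one-column CSV, sorted alphabetically, UTF-8 encoded."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["ProfileLink"])  # header row shown in the README
        for url in sorted(profiles):
            writer.writerow([url])
```

Rewriting the whole file on every save keeps the output sorted even when profiles arrive in arbitrary order.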
125146

126147
---
127148

128149
## Configuration
129150

130-
Edit the constants at the top of [main.py](main.py):
151+
Edit the constants at the top of `main.py`:
131152

132153
| Variable | Default | Description |
133-
|---|---|---|
134-
| `MAX_STALE_ITERATIONS` | 500 | Stop after N iterations with no new links |
135-
| `SCROLL_PAUSE_MIN` | 0.8s | Minimum delay between scrolls |
136-
| `SCROLL_PAUSE_MAX` | 2.0s | Maximum delay between scrolls |
137-
| `SCROLL_AMOUNT` | 600 | Pixels to scroll down per iteration |
138-
| `SAVE_INTERVAL` | 50 | Save to CSV every N iterations |
154+
|----------|---------|-------------|
155+
| `MAX_STALE_ITERATIONS` | `500` | Stop after N iterations with no new links |
156+
| `SCROLL_PAUSE_MIN` | `0.8` | Minimum delay between scrolls, in seconds |
157+
| `SCROLL_PAUSE_MAX` | `2.0` | Maximum delay between scrolls, in seconds |
158+
| `SCROLL_AMOUNT` | `600` | Pixels to scroll down per iteration |
159+
| `SAVE_INTERVAL` | `50` | Save to CSV every N iterations |
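
To show how these constants might interact, here is a hedged sketch of a scroll loop. It is not taken from `main.py`: `run_scroll_loop` and its callback parameters are hypothetical, and the Selenium scrolling itself (where `SCROLL_AMOUNT` would apply) is assumed to live inside the `scroll_and_collect` callback.

```python
import random
import time

# Defaults from the table above; the pause values are assumed to be seconds.
MAX_STALE_ITERATIONS = 500
SCROLL_PAUSE_MIN = 0.8
SCROLL_PAUSE_MAX = 2.0
SAVE_INTERVAL = 50


def run_scroll_loop(scroll_and_collect, save,
                    max_stale=MAX_STALE_ITERATIONS,
                    save_interval=SAVE_INTERVAL,
                    sleep=time.sleep):
    """Scroll until `max_stale` consecutive iterations find no new links.

    scroll_and_collect() scrolls once and returns the number of new links;
    save() writes the current results to disk. Returns the iteration count.
    """
    stale = 0
    iteration = 0
    while stale < max_stale:
        iteration += 1
        new_links = scroll_and_collect()
        # Any new link resets the stale counter.
        stale = 0 if new_links else stale + 1
        if iteration % save_interval == 0:
            save()  # periodic auto-save
        # Randomized pause between scrolls.
        sleep(random.uniform(SCROLL_PAUSE_MIN, SCROLL_PAUSE_MAX))
    save()  # final save once the loop stops
    return iteration
```

Passing `sleep` as a parameter makes the loop easy to exercise without real delays.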
160+
161+
---
162+
163+
## Tech Stack
164+
165+
| Technology | Purpose |
166+
|------------|---------|
167+
| [Python 3.10+](https://python.org) | Core language |
168+
| [Selenium 4.x](https://selenium.dev) | Browser automation |
169+
| [BeautifulSoup4](https://beautiful-soup-4.readthedocs.io) | HTML parsing |
170+
| [lxml](https://lxml.de) | Fast HTML parser backend |
171+
| [python-dotenv](https://pypi.org/project/python-dotenv/) | Environment variable management |
172+
| [webdriver-manager](https://pypi.org/project/webdriver-manager/) | Automatic ChromeDriver setup |
139173

140174
---
141175

@@ -146,79 +180,109 @@ InstagramDataScraper/
146180
├── main.py # Core scraper script
147181
├── requirements.txt # Python dependencies
148182
├── .env.example # Credential template
149-
├── .gitignore # Git ignore rules
183+
├── assets/
184+
│ └── banner.svg # Project banner
185+
├── pyproject.toml # Python project metadata
150186
├── CONTRIBUTING.md # Contribution guidelines
151187
├── LICENSE # MIT License
152-
├── pyproject.toml # Python project metadata
153-
└── README.md # Documentation
188+
├── README.md # This file
189+
└── .gitignore # Git ignore rules
154190
```
155191

156192
---
157193

158-
## Tech Stack
194+
## Troubleshooting
159195

160-
| Technology | Purpose |
161-
|---|---|
162-
| [Python 3.10+](https://python.org) | Core language |
163-
| [Selenium 4.x](https://selenium.dev) | Browser automation |
164-
| [BeautifulSoup4](https://beautiful-soup-4.readthedocs.io) | HTML parsing |
165-
| [lxml](https://lxml.de) | Fast HTML parser backend |
166-
| [python-dotenv](https://pypi.org/project/python-dotenv/) | Environment variable management |
167-
| [webdriver-manager](https://pypi.org/project/webdriver-manager/) | Automatic ChromeDriver setup |
196+
### Chrome driver issues
197+
198+
```bash
199+
pip install --upgrade webdriver-manager
200+
```
201+
202+
### Login fails
203+
204+
If the automated login doesn't work:
205+
1. Check your credentials in `.env`
206+
2. Instagram may require 2FA — complete it manually in the browser window
207+
3. Try logging in manually first, then press ENTER to start scraping
208+
209+
### No profiles found
210+
211+
If the scraper scrolls but doesn't find profiles:
212+
1. Make sure you navigated to a page with profile links (feed, hashtag page, followers list)
213+
2. Instagram may have changed its HTML structure — open an issue
168214

169215
---
170216

171-
## More Open-Source Tools by SoClose Society
217+
## FAQ
218+
219+
**Q: Is this free?**
220+
A: Yes. Instagram Data Scraper is 100% free and open source under the MIT license.
221+
222+
**Q: Do I need an Instagram API key?**
223+
A: No. This tool uses browser automation (Selenium), so no API key is needed.
224+
225+
**Q: How many profiles can I scrape?**
226+
A: No hard limit. The scraper runs until no new profiles are found for 500 consecutive iterations. Be mindful of Instagram's usage policies.
172227

173-
We build and share automation tools for the community. Explore our other projects:
228+
**Q: Are my credentials safe?**
229+
A: Credentials are stored in a local `.env` file that is gitignored. They are never uploaded or shared.
174230

175-
| Project | Description | Stars |
176-
|---|---|---|
177-
| [PinterestBulkPostBot](https://github.com/soclosesociety/PinterestBulkPostBot) | Automated Pinterest posting tool | 11 |
178-
| [LinkedinDataScraper](https://github.com/soclosesociety/LinkedinDataScraper) | LinkedIn contact data extraction | 2 |
179-
| [BOT_GoogleMap_Scrapping](https://github.com/soclosesociety/BOT_GoogleMap_Scrapping) | Google Maps data scraper | 3 |
180-
| [BOT-Facebook_Bulk_Invite](https://github.com/soclosesociety/BOT-Facebook_Bulk_Invite_Friend_To_FB_Group) | Facebook group invitation automation | 4 |
181-
| [FreeWorkDataScraper](https://github.com/soclosesociety/FreeWorkDataScraper) | Freelance job posting scraper | 1 |
231+
**Q: Can I scrape hashtag pages?**
232+
A: Yes. After login, navigate to any hashtag page, press ENTER, and the scraper will collect profile links.
182233

183-
**[View all 15+ repositories](https://github.com/soclosesociety)**
234+
**Q: Does it work on Mac / Linux?**
235+
A: Yes. Fully cross-platform on Windows, macOS, and Linux.
184236

185237
---
186238

187-
## Disclaimer
239+
## Alternatives Comparison
188240

189-
> This tool is provided **for educational and research purposes only**.
190-
> Scraping Instagram may violate their [Terms of Service](https://help.instagram.com/581066165581870).
191-
> The authors are not responsible for any misuse or consequences resulting from the use of this tool.
192-
> Always respect platform policies and applicable laws in your jurisdiction.
241+
| Feature | Instagram Data Scraper | Manual Copy-Paste | Instagram API | Paid Tools |
242+
|---------|----------------------|-------------------|--------------|-----------|
243+
| Price | **Free** | Free | Free (limited) | $30-100/mo |
244+
| Bulk extraction | Yes | No | Rate limited | Yes |
245+
| Profile filtering | Yes | Manual | N/A | Varies |
246+
| Open source | Yes | N/A | No | No |
247+
| API key required | No | No | Yes | Yes |
248+
| Cross-platform | Yes | Yes | Any | Web only |
193249

194250
---
195251

196252
## Contributing
197253

198-
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
254+
Contributions are welcome! Please read the [Contributing Guide](CONTRIBUTING.md) before submitting a pull request.
199255

200256
---
201257

202-
## About SoClose Society
203-
204-
**[SoClose Society](https://soclose.co)** is a digital solutions & software development studio. We build open-source automation tools and share them with the developer community.
258+
## License
205259

206-
- **Website:** [soclose.co](https://soclose.co)
207-
- **GitHub:** [github.com/soclosesociety](https://github.com/soclosesociety)
208-
- **Contact:** [contact@soclose.co](mailto:contact@soclose.co)
209-
- **LinkedIn:** [SoClose Agency](https://linkedin.com/company/soclose-agency)
210-
- **Twitter/X:** [@SoCloseAgency](https://twitter.com/SoCloseAgency)
260+
This project is licensed under the [MIT License](LICENSE).
211261

212262
---
213263

214-
## License
264+
## Disclaimer
215265

216-
This project is licensed under the **MIT License** — see the [LICENSE](LICENSE) file for details.
266+
This tool is provided for **educational and research purposes only**. Scraping Instagram may violate their [Terms of Service](https://help.instagram.com/581066165581870). The authors are not responsible for any misuse or consequences resulting from the use of this software. Always respect platform policies and applicable laws.
217267

218268
---
219269

220270
<p align="center">
221-
<strong><a href="https://soclose.co">SoClose Society</a></strong><br/>
222-
Digital solutions & software development studio<br/><br/>
223-
<a href="https://github.com/soclosesociety/InstagramDataScraper">Star this repo</a> if you find it useful!
271+
<strong>If this project helps you, please give it a star!</strong><br>
272+
It helps others discover this tool.<br><br>
273+
<a href="https://github.com/SoCloseSociety/InstagramDataScraper">
274+
<img src="https://img.shields.io/github/stars/SoCloseSociety/InstagramDataScraper?style=for-the-badge&logo=github&color=575ECF" alt="Star this repo">
275+
</a>
276+
</p>
277+
278+
<br>
279+
280+
<p align="center">
281+
<sub>Built with purpose by <a href="https://soclose.co"><strong>SoClose</strong></a> &mdash; Digital Innovation Through Automation & AI</sub><br>
282+
<sub>
283+
<a href="https://soclose.co">Website</a> &bull;
284+
<a href="https://linkedin.com/company/soclose-agency">LinkedIn</a> &bull;
285+
<a href="https://twitter.com/SoCloseAgency">Twitter</a> &bull;
286+
<a href="mailto:hello@soclose.co">Contact</a>
287+
</sub>
224288
</p>
