Discover low-to-mid install Android apps with signal, not noise.
A production-minded Streamlit toolkit for hunting Google Play apps in specific install ranges, de-duplicating findings across sessions, and exporting clean research datasets in one click.
> [!IMPORTANT]
> The app is designed for market research workflows where you need repeatable discovery of apps with constrained install volumes (for example, the 10,000+ to 100,000+ segments).
## Table of Contents

- Features
- Tech Stack
- Project Structure
- Key Design Decisions
- Getting Started
- Testing
- Deployment
- Usage
- Configuration
- License
- Contacts
- Support the Project
## Features

- Install-range filtering for both lower and upper bounds (`min_installs`, `max_installs`).
- Randomized multi-query discovery over a weighted keyword pool (niche + games + utilities).
- Multi-locale search strategy to surface apps from different language/country combinations.
- Deduplication pipeline backed by persistent SQLite state (`seen_apps`) so you don't re-review the same app every run.
- Detail caching layer (`app_details_cache`) to reduce repeated API calls and speed up iterative sessions.
- Parallel metadata fetch using a thread pool for better throughput on detail lookups.
- Novelty scoring mode to rank results by "non-obviousness" instead of raw popularity.
- DataFrame-based result grid plus one-click CSV export for downstream BI/analysis.
> [!TIP]
> If you're in discovery mode, keep `sort_by_novelty` enabled and increase `queries_per_run` before increasing `search_limit`.
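The actual novelty math lives in `scoring.py`; as a purely illustrative heuristic (not the project's formula), a score that rewards low install and rating counts might look like:

```python
import math

def novelty_score(details: dict) -> float:
    """Illustrative heuristic: higher for apps with fewer installs and ratings.

    The field name 'minInstalls' matches what google-play-scraper returns;
    'ratings' is assumed here for the sake of the example.
    """
    installs = max(details.get("minInstalls") or 1, 1)
    ratings = max(details.get("ratings") or 1, 1)
    # Penalize popularity on a log scale so the score stays bounded
    return 1.0 / (1.0 + math.log10(installs) + math.log10(ratings))
```

Sorting candidates by this score descending surfaces the least-saturated apps first, which is what "non-obviousness" ranking aims at.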
## Tech Stack

- Language: Python 3.10+
- UI layer: Streamlit
- Data wrangling: pandas
- Data source: `google-play-scraper`
- Persistence: SQLite (`sqlite3` from the Python stdlib)
- Concurrency: `concurrent.futures.ThreadPoolExecutor`
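The concurrency layer amounts to fanning detail lookups out over a thread pool. A minimal sketch with a stubbed fetcher (the real fetcher would call into `google-play-scraper`; the function names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_details(app_id: str) -> dict:
    # Stub standing in for a real google-play-scraper detail lookup
    return {"appId": app_id, "minInstalls": 50_000}

def fetch_all(app_ids: list[str], max_workers: int = 8) -> list[dict]:
    """Fetch details for many apps concurrently; result order is not preserved."""
    results: list[dict] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_details, a): a for a in app_ids}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception:
                # Skip apps whose detail lookup failed; keep the run going
                pass
    return results
```

Because detail lookups are I/O-bound network calls, threads (rather than processes) are the idiomatic choice here.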
## Project Structure

```text
.
├── app.py            # Streamlit entrypoint and orchestration
├── ui.py             # Sidebar controls + result rendering
├── scraper.py        # Search, candidate collection, detail fetch pipeline
├── scoring.py        # Novelty score math and sorting helper
├── db.py             # SQLite schema + read/write/cache helpers
├── config.py         # Constants, defaults, keyword pools, locales
├── requirements.txt
├── LICENSE
└── README.md
```
## Key Design Decisions

- Stateful dedup over stateless scraping
  - A persistent `seen_apps` table avoids rediscovery churn and keeps each run high signal.
- Cache-first detail fetch
  - App details are cached with TTL semantics to cut API pressure and improve UX responsiveness.
- Weighted keyword sampling
  - Niche terms are intentionally weighted to increase the odds of finding less-saturated apps.
- Bounded parallelism
  - Worker count scales with expected result volume to keep performance predictable.
- Post-fetch install filtering
  - Install thresholds are enforced after the details lookup because install metadata quality can vary at the search stage.
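The stateful dedup and cache-first decisions above boil down to two small SQLite tables. A self-contained sketch, assuming table and column names that mirror `seen_apps` and `app_details_cache` (the real schema lives in `db.py`):

```python
import json
import sqlite3
import time

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen_apps (app_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS app_details_cache "
        "(app_id TEXT PRIMARY KEY, payload TEXT, fetched_at REAL)"
    )
    return conn

def is_new(conn: sqlite3.Connection, app_id: str) -> bool:
    """Mark app_id as seen; returns True only the first time it appears."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_apps (app_id) VALUES (?)", (app_id,)
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows modified means it was already seen

def cached_details(conn, app_id: str, fetch, ttl: float = 86_400) -> dict:
    """Return cached details if still fresh, otherwise fetch and store them."""
    row = conn.execute(
        "SELECT payload, fetched_at FROM app_details_cache WHERE app_id = ?",
        (app_id,),
    ).fetchone()
    if row and time.time() - row[1] < ttl:
        return json.loads(row[0])
    details = fetch(app_id)
    conn.execute(
        "INSERT OR REPLACE INTO app_details_cache VALUES (?, ?, ?)",
        (app_id, json.dumps(details), time.time()),
    )
    conn.commit()
    return details
```

With a file-backed `path`, both tables survive across Streamlit sessions, which is what keeps repeat runs cheap and high signal.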
> [!NOTE]
> Internal display labels are currently in Russian in the UI/data columns. This is expected behavior and does not affect export integrity.
## Getting Started

Install the following locally:

- Python 3.10 or newer
- pip (bundled with most Python distributions)
- Optional but recommended: `venv` for isolated environments, `git` for source sync
```bash
# 1) Clone repository
git clone https://github.com/<your-org>/App-Finder-100k.git
cd App-Finder-100k

# 2) Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3) Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4) Run the app
streamlit run app.py
```

> [!WARNING]
> First run may feel slower because caches are cold and the app is populating local SQLite state.
## Testing

There is currently no dedicated automated test suite in this repository. Use the following quality gates locally:

```bash
# Sanity check Python syntax
python -m compileall .

# Optional: run with Streamlit and manually validate core flows
streamlit run app.py
```

Manual validation checklist:

- Sidebar controls update search behavior correctly.
- The "Найти" ("Find") button triggers fresh candidate discovery.
- The install range filter excludes out-of-band apps.
- CSV export downloads the expected columns.
- "Очистить историю" ("Clear history") resets dedup state.
## Deployment

For lightweight deployment, Streamlit Community Cloud or a small VM/container works well.

```bash
pip install -r requirements.txt
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```

```bash
# Example runtime command (no image build required)
docker run --rm -p 8501:8501 -v $(pwd):/app -w /app python:3.11-slim \
  bash -lc "pip install -r requirements.txt && streamlit run app.py --server.address 0.0.0.0"
```

> [!CAUTION]
> SQLite is file-based and great for single-instance deployment. For horizontal scaling, migrate state/cache tables to a centralized store.
## Usage

```bash
# Start UI
streamlit run app.py

# In the sidebar:
# 1) Set min/max installs
# 2) Tune query depth and run count
# 3) Toggle novelty sorting
# 4) Click "Найти" ("Find")
# 5) Export CSV when results look good
```

Typical workflow:

- Start with conservative filters (`min_installs=10_000`, `max_installs=200_000`).
- Run several iterations to enrich the dedup database.
- Enable novelty sort for "hidden gems" style prioritization.
- Export CSV and feed it into your analysis pipeline.
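The post-fetch install filter behind step 1 is simple range math over the fetched metadata. A sketch, where the field name `minInstalls` matches what `google-play-scraper` returns and the sample records are illustrative:

```python
def in_install_range(details: dict, min_installs: int, max_installs: int) -> bool:
    """Keep apps whose reported install floor lies within [min, max]."""
    installs = details.get("minInstalls") or 0
    return min_installs <= installs <= max_installs

# Illustrative candidates, as if returned by the detail fetch stage
apps = [
    {"appId": "a", "minInstalls": 5_000},
    {"appId": "b", "minInstalls": 50_000},
    {"appId": "c", "minInstalls": 1_000_000},
]
kept = [a for a in apps if in_install_range(a, 10_000, 200_000)]
# Only "b" survives the 10k-200k band
```

Because the filter runs after the detail lookup, it operates on the most reliable install figure available rather than on noisier search-stage metadata.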
## Configuration

Project configuration lives in `config.py`. Key knobs:

- `DEFAULT_MIN_INSTALLS`, `DEFAULT_MAX_INSTALLS`
- `DEFAULT_RESULTS_LIMIT`, `DEFAULT_SEARCH_DEPTH`, `DEFAULT_QUERIES_PER_RUN`
- `DEFAULT_WORKERS`
- `SEARCH_LOCALES`
- `GAME_KEYWORDS`, `NICHE_KEYWORDS`, `UTILITY_KEYWORDS`, `ALL_KEYWORDS`
- `CACHE_TTL_SEARCH`, `CACHE_TTL_DETAILS`
- `DB_PATH`

No `.env` file is required right now.

> [!NOTE]
> If you need environment-driven config, the clean extension path is to introduce a small settings layer (e.g., pydantic-settings or `os.getenv`) and keep `config.py` as defaults.
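That extension could look like a thin `os.getenv` wrapper over the defaults. The environment variable names and default values below are hypothetical; only the constant names come from the list above:

```python
import os

# Defaults as they might appear in config.py (values illustrative)
DEFAULT_MIN_INSTALLS = 10_000
DEFAULT_MAX_INSTALLS = 200_000

def env_int(name: str, default: int) -> int:
    """Read an integer override from the environment, falling back to a default."""
    raw = os.getenv(name)
    return int(raw) if raw is not None else default

# Hypothetical env var names; pick a consistent prefix for the project
MIN_INSTALLS = env_int("APP_FINDER_MIN_INSTALLS", DEFAULT_MIN_INSTALLS)
MAX_INSTALLS = env_int("APP_FINDER_MAX_INSTALLS", DEFAULT_MAX_INSTALLS)
```

This keeps `config.py` as the single source of defaults while letting a deployment override individual knobs without code changes.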
## License

This project is distributed under the GPL-3.0 license. See LICENSE for full legal terms.

## Contacts

Maintainer and channels are listed below.

## Support the Project

If you find this tool useful, consider leaving a ⭐ on GitHub or supporting the author directly: