Discover low-to-mid install Android apps with signal, not noise.
A production-minded Streamlit toolkit for hunting Google Play apps in specific install ranges, de-duplicating findings across sessions, and exporting clean research datasets in one click.
> [!IMPORTANT]
> The app is designed for market research workflows where you need repeatable discovery of apps with constrained install volumes (for example, the 10,000+ to 100,000+ segments).
## Table of Contents

- Features
- Tech Stack
- Project Structure
- Key Design Decisions
- Getting Started
- Testing
- Deployment
- Usage
- Configuration
- License
- Contacts
- Support the Project
## Features

- Install-range filtering for both lower and upper bounds (`min_installs`, `max_installs`).
- Randomized multi-query discovery over a weighted keyword pool (niche + games + utilities).
- Multi-locale search strategy to surface apps from different language/country combinations.
- Deduplication pipeline backed by persistent SQLite state (`seen_apps`) so you don't re-review the same app every run.
- Detail caching layer (`app_details_cache`) to reduce repeated API calls and speed up iterative sessions.
- Parallel metadata fetch using a thread pool for better throughput on detail lookups.
- Novelty scoring mode to rank results by "non-obviousness" instead of raw popularity.
- DataFrame-based result grid plus one-click CSV export for downstream BI/analysis.
> [!TIP]
> If you're in discovery mode, keep `sort_by_novelty` enabled and increase `queries_per_run` before increasing `search_limit`.
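The actual novelty math lives in `scoring.py`; as a purely illustrative heuristic (not the project's formula), a score that rewards low install and rating counts might look like:

```python
import math

def novelty_score(details: dict) -> float:
    """Illustrative heuristic: higher for apps with fewer installs and ratings.

    The field name 'minInstalls' matches what google-play-scraper returns;
    'ratings' is assumed here for the sake of the example.
    """
    installs = max(details.get("minInstalls") or 1, 1)
    ratings = max(details.get("ratings") or 1, 1)
    # Penalize popularity on a log scale so the score stays bounded
    return 1.0 / (1.0 + math.log10(installs) + math.log10(ratings))
```

Sorting candidates by this score descending surfaces the least-saturated apps first, which is what "non-obviousness" ranking aims at.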
## Tech Stack

- Language: Python 3.10+
- UI layer: Streamlit
- Data wrangling: pandas
- Data source: `google-play-scraper`
- Persistence: SQLite (`sqlite3` from the Python stdlib)
- Concurrency: `concurrent.futures.ThreadPoolExecutor`
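The concurrency layer amounts to fanning detail lookups out over a thread pool. A minimal sketch with a stubbed fetcher (the real fetcher would call into `google-play-scraper`; the function names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_details(app_id: str) -> dict:
    # Stub standing in for a real google-play-scraper detail lookup
    return {"appId": app_id, "minInstalls": 50_000}

def fetch_all(app_ids: list[str], max_workers: int = 8) -> list[dict]:
    """Fetch details for many apps concurrently; result order is not preserved."""
    results: list[dict] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_details, a): a for a in app_ids}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception:
                # Skip apps whose detail lookup failed; keep the run going
                pass
    return results
```

Because detail lookups are I/O-bound network calls, threads (rather than processes) are the idiomatic choice here.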
## Project Structure

```text
.
├── app.py            # Streamlit entrypoint and orchestration
├── ui.py             # Sidebar controls + result rendering
├── scraper.py        # Search, candidate collection, detail fetch pipeline
├── scoring.py        # Novelty score math and sorting helper
├── db.py             # SQLite schema + read/write/cache helpers
├── config.py         # Constants, defaults, keyword pools, locales
├── requirements.txt
├── LICENSE
└── README.md
```
## Key Design Decisions

- Stateful dedup over stateless scraping
  - A persistent `seen_apps` table avoids rediscovery churn and keeps each run high signal.
- Cache-first detail fetch
  - App details are cached with TTL semantics to cut API pressure and improve UX responsiveness.
- Weighted keyword sampling
  - Niche terms are intentionally weighted to increase the odds of finding less-saturated apps.
- Bounded parallelism
  - Worker count scales with expected result volume to keep performance predictable.
- Post-fetch install filtering
  - Install thresholds are enforced after the details lookup because install metadata quality can vary at the search stage.
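The stateful dedup and cache-first decisions above boil down to two small SQLite tables. A self-contained sketch, assuming table and column names that mirror `seen_apps` and `app_details_cache` (the real schema lives in `db.py`):

```python
import json
import sqlite3
import time

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen_apps (app_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS app_details_cache "
        "(app_id TEXT PRIMARY KEY, payload TEXT, fetched_at REAL)"
    )
    return conn

def is_new(conn: sqlite3.Connection, app_id: str) -> bool:
    """Mark app_id as seen; returns True only the first time it appears."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_apps (app_id) VALUES (?)", (app_id,)
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows modified means it was already seen

def cached_details(conn, app_id: str, fetch, ttl: float = 86_400) -> dict:
    """Return cached details if still fresh, otherwise fetch and store them."""
    row = conn.execute(
        "SELECT payload, fetched_at FROM app_details_cache WHERE app_id = ?",
        (app_id,),
    ).fetchone()
    if row and time.time() - row[1] < ttl:
        return json.loads(row[0])
    details = fetch(app_id)
    conn.execute(
        "INSERT OR REPLACE INTO app_details_cache VALUES (?, ?, ?)",
        (app_id, json.dumps(details), time.time()),
    )
    conn.commit()
    return details
```

With a file-backed `path`, both tables survive across Streamlit sessions, which is what keeps repeat runs cheap and high signal.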
> [!NOTE]
> Internal display labels are currently in Russian in the UI/data columns. This is expected behavior and does not affect export integrity.
## Getting Started

Install the following locally:

- Python 3.10 or newer
- pip (bundled with most Python distributions)
- Optional but recommended: `venv` for isolated environments, `git` for source sync
```bash
# 1) Clone repository
git clone https://github.com/<your-org>/App-Finder-100k.git
cd App-Finder-100k

# 2) Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3) Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4) Run the app
streamlit run app.py
```

> [!WARNING]
> First run may feel slower because caches are cold and the app is populating local SQLite state.
## Testing

There is currently no dedicated automated test suite in this repository. Use the following quality gates locally:

```bash
# Sanity check Python syntax
python -m compileall .

# Optional: run with Streamlit and manually validate core flows
streamlit run app.py
```

Manual validation checklist:

- Sidebar controls update search behavior correctly.
- The "Найти" ("Find") button triggers fresh candidate discovery.
- The install range filter excludes out-of-band apps.
- CSV export downloads the expected columns.
- "Очистить историю" ("Clear history") resets dedup state.
## Deployment

For lightweight deployment, Streamlit Community Cloud or a small VM/container works well.

```bash
pip install -r requirements.txt
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```

```bash
# Example runtime command (no image build required)
docker run --rm -p 8501:8501 -v $(pwd):/app -w /app python:3.11-slim \
  bash -lc "pip install -r requirements.txt && streamlit run app.py --server.address 0.0.0.0"
```

> [!CAUTION]
> SQLite is file-based and great for single-instance deployment. For horizontal scaling, migrate state/cache tables to a centralized store.
## Usage

```bash
# Start UI
streamlit run app.py

# In the sidebar:
# 1) Set min/max installs
# 2) Tune query depth and run count
# 3) Toggle novelty sorting
# 4) Click "Найти" ("Find")
# 5) Export CSV when results look good
```

Typical workflow:

- Start with conservative filters (`min_installs=10_000`, `max_installs=200_000`).
- Run several iterations to enrich the dedup database.
- Enable novelty sort for "hidden gems" style prioritization.
- Export CSV and feed it into your analysis pipeline.
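The post-fetch install filter behind step 1 is simple range math over the fetched metadata. A sketch, where the field name `minInstalls` matches what `google-play-scraper` returns and the sample records are illustrative:

```python
def in_install_range(details: dict, min_installs: int, max_installs: int) -> bool:
    """Keep apps whose reported install floor lies within [min, max]."""
    installs = details.get("minInstalls") or 0
    return min_installs <= installs <= max_installs

# Illustrative candidates, as if returned by the detail fetch stage
apps = [
    {"appId": "a", "minInstalls": 5_000},
    {"appId": "b", "minInstalls": 50_000},
    {"appId": "c", "minInstalls": 1_000_000},
]
kept = [a for a in apps if in_install_range(a, 10_000, 200_000)]
# Only "b" survives the 10k-200k band
```

Because the filter runs after the detail lookup, it operates on the most reliable install figure available rather than on noisier search-stage metadata.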
## Configuration

Project configuration lives in `config.py`. Key knobs:

- `DEFAULT_MIN_INSTALLS`, `DEFAULT_MAX_INSTALLS`
- `DEFAULT_RESULTS_LIMIT`, `DEFAULT_SEARCH_DEPTH`, `DEFAULT_QUERIES_PER_RUN`
- `DEFAULT_WORKERS`
- `SEARCH_LOCALES`
- `GAME_KEYWORDS`, `NICHE_KEYWORDS`, `UTILITY_KEYWORDS`, `ALL_KEYWORDS`
- `CACHE_TTL_SEARCH`, `CACHE_TTL_DETAILS`
- `DB_PATH`

No `.env` file is required right now.

> [!NOTE]
> If you need environment-driven config, the clean extension path is to introduce a small settings layer (e.g., pydantic-settings or `os.getenv`) and keep `config.py` as defaults.
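That extension could look like a thin `os.getenv` wrapper over the defaults. The environment variable names and default values below are hypothetical; only the constant names come from the list above:

```python
import os

# Defaults as they might appear in config.py (values illustrative)
DEFAULT_MIN_INSTALLS = 10_000
DEFAULT_MAX_INSTALLS = 200_000

def env_int(name: str, default: int) -> int:
    """Read an integer override from the environment, falling back to a default."""
    raw = os.getenv(name)
    return int(raw) if raw is not None else default

# Hypothetical env var names; pick a consistent prefix for the project
MIN_INSTALLS = env_int("APP_FINDER_MIN_INSTALLS", DEFAULT_MIN_INSTALLS)
MAX_INSTALLS = env_int("APP_FINDER_MAX_INSTALLS", DEFAULT_MAX_INSTALLS)
```

This keeps `config.py` as the single source of defaults while letting a deployment override individual knobs without code changes.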
## License

This project is distributed under the GPL-3.0 license. See LICENSE for full legal terms.

## Contacts

Maintainer and channels are listed below.

## Support the Project

If you find this tool useful, consider leaving a ⭐ on GitHub or supporting the author directly: