sqlhabit/bqcsv

bqcsv

Upload a local CSV file to BigQuery using the bq CLI and your existing gcloud authentication.

Why a dedicated CLI tool?

Out of the box, Google's bq CLI cannot create a table with column names inferred from a CSV file.

bqcsv fixes that:

  • detects the schema from the CSV file
  • creates a table with proper column names and types
  • loads the CSV file using bq load
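The three steps above can be sketched roughly as follows. This is a minimal illustration, not bqcsv's actual implementation: the helper names (`infer_type`, `infer_schema`, `bq_load_command`) and the type-detection rules are assumptions made for the example. Only the `bq load` flags (`--source_format`, `--skip_leading_rows`) and the inline `name:TYPE` schema syntax come from the bq CLI itself.

```python
import csv
import io


def infer_type(values):
    """Return a BigQuery type name that fits every non-empty value (hypothetical rules)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    # Try the narrowest type first, then widen.
    for bq_type, parse in (("INTEGER", int), ("FLOAT", float)):
        try:
            for v in non_empty:
                parse(v)
            return bq_type
        except ValueError:
            continue
    return "STRING"


def infer_schema(csv_text):
    """Map each header column to an inferred type, preserving file order."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data)) if data else [()] * len(header)
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


def bq_load_command(csv_path, project, dataset, table, schema):
    """Build the argument list for `bq load` with an inline schema."""
    inline_schema = ",".join(f"{name}:{bq_type}" for name, bq_type in schema)
    return [
        "bq", "load",
        "--source_format=CSV",
        "--skip_leading_rows=1",  # the header row is described by the schema instead
        f"{project}:{dataset}.{table}",
        csv_path,
        inline_schema,
    ]


sample = "user_id,amount,country\n1,9.99,US\n2,15,DE\n"
schema = infer_schema(sample)
print(schema)
print(" ".join(bq_load_command("data.csv", "my-gcp-project", "staging", "data", schema)))
```

The sketch only builds the command; passing the list to `subprocess.run` would execute it with whatever credentials `bq` already has.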

Authentication

No additional authentication is needed.

bqcsv reuses the credentials you already created with gcloud auth login.

Requirements

How to use bqcsv

Upload a CSV file to a table

To upload a CSV file, specify your project ID, dataset ID, and table name:

bqcsv data.csv --project my-gcp-project --dataset staging --table events_raw

The --table argument is optional. By default, bqcsv derives the table name from the CSV file:

bqcsv data.csv --project my-gcp-project --dataset staging

# is identical to

bqcsv data.csv --project my-gcp-project --dataset staging --table data
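One plausible way to derive the default table name is to take the file name without its extension. The function name `default_table_name` is hypothetical, and the sketch does not sanitize characters that BigQuery may reject in table names. This README does not say whether bqcsv does.

```python
from pathlib import Path


def default_table_name(csv_path):
    """Return the file name without directory or extension (assumed derivation rule)."""
    return Path(csv_path).stem


print(default_table_name("data.csv"))                     # data
print(default_table_name("exports/2024/events_raw.csv"))  # events_raw
```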

Saving your configuration

To avoid passing --project, --dataset, or --table on every run, save them to your local config:

bqcsv config set --project my-gcp-project --dataset analytics --table events
bqcsv config show

Defaults are stored in ~/.config/bqcsv/config.toml.

After you set your defaults, you can call bqcsv with only the CSV file path:

bqcsv data.csv

If you have not set a default --table value, the table name is derived from the CSV file.

Development

Install from your local repo

pip install -e .

Testing

To delete a test table, use bq:

bq rm -f -t PROJECT_ID:DATASET_ID.TABLE_NAME

You can run the module directly when working on a new feature or fixing a bug:

python -m src.cli config set --project PROJECT_ID --dataset DATASET_ID --table TEST_TABLE_NAME

Releasing to PyPI

  1. Bump the version in both places (they must match):

    • pyproject.toml: [project].version
    • src/__init__.py: __version__
  2. Install build tools (one-time):

    pip install build twine
  3. Run tests and commit the version bump.

  4. Build the package:

    python -m build

    This creates dist/bqcsv-<version>.tar.gz and dist/bqcsv-<version>-py3-none-any.whl.

  5. Upload to PyPI:

    twine upload dist/*

    On your first upload, create an account at pypi.org and authenticate with an API token: use __token__ as the username and the token itself as the password.

  6. Tag the release (optional but recommended):

    git tag v0.2.0
    git push origin v0.2.0

After publishing, users can install the new version with:

pip install --upgrade bqcsv
