Big Query (bq cli)
This guide shows how to create a layer of control on top of the bq command line
Requirements
Big Query Setup
- Setup the Bigquery following the topics bellow
- Create a service account with the following permissions:
- BigQuery User
- BigQuery Data Viewer
- Create a JSON type service account key and store in a safe place
For the sake of this guide, we're following the gcp guide to interact with bigquery.
In case you already have a testing dataset, you can skip this part
Reference: https://cloud.google.com/iam/docs/creating-managing-service-accounts
Hoop Setup
In your local machine, start hoop
hoop start
Interacting with Datasets
Create the bq query
query Connection
Add the JSON service account into the KEY_FILE
input
This will allow interacting with the bq command line
- Try to execute a query
hoop exec -v bqquery -i 'SELECT * FROM babynames.names_2014 LIMIT 1'
Interacting with Jobs
Create a bq head
connection
- Try to fetch it with
bq head -j <session-id>
The previous command was adding the session id as the job id of executed queries on bigquery, thus we could use the previous session id to fetch it.
hoop exec bqhead -- -j <PREVIOUS_SESSION_ID>
More Patterns
The bq command line has other actions that could be used as patterns with Hoop.
Allow jobs to be async
$ bq query --use_legacy_sql=false --nosynchronous_mode
This allow the pattern of consulting the previous job with bq head --job_id ...
. It's useful for queries that take too long to execute.
Dry-run commands
Dry run allows showing how much of bytes will be consumed
$ bq query --use_legacy_sql=false 'SELECT * FROM babynames.names_2014 limit 5' --dry_run
Query successfully validated. Assuming (...), running this query will process 654791 bytes of data.
A plugin could allow queries that has passed by the --dry-run
mode.
Command Line Quota
bq allows specifying the maximum bytes billed, this option could be used as a retriction mechanism
$ bq query \
--use_legacy_sql=false \
--maximum_bytes_billed 1 \
'SELECT * FROM babynames.names_2014 limit 100;'
BigQuery error in query operation: Error processing job
'myproject:bqjob_r107f63c4f31d3d37_000001860d50d5f8_1': Query exceeded
limit for bytes billed: 1. 10485760 or higher required.