4Oct/171
Google Cloud BigQuery Selecting Random Sample
Following is the sudo code to select random rows in Google Cloud BigQuery:
SELECT *
FROM {TABLE}
WHERE RAND() < {number of random rows}/{total rows}
9Jun/170
Google BigQuery Group_Concat in Standard SQL
Google BigQuery provides a function (Group_Concat) to merge text in rows into one single row. It is an Aggregation Function and is supported in Legacy SQL.
It's equivalent in Standard SQL is String_Agg.
9Jun/170
Streaming Data into Google BigQuery
Following is a step by step guide to stream live data into Google BigQuery
- Create a dataset in BigQuery under your project (http://bigquery.cloud.google.com)
- Set default expiry of the dataset to 1 day (or more)
- Create an empty template table e.g livestream
- Change default expiration of template table to 1 year (or more). Use command line to run following command
- bq update --expiration 31104000 dataset.livestream
- Setup a script to insert streaming data using Google SDK
- Recommended approach is to use templateSuffix while inserting data
- It creates a new table every time you change suffix (for example, hour, user id etc)
- No code required to create new table in your script
- It simplifies management of querying large datasets
- Remember:
- Data is not inserted in template table but in table_suffix (for example livestream_20170505)
- Do not delete template table
To get help on setting up streaming data into Google BigQuery Click Here.
To get help on setting up Google App Engine to process real-time data stream into BigQuery Click Here.
To get help on generating real time insights and visuals Click Here.