emergingbytes.com Technology blog for data warehousing, business intelligence, iPhone Apps Development, Android Apps development and facebook applications

4Oct/171

Google Cloud BigQuery Selecting Random Sample

Following is the sudo code to select random rows in Google Cloud BigQuery:

SELECT *
FROM {TABLE}
WHERE RAND() < {number of random rows}/{total rows}

Filed under: BigQuery, Cloud 1 Comment
9Jun/170

Google BigQuery Group_Concat in Standard SQL

Google BigQuery provides a function (Group_Concat) to merge text in rows into one single row. It is an Aggregation Function and is supported in Legacy SQL.

It's equivalent in Standard SQL is String_Agg.

9Jun/170

Streaming Data into Google BigQuery

Following is a step by step guide to stream live data into Google BigQuery

  1. Create a dataset in BigQuery under your project (http://bigquery.cloud.google.com)
  2. Set default expiry of the dataset to 1 day (or more)
  3. Create an empty template table e.g livestream
  4. Change default expiration of template table to 1 year (or more). Use command line to run following command
    • bq update --expiration 31104000  dataset.livestream
  5. Setup a script to insert streaming data using Google SDK
  6. Recommended approach is to use templateSuffix while inserting data
    • It creates a new table every time you change suffix (for example, hour, user id etc)
    • No code required to create new table in your script
    • It simplifies management of querying large datasets
  7. Remember:
    • Data is not inserted in template table but in table_suffix (for example livestream_20170505)
    • Do not delete template table

To get help on setting up streaming data into Google BigQuery Click Here.

To get help on setting up Google App Engine to process real-time data stream into BigQuery Click Here.

To get help on generating real time insights and visuals Click Here.