BigQuery Implementation
A technical presentation on getting started with Google BigQuery.
Google BigQuery - Big data with SQL-like queries, but fast
BigQuery Features
● Terabyte-scale data analysis
● Fast response for data mining
● SQL-like query language
● Interactive querying across multiple datasets
● Low cost, pay-per-use pricing
● Offline job support
Getting Started
BigQuery Web UI
https://bigquery.cloud.google.com/
BigQuery structure
● Project
● Dataset
● Table
● Job
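Tables are addressed through this hierarchy: the bq commands in this deck write a table as [project]:[dataset].[table], or as [dataset].[table] when the default project is used. A small JavaScript sketch that splits such a reference into its parts; the helper name parseTableRef is hypothetical and the parsing rules are an assumption based on that notation.

```javascript
// Hypothetical helper: split "project:dataset.table" (or "dataset.table")
// into its parts. The last colon is used so that a domain-scoped project ID
// such as "example.com:my-project" still parses.
function parseTableRef(ref, defaultProject) {
  let project = defaultProject;
  let rest = ref;
  const colon = ref.lastIndexOf(':');
  if (colon !== -1) {
    project = ref.slice(0, colon);
    rest = ref.slice(colon + 1);
  }
  const dot = rest.indexOf('.');
  if (dot === -1) {
    throw new Error('Expected dataset.table, got: ' + ref);
  }
  return { project, dataset: rest.slice(0, dot), table: rest.slice(dot + 1) };
}

console.log(parseTableRef('publicdata:samples.shakespeare'));
console.log(parseTableRef('testbq.test', 'my-project')); // placeholder default project
```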
Hands-on - Import
The easy way - Import Wizard
Load data into BigQuery from the command line
CSV / JSON → Cloud Storage → BigQuery
Load CSV to BigQuery
gsutil cp [source] gs://[bucket-name]
# gsutil cp ~/Desktop/log.csv gs://your-bucket/
Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
Uploading: 4.59 MB/36.76 MB
bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]
# bq load dataset.table gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE
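The inline schema IP:STRING,DNS:STRING,TS:STRING,URL:STRING lists name:TYPE pairs. The same schema can be kept as a JSON array of {name, type} objects, the form used in a schema file such as the ./schema.json passed to bq load for JSON data. A minimal sketch of that conversion; inlineSchemaToJson is a hypothetical helper, and it rejects fields without an explicit type rather than guessing a default.

```javascript
// Hypothetical helper: convert "name:TYPE,name:TYPE,..." into
// [{ name, type }, ...] as used in a JSON schema file.
function inlineSchemaToJson(spec) {
  return spec.split(',').map((field) => {
    const [name, type] = field.trim().split(':');
    if (!name || !type) {
      throw new Error('Expected name:TYPE, got: ' + field);
    }
    return { name, type: type.toUpperCase() };
  });
}

const schema = inlineSchemaToJson('IP:STRING,DNS:STRING,TS:STRING,URL:STRING');
console.log(JSON.stringify(schema, null, 2));
```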
Load JSON to BigQuery
bq load --source_format NEWLINE_DELIMITED_JSON \
  [project]:[dataset].[table] [json file] [schema file]
# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
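With --source_format NEWLINE_DELIMITED_JSON, the data file holds one JSON object per line rather than a single JSON array. A sketch of producing such a file body together with a matching schema; the records, field names, and the toNdjson helper are hypothetical, not the contents of the presenter's sample.json.

```javascript
// Serialize records as newline-delimited JSON: one JSON object per line,
// no enclosing array, a newline after every record.
function toNdjson(records) {
  return records.map((r) => JSON.stringify(r) + '\n').join('');
}

// Hypothetical sample records and a schema describing them.
const records = [
  { charge_unit: 'M', charge_desc: 'Monthly billing', one_charge: 0 },
  { charge_unit: 'SS', charge_desc: 'Per-use billing', one_charge: 1 },
];
const schema = [
  { name: 'charge_unit', type: 'STRING' },
  { name: 'charge_desc', type: 'STRING' },
  { name: 'one_charge', type: 'INTEGER' },
];

process.stdout.write(toNdjson(records));        // body for sample.json
console.log(JSON.stringify(schema, null, 2));   // body for schema.json
```

Writing the two printed strings to sample.json and schema.json would give files of the shape the commands above take.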
Hands-on - Query
Web way - Query Console
Install the Google Cloud SDK (https://developers.google.com/cloud/sdk/)
Shell way - bq command
bq query <sql_query>
# bq query 'select charge_unit,charge_desc,one_charge from testbq.test'
BigQuery - Query Language
Query syntax
● SELECT
● WITHIN
● FROM
● FLATTEN
● JOIN
● WHERE
● GROUP BY
● HAVING
● ORDER BY
● LIMIT
Query support
Supported functions and operators
● Aggregate functions
● Arithmetic operators
● Bitwise operators
● Casting functions
● Comparison functions
● Date and time functions
● IP functions
● JSON functions
● Logical operators
● Mathematical functions
● Regular expression functions
● String functions
● Table wildcard functions
● URL functions
● Window functions
● Other functions
select charge_unit,charge_desc,one_charge from testbq.test
Select
+-------------+--------------------+------------+
| charge_unit | charge_desc        | one_charge |
+-------------+--------------------+------------+
| M           | Monthly billing    | 0          |
| D           | Daily billing      | 0          |
| HH          | Hourly billing     | 0          |
| T           | Per-minute billing | 0          |
| SS          | Per-use billing    | 1          |
+-------------+--------------------+------------+
SELECT a.THEID, a.THENAME, b.DESCRIPITON
FROM user01.USER_MST a
LEFT JOIN user01.USER_DETAIL_MST b ON a.THEID = b.THEID
LIMIT 10
Join
+---------+------------------+---------------------------------+
| a_THEID | a_THENAME        | b_DESCRIPITON                   |
+---------+------------------+---------------------------------+
| 2       | About items      | Set up items in Item Loadout.   |
| 2       | About items      | Jewels.                         |
| 1       | About companions | Courage Awakening.              |
| 1       | About companions | Edit the party used for quests. |
| 1       | About companions | Several different types         |
+---------+------------------+---------------------------------+
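A LEFT JOIN keeps every row of the left table (a) and attaches each matching row of the right table (b); a left row with no match still appears, with the b columns NULL. Several detail rows can match one THEID, which is why THEID 1 and 2 repeat in the result above. An in-memory JavaScript sketch using a hash join; the sample rows are hypothetical stand-ins for user01.USER_MST and user01.USER_DETAIL_MST.

```javascript
// Hash-based LEFT JOIN on in-memory rows: every left row is kept,
// matched right rows are attached, unmatched left rows get b: null.
function leftJoin(left, right, key) {
  // Index the right-hand rows by join key.
  const index = new Map();
  for (const row of right) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }
  const result = [];
  for (const a of left) {
    const matches = index.get(a[key]);
    if (matches) {
      for (const b of matches) result.push({ a, b });
    } else {
      result.push({ a, b: null });
    }
  }
  return result;
}

// Hypothetical stand-ins for the two tables.
const userMst = [
  { THEID: 1, THENAME: 'About companions' },
  { THEID: 2, THENAME: 'About items' },
  { THEID: 3, THENAME: 'No details yet' },
];
const userDetailMst = [
  { THEID: 1, DESCRIPITON: 'Courage Awakening.' },
  { THEID: 1, DESCRIPITON: 'Several different types' },
  { THEID: 2, DESCRIPITON: 'Jewels.' },
];

for (const { a, b } of leftJoin(userMst, userDetailMst, 'THEID')) {
  console.log(a.THEID, a.THENAME, b ? b.DESCRIPITON : null);
}
```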
SELECT
fullName,
age,
gender,
citiesLived.place
FROM (FLATTEN([dataset.tableId], children))
WHERE
(citiesLived.yearsLived > 1995) AND
(children.age > 3)
GROUP BY fullName, age, gender, citiesLived.place
Flatten
+------------+-----+--------+--------------------+
| fullName | age | gender | citiesLived_place |
+------------+-----+--------+--------------------+
| John Doe | 22 | Male | Stockholm |
| Mike Jones | 35 | Male | Los Angeles |
| Mike Jones | 35 | Male | Washington DC |
| Mike Jones | 35 | Male | Portland |
| Mike Jones | 35 | Male | Austin |
+------------+-----+--------+--------------------+
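FLATTEN([dataset.tableId], children) turns each element of the repeated children field into its own row, repeating the parent's other fields, so that the WHERE clause can test one child at a time. A minimal in-memory sketch in JavaScript; the people data are hypothetical, and keeping a childless record as one row with children set to null is a choice made for this sketch.

```javascript
// Produce one output row per element of the repeated field `field`,
// copying the parent's other fields onto each row.
function flattenField(records, field) {
  const rows = [];
  for (const rec of records) {
    const values =
      Array.isArray(rec[field]) && rec[field].length > 0 ? rec[field] : [null];
    for (const value of values) {
      rows.push({ ...rec, [field]: value });
    }
  }
  return rows;
}

// Hypothetical nested records.
const people = [
  { fullName: 'Person A', age: 30, children: [{ name: 'C1', age: 2 }, { name: 'C2', age: 5 }] },
  { fullName: 'Person B', age: 40, children: [] },
];

// Analogue of WHERE children.age > 3, applied after flattening.
const olderChildRows = flattenField(people, 'children')
  .filter((row) => row.children !== null && row.children.age > 3);
console.log(olderChildRows.map((row) => row.fullName + ' / ' + row.children.name));
```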
SELECT word, COUNT(word) AS count
FROM publicdata:samples.shakespeare
WHERE (REGEXP_MATCH(word, r'\w\w\'\w\w'))
GROUP BY word
ORDER BY count DESC
LIMIT 3;
Regular Expression
+-------+-------+
| word  | count |
+-------+-------+
| ne'er | 42    |
| we'll | 35    |
| We'll | 33    |
+-------+-------+
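The REGEXP_MATCH filter keeps words that contain two word characters, an apostrophe, and two more word characters; the query then counts and ranks them. The same filter-count-sort pipeline over a hypothetical in-memory word list, using a JavaScript regular expression (RegExp.test succeeds if the pattern occurs anywhere in the word). topApostropheWords is a hypothetical helper name.

```javascript
// Keep words matching \w\w'\w\w, count occurrences, return the n most frequent.
function topApostropheWords(words, n) {
  const pattern = /\w\w'\w\w/;
  const counts = new Map();
  for (const w of words) {
    if (pattern.test(w)) counts.set(w, (counts.get(w) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((x, y) => y[1] - x[1]) // ORDER BY count DESC
    .slice(0, n)                  // LIMIT n
    .map(([word, count]) => ({ word, count }));
}

// Hypothetical input: one entry per occurrence of a word.
const sampleWords = ["ne'er", "we'll", "ne'er", 'hello', "We'll", "we'll"];
console.log(topApostropheWords(sampleWords, 3));
```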
SELECT TOP(FORMAT_UTC_USEC(timestamp * 1000000), 5) AS top_revision_time,
  COUNT(*) AS revision_count
FROM [publicdata:samples.wikipedia];
+----------------------------+----------------+
| top_revision_time          | revision_count |
+----------------------------+----------------+
| 2002-02-25 15:51:15.000000 | 20971          |
| 2002-02-25 15:43:11.000000 | 15955          |
| 2010-01-14 15:52:34.000000 | 3              |
| 2009-12-31 19:29:19.000000 | 3              |
| 2009-12-28 18:55:12.000000 | 3              |
+----------------------------+----------------+
Time Function
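The query multiplies timestamp (seconds) by 1000000 so that FORMAT_UTC_USEC can render it as YYYY-MM-DD HH:MM:SS.uuuuuu, then uses TOP(..., 5) to return the five most frequent values with their COUNT(*). A JavaScript sketch of both steps; the helper names are hypothetical, and the plain count-then-sort only illustrates what the result contains, not how BigQuery computes TOP internally.

```javascript
// Format microseconds since the Unix epoch as "YYYY-MM-DD HH:MM:SS.uuuuuu"
// in UTC. Assumes a non-negative integer within Number's safe-integer range.
function formatUtcUsec(usec) {
  const iso = new Date(Math.floor(usec / 1000)).toISOString(); // millisecond precision
  const micros = String(usec % 1000000).padStart(6, '0');     // full microsecond part
  return iso.slice(0, 10) + ' ' + iso.slice(11, 19) + '.' + micros;
}

// Count revisions per formatted timestamp and keep the n most frequent.
function topRevisionTimes(timestampsSec, n) {
  const counts = new Map();
  for (const t of timestampsSec) {
    const key = formatUtcUsec(t * 1000000); // seconds -> microseconds, as in the query
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()].sort((x, y) => y[1] - x[1]).slice(0, n);
}

console.log(topRevisionTimes([10, 10, 20], 1));
```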
SELECT DOMAIN(repository_homepage) AS user_domain,
  COUNT(*) AS activity_count
FROM [publicdata:samples.github_timeline]
GROUP BY user_domain
HAVING user_domain IS NOT NULL AND user_domain != ''
ORDER BY activity_count DESC
LIMIT 5;
URL Function
+-----------------+----------------+
| user_domain     | activity_count |
+-----------------+----------------+
| github.com      | 281879         |
| google.com      | 34769          |
| khanacademy.org | 17316          |
| sourceforge.net | 15103          |
| mozilla.org     | 14091          |
+-----------------+----------------+
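DOMAIN() reduces a URL string to its domain, which lets the query group GitHub activity by homepage domain and drop NULL or empty results with HAVING. A rough JavaScript approximation; naiveDomain keeps only the last two labels of the host name, an assumption that mishandles suffixes such as co.uk, so it is not a faithful reimplementation of DOMAIN(). Both helper names are hypothetical.

```javascript
// Approximate domain extraction: host name reduced to its last two labels.
function naiveDomain(url) {
  let host;
  try {
    host = new URL(url).hostname; // throws on empty or relative strings
  } catch (e) {
    return null;
  }
  const labels = host.split('.').filter(Boolean);
  if (labels.length < 2) return host || null;
  return labels.slice(-2).join('.');
}

// Count URLs per domain, skipping null/empty domains (like the HAVING clause),
// and return the n most frequent.
function countByDomain(urls, n) {
  const counts = new Map();
  for (const u of urls) {
    const d = naiveDomain(u);
    if (!d) continue;
    counts.set(d, (counts.get(d) || 0) + 1);
  }
  return [...counts.entries()].sort((x, y) => y[1] - x[1]).slice(0, n);
}

console.log(countByDomain(['https://github.com/a', 'http://www.github.com/b', ''], 5));
```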
Hands-on - Programming
● Prepare a Google Cloud Platform project
● Create a service account
● Generate a p12 key for the service account
Prepare
Google Service Account
Web server application vs. service account
Prepare Authentication
Convert the p12 key to a PEM key:
$ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
$ openssl rsa -in privatekey.pem -out key.pem
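After the conversion, key.pem should be an unencrypted PEM private key (the second openssl command writes it without a passphrase). One way to check this from Node.js (11.6 or later) is crypto.createPrivateKey, which accepts both PKCS#1 (BEGIN RSA PRIVATE KEY) and PKCS#8 (BEGIN PRIVATE KEY) PEM text and throws if a passphrase is still required. describePrivateKey is a hypothetical helper name.

```javascript
const crypto = require('crypto');

// Parse a PEM private key without supplying a passphrase.
// Throws if the key is still passphrase-protected or is not a private key.
function describePrivateKey(pem) {
  const key = crypto.createPrivateKey(pem);
  return { type: key.type, algorithm: key.asymmetricKeyType };
}

// Usage (the path is the placeholder from the Node.js example):
// const fs = require('fs');
// console.log(describePrivateKey(fs.readFileSync('/path-to-key.pem', 'utf8')));
```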
Node.js - bigquery module
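var bq = require('bigquery')
  , prjId = 'your-bigquery-project-id';

// Initialize the module with the client secret and the PEM keys prepared above
bq.init({
  client_secret: '/path-to-client_secret.json',
  privatekey_pem: '/path-to-privatekey.pem',
  key_pem: '/path-to-key.pem'
});

// List the datasets in the project; stop on error instead of printing undefined
bq.job.listds(prjId, function(e, r, d) {
  if (e) return console.log(e);
  console.log(JSON.stringify(d));
});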
To perform operations, call the functions under bq.job.
For details on the bigquery module, see: https://github.com/peihsinsu/bigquery
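The module reports results through a callback of the form function(e, r, d), as in the listds example. For promise-style code, a small wrapper can adapt that shape. A sketch, assuming every bq.job function takes such a callback as its last argument (only listds is shown in the slides); callWithData is a hypothetical name. Node's util.promisify is not a direct fit here because it resolves with only the first result argument, not the third.

```javascript
// Call fn(...args, callback) where callback is (err, res, data),
// and return a Promise that resolves with `data`.
function callWithData(fn, ...args) {
  return new Promise((resolve, reject) => {
    fn(...args, (err, res, data) => {
      if (err) reject(err);
      else resolve(data);
    });
  });
}

// Usage with the module from the slide (bind keeps `this` pointing at bq.job):
// callWithData(bq.job.listds.bind(bq.job), prjId)
//   .then((d) => console.log(JSON.stringify(d)))
//   .catch((e) => console.log(e));
```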
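/* Ref: https://developers.google.com/apps-script/advanced/bigquery */
var projectId = 'your-bigquery-project-id'; // placeholder
var request = {
  query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
         'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
};
var queryResults = BigQuery.Jobs.query(request, projectId);
var jobId = queryResults.jobReference.jobId;
// Wait until the query job has finished before reading rows
var sleepTimeMs = 500;
while (!queryResults.jobComplete) {
  Utilities.sleep(sleepTimeMs);
  sleepTimeMs *= 2;
  queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
}
// Collect every page of results by following pageToken
var rows = queryResults.rows || [];
while (queryResults.pageToken) {
  queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
    pageToken: queryResults.pageToken
  });
  rows = rows.concat(queryResults.rows);
}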
Google Drive way - Apps Script
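The Apps Script example reads results page by page: each response may carry a pageToken, and the loop keeps requesting pages until none is returned. The same loop as plain JavaScript against a hypothetical fetchPage(pageToken) function, so it can run without BigQuery; the page shape { rows, pageToken } mirrors the fields used in the script, and pages without rows are tolerated.

```javascript
// Follow pageToken until the last page, concatenating all rows.
// fetchPage(undefined) returns the first page.
function collectAllRows(fetchPage) {
  let page = fetchPage(undefined);
  let rows = page.rows || [];
  while (page.pageToken) {
    page = fetchPage(page.pageToken);
    rows = rows.concat(page.rows || []);
  }
  return rows;
}

// Hypothetical three-page result set.
const fakePages = {
  first: { rows: ['r1', 'r2'], pageToken: 'p2' },
  p2: { rows: ['r3'], pageToken: 'p3' },
  p3: { rows: ['r4'] },
};
console.log(collectAllRows((token) => fakePages[token || 'first']));
```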
● Features: https://cloud.google.com/products/bigquery#features
● Case Studies: https://cloud.google.com/products/bigquery#case-studies
● Pricing: https://cloud.google.com/products/bigquery#pricing
● Documentation: https://cloud.google.com/products/bigquery#documentation
● Query Reference: https://developers.google.com/bigquery/query-reference
References