Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning

Author: Valliappa Lakshmanan

Comments

by funkjunky   2018-11-10
Former GCP support here. Bigtable and Cloud Datastore (or the newer, shinier Firestore) are very different, and meant for different purposes.

Bigtable is meant for wide-column data at high volume. If your data can be organized into simple rows and columns, and you plan on using massive amounts of it at high throughput (think IoT transactions, for example), then Bigtable is the right choice.
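To make the "simple rows and columns" idea concrete, here is a minimal in-memory sketch of a wide-column layout. It is not the Bigtable client API: the `WideColumnTable` class, the `sensor42#...` row keys, and the `metrics:temp_c` column names are all made up for illustration. The only point is that rows are kept sorted by a single row key, so reading one device's readings is a contiguous prefix scan.

```python
from bisect import bisect_left


class WideColumnTable:
    """Toy in-memory stand-in for a wide-column table (NOT the Bigtable API)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def write(self, row_key, column, value):
        # Insert new row keys at their sorted position; rows may have
        # different sets of columns.
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan_prefix(self, prefix):
        """Yield (row_key, cells) for every row whose key starts with prefix."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


# Hypothetical IoT readings keyed by "<device>#<timestamp>".
table = WideColumnTable()
table.write("sensor42#20181110T1200", "metrics:temp_c", 21.5)
table.write("sensor42#20181110T1201", "metrics:temp_c", 21.7)
table.write("sensor07#20181110T1200", "metrics:temp_c", 19.0)
table.write("sensor42#20181110T1201", "metrics:humidity", 0.41)

sensor42_rows = list(table.scan_prefix("sensor42#"))
```

In this sketch the row key does all the work: there are no secondary indexes, so the key layout decides which reads are cheap.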

Datastore, on the other hand, is for semi-structured data, with parent/child hierarchies, key-value pairs, etc. It doesn't run on a cluster of nodes like Bigtable does, but is managed behind the scenes as part of App Engine. It is slower than Bigtable, but is more sophisticated, and offers client libraries for ORMs (ndb), SQL-like queries, and the like.
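For contrast, the parent/child side can be sketched the same way. Again, this is plain Python and not the Datastore or ndb API: `Key`, `Entity`, and `descendants_of_kind` are hypothetical names, and the customer/order data is invented. It only illustrates entities with free-form properties whose keys form a hierarchy that a query can filter on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Key:
    """Hierarchical key: a path of (kind, id) pairs from root to entity."""
    path: tuple

    def child(self, kind, id_):
        return Key(self.path + ((kind, id_),))

    def has_ancestor(self, ancestor):
        # True when ancestor's path is a strict prefix of this key's path.
        n = len(ancestor.path)
        return len(self.path) > n and self.path[:n] == ancestor.path


@dataclass
class Entity:
    key: Key
    properties: dict = field(default_factory=dict)  # no fixed schema


def descendants_of_kind(entities, ancestor, kind):
    """Return entities of the given kind that sit below ancestor."""
    return [
        e for e in entities
        if e.key.path[-1][0] == kind and e.key.has_ancestor(ancestor)
    ]


# Hypothetical customers with child orders; properties vary per entity.
alice = Key((("Customer", "alice"),))
bob = Key((("Customer", "bob"),))
entities = [
    Entity(alice, {"name": "Alice"}),
    Entity(alice.child("Order", 1001), {"total": 25.0, "gift": True}),
    Entity(alice.child("Order", 1002), {"total": 9.5}),
    Entity(bob.child("Order", 1003), {"total": 40.0}),
]

alice_orders = descendants_of_kind(entities, alice, "Order")
```

The toy version filters a Python list; the hierarchy-aware lookup is the part worth noticing, since the wide-column sketch has nothing comparable beyond the row key.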

There's a brief comparison chart here: https://cloud.google.com/storage-options/

I also highly recommend the Google Cloud data engineering course on Coursera: https://www.coursera.org/specializations/gcp-data-machine-le...

Or the instructor's book, "Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning":

https://www.amazon.com/dp/1491974567/ref=cm_sw_r_cp_apa_DILV...