How I became a Google Cloud Professional Data Engineer in two weeks
Disclaimer: the title is clickbait. It actually took me more than two weeks of work to get there, but the last two weeks of preparation were decisive in getting my certification. And now that you are here, you might as well read until the end ;)
Around one year ago, I joined the company SFEIR as an ‘agile cloud data engineer’: basically, a data engineer leveraging cloud tools to do his job. As a Google partner and a heavy user of its cloud technologies (Google Cloud Platform, aka GCP), we need to get certified, both to become proficient in our tasks and for credibility with our customers and Google.
Therefore, getting the Google Cloud Professional Data Engineer certification was one of my goals for this year, which I achieved by the end of November.
Background
Given those conditions, a lot of my colleagues are already certified and quite proficient with GCP. Moreover, my company also provides customer training through an offering called SFEIR INSTITUTES.
As a beginner with GCP, I was first onboarded through this one-week training.
However, this training is only meant to get you acquainted with GCP and is nowhere near enough to make you a cloud expert or certification-ready. It also gave me access to some Qwiklabs that provided both a sandbox and further content about GCP.
If you are only preparing for the certification, I recommend using this one.
Next Steps
With this training under my belt, I was at least able to understand my customers' requirements, navigate GCP, and fulfill my missions. This gave me more experience with GCP and progressively built up my knowledge of the tools. I also had my own side projects with it, which let me try my hand at things covered by the certification that I had not been able to use in a professional situation (stay tuned to hear more about it; yes, this is shameless self-promotion).
Finally, some of my colleagues who are more experienced with GCP decided to organize cohorts to train people and prepare them for the certification. This was a good initiative, as it clarified a lot of things for me, especially how rows and row keys work in Bigtable.
But once again, while this is good for clarifying things and giving you strong foundations, it is still not enough. At some point, you need specific and targeted preparation for the certification.
Last two weeks
Then, I booked my exam. I had two weeks left to prepare for it and make sure I passed. That’s when I entered ‘monk mode’.
“If I only had an hour to chop down a tree, I would spend the first 45 minutes sharpening my axe.” — Abraham Lincoln.
Basically, I relied on this video, which is quite thorough and in which the speaker gives her feedback on the certification along with the main points to look for and prepare for:
But watching videos is quite passive, and the only real way to learn and remember things is through reading and repetition.
So, for each chapter in this video, here are the links to the Google documentation I used to prepare for the exam (and actually, I probably would not have made it without those links).
Database:
https://cloud.google.com/sql/docs/mysql/introduction
https://cloud.google.com/sql/docs/mysql/key-terms
https://cloud.google.com/sql/docs/mysql/instance-access-control
https://cloud.google.com/sql/docs/mysql/data-residency-overview
https://cloud.google.com/sql/docs/mysql/iam-overview
https://cloud.google.com/sql/docs/mysql/connect-overview
https://cloud.google.com/sql/docs/mysql/replication
https://cloud.google.com/sql/docs/mysql/high-availability
https://cloud.google.com/sql/docs/mysql/best-practices
Cloud Spanner
https://cloud.google.com/spanner/docs/migrating-mysql-to-spanner
https://cloud.google.com/spanner/docs/schema-and-data-model
https://cloud.google.com/spanner/docs/schema-design
https://cloud.google.com/spanner/docs/secondary-indexes
https://cloud.google.com/spanner/docs/foreign-keys/how-to
Cloud Bigtable
https://cloud.google.com/bigtable/docs/overview
https://cloud.google.com/bigtable/docs/instances-clusters-nodes
https://cloud.google.com/bigtable/docs/choosing-ssd-hdd
https://cloud.google.com/bigtable/docs/replication-overview
https://cloud.google.com/bigtable/docs/schema-design-steps
https://cloud.google.com/bigtable/docs/schema-design
https://cloud.google.com/bigtable/docs/schema-design-time-series
https://cloud.google.com/bigtable/docs/performance
https://cloud.google.com/bigtable/docs/keyvis-overview
https://cloud.google.com/bigtable/docs/keyvis-getting-started
https://cloud.google.com/bigtable/docs/keyvis-exploring-heatmaps
https://cloud.google.com/bigtable/docs/keyvis-patterns
BigQuery
https://cloud.google.com/bigquery/docs/introduction
https://cloud.google.com/bigquery-ml/docs/introduction
https://cloud.google.com/bigquery/docs/tables-intro
https://cloud.google.com/bigquery/docs/datasets-intro
https://cloud.google.com/bigquery/docs/locations
https://cloud.google.com/bigquery/docs/views-intro
https://cloud.google.com/bigquery/docs/materialized-views-intro
https://cloud.google.com/bigquery/docs/nested-repeated
https://cloud.google.com/bigquery/docs/clustered-tables
https://cloud.google.com/bigquery/docs/data-governance
https://cloud.google.com/bigquery/docs/access-control
https://cloud.google.com/bigquery/docs/authorized-views
https://cloud.google.com/bigquery/docs/external-data-sources
https://cloud.google.com/bigquery/docs/dts-introduction
https://cloud.google.com/bigquery/docs/monitoring
https://cloud.google.com/bigquery/docs/best-practices-costs
https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
https://cloud.google.com/bigquery/docs/best-practices-storage
https://cloud.google.com/bigquery/docs/querying-wildcard-tables
Dataflow
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline
https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
https://cloud.google.com/dataflow/docs/concepts/access-control
https://cloud.google.com/dataflow/docs/guides/common-errors
https://www.youtube.com/watch?v=65lmwL7rSy4&ab_channel=GoogleCloudTech
https://www.youtube.com/watch?v=oJ-LueBvOcM&ab_channel=GoogleCloudTech
https://www.youtube.com/watch?v=MuFA6CSti6M&ab_channel=GoogleCloudTech
Dataproc
https://cloud.google.com/dataproc/docs/concepts/accessing/cluster-web-interfaces
https://cloud.google.com/dataproc/docs/concepts/accessing/dataproc-gateway
https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/network/
https://cloud.google.com/dataproc/docs/concepts/iam/iam
Pub/Sub
https://cloud.google.com/pubsub/docs/encryption
https://cloud.google.com/pubsub/docs/access-control
https://cloud.google.com/pubsub/docs/replay-message
https://www.youtube.com/watch?v=VyLmmamuOVo&t=1s&ab_channel=GoogleCloudTech
Machine learning
https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1
https://cloud.google.com/ai-platform/training/docs/using-gpus
https://cloud.google.com/ai-platform/training/docs/using-tpus
Network
https://cloud.google.com/vpc/docs/overview
https://cloud.google.com/vpc/docs/vpc
https://cloud.google.com/vpc/docs/shared-vpc
https://cloud.google.com/vpc/docs/vpc-peering
Encryption
https://cloud.google.com/sql/docs/mysql/cmek
https://cloud.google.com/sql/docs/mysql/client-side-encryption
IAM and Security
https://cloud.google.com/kms/docs/resource-hierarchy
https://cloud.google.com/docs/security/key-management-deep-dive
Last but not least
This link also provides a complete use case that uses all the relevant data engineering and ingestion technologies. I strongly advise reading it.
I also recommend doing the following:
- Try these sample exam questions before and after your preparation
- I also suggest using this book, at least for the sample questions it provides and its explanations of how and why to pick each answer.
I hope this will be useful and will help you prepare for your exam. Good luck!