When BigQuery Export is enabled in Google Workspace, logs are sent to BigQuery in two main big tables, named "usage" and "activity" there.
Google Workspace logs contain a lot of sensitive data. And with that condition, the data masking feature in the BigQuery provides many benefits to ease masking out those sensitive data for reporting purposes.
Unfortunately, inside the "usage" and "activity" tables, they have so many nested columns.
Example on the "activity" table, if we'd like to mask out all columns that contain email sensitivity, we need to identify all email columns (e.g. search with keyword "user_email"), and then apply mask function/policy respectively there. Those columns are mostly inside their parent field with column type RECORD (so called "nested"):
The policy tags and data policies ( https://cloud.google.com/bigquery/docs/column-data-masking ) cannot be applied to the nested columns there. As a result, we need to create custom masking routines ( https://cloud.google.com/bigquery/docs/user-defined-functions#custom-mask ), so called "UDF" (User-Defined Function) with some complex SQL syntaxes there.
CREATE OR REPLACE FUNCTION `projectid-abcde.dataset-xyz.my_custom_email_mask`(email STRING) RETURNS STRING AS ( CASE WHEN email IS NULL THEN NULL ELSE LEFT(TO_HEX(SHA1(CAST(email AS BYTES))), 7) || '-' || RIGHT(TO_HEX(SHA256(CAST(email AS BYTES))), 7) || '@' || SUBSTR(email, INSTR(email, '@') + 1) END );
SELECT (dataset-xyz.my_custom_email_mask(email)) as email, t.* EXCEPT (email) FROM `projectid-abcde.dataset-xyz.activity` t WHERE email IS NOT NULL;
Problem comes when the column is not STRING but inside nested RECORD.
SELECT (dataset-xyz.my_custom_email_mask(email)) as email, (dataset-xyz.my_custom_email_mask(mobile.user_email)) as mobile.user_email, (dataset-xyz.my_custom_email_mask(meet.identifier)) as meet.identifier, (dataset-xyz.my_custom_email_mask(meet.organizer_email)) as meet.organizer_email, t.* EXCEPT (email, mobile.user_email, meet.identifier, meet.organizer_email) FROM `projectid-abcde.dataset-xyz.activity` t WHERE mobile is not NULL or meet is not NULL;
SELECT (dataset-xyz.my_custom_email_mask(email)) as email, t.* EXCEPT (email) REPLACE( (SELECT AS STRUCT (dataset-xyz.my_custom_email_mask(mobile.user_email)) as user_email, mobile.* EXCEPT (user_email)) AS mobile, (SELECT AS STRUCT (dataset-xyz.my_custom_email_mask(meet.identifier)) as identifier, (dataset-xyz.my_custom_email_mask(meet.organizer_email)) as organizer_email, meet.* EXCEPT (identifier, organizer_email)) AS meet ) FROM `projectid-abcde.dataset-xyz.activity` t WHERE mobile is not NULL or meet is not NULL;
CREATE OR REPLACE VIEW `projectid-abcde.dataset-xyz.activity_with_masked_email_view` AS SELECT (dataset-xyz.my_custom_email_mask(email)) as email, t.* EXCEPT (email) REPLACE( (SELECT AS STRUCT (dataset-xyz.my_custom_email_mask(mobile.user_email)) as user_email, mobile.* EXCEPT (user_email)) AS mobile, (SELECT AS STRUCT (dataset-xyz.my_custom_email_mask(meet.identifier)) as identifier, (dataset-xyz.my_custom_email_mask(meet.organizer_email)) as organizer_email, meet.* EXCEPT (identifier, organizer_email)) AS meet ) FROM `projectid-abcde.dataset-xyz.activity` t WHERE mobile is not NULL or meet is not NULL;