I have a table in BigQuery that looks something like this:
schema = [
bigquery.SchemaField('timestamp', 'TIMESTAMP', mode='REQUIRED', description='Data point timestamp'),
bigquery.SchemaField('event_id', 'STRING', description='EventID'),
[...]
]
The table has a fairly large dataset, and I’m trying to find write an efficient query that returns the number of events that happened in the last 24 hours but also within the last N days. That is, two different records with different conditions but the same event_id
. I don’t care so much about the actual event_id
, but rather the distribution.
Ideally, the query would return something like this:
7_days: 20
30_days: 15
60_days: 7
If it’s impossible to do this in pure SQL, I also have Pandas available at my disposal.
Go to Source
Author: vpetersson