Joel Carron, Data Scientist at Mode
February 8, 2017
NaN minute read
Now that the Mode SQL editor supports running multiple SQL statements, you can do so much more than pull data back from the database. One door this opens is the ability to create persistent derived tables, which are tables generated when the results of a SQL query are written back into your database. Once in your database, these tables can be queried just like any other table.
Persistent derived tables provide performance gains by enabling you to query against data that has already been summarized, rather than performing transformations at the time of each query execution.
Suppose you have a raw event table called
page_visits that contains a record of every page visit on your website. You'd like to report on the count of unique site visitors by day over the last year. Executing such a query against your raw event table directly could be very expensive, as you would have to aggregate across every row in
page_visits for the last year. Instead, you can pre-calculate this aggregation by creating a persistent derived table with only the data you need (in this case,
unique_visitor_count). Querying this table instead would be much more performant, as the expensive aggregation would have already taken place.
You'll only be able to create a persistent derived table using Mode if you're authorized to write to at least one schema in your database.
While it's not necessary, creating a dedicated schema can help you manage database permissions and keep persistent derived tables organized. Rather than granting write-access to your entire database, where people might inadvertently alter important tables or cause other issues, you can instead grant write-access to a single schema. The unique schema will also ensure your persistent derived tables can be easily discovered and are distinguishable from raw tables.
The first step is to write a query that returns the data you want to write back to your database as a table.
In this example, information about orders is aggregated by month and sales rep.
SELECT DATE_TRUNC('month',o.occurred_at) AS month, sr.name AS sales_rep, SUM(o.total) AS units_sold, SUM(o.total_amt_usd) AS sales_usd FROM demo.orders o JOIN demo.accounts a ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id GROUP BY 1,2
Creating a table from your results will typically consist of three statements— beginning a transaction, creating the table, and committing the transaction. A database transaction is a block of code used to generate some change in a database. Each component of a transaction must execute successfully in order for the operation to complete. If a transaction fails midway through, all the changes made to that point get rolled back, ensuring your data is consistent.
The specific syntax you'll use to carry out each statement will depend on the database you're using. The table below has information on the syntax for these statements, with links to the database documentation.
*BigQuery does not require beginning and committing a transaction. Simply use a Create Table statement.
The example below (written for Postgres) creates a persistent derived table from the results of the previous query:
BEGIN; CREATE TABLE monthly_sales AS ( SELECT DATE_TRUNC('month',o.occurred_at) AS month, sr.name AS sales_rep, SUM(o.total) AS units_sold, SUM(o.total_amt_usd) AS sales_usd FROM demo.orders o JOIN demo.accounts a ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id GROUP BY 1,2 ); COMMIT;
To update the persistent derived table, you’ll first need to drop it, as most databases won’t allow you to create a table that already exists. You can accomplish this by adding a
DROP TABLE IF EXISTS transaction to the beginning of your query.
BEGIN; DROP TABLE IF EXISTS demo.monthly_sales; COMMIT; BEGIN; CREATE TABLE demo.monthly_sales AS ( SELECT DATE_TRUNC('month',o.occurred_at) AS month, sr.name AS sales_rep, SUM(o.total) AS units_sold, SUM(o.total_amt_usd) AS sales_usd FROM demo.orders o JOIN demo.accounts a ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id GROUP BY 1,2 ); COMMIT;
Now that you've created a persistent derived table, schedule it to update on a regular cadence using Mode's report schedule feature.
Keep in mind that the query used to generate your persistent derived table will still be hitting your database whenever the schedule runs, so it's a good idea not to run it more frequently than necessary. Scheduling persistent derived tables to update at off-peak times is a good way to manage your database's resources. To find low-impact times to schedule these, Mode users on the Plus Plan can view a visualization of their query runs by hour of week on the Stats page.
Mode organizations on the Plus Plan have some options for managing permissions for persistent derived tables.
You can limit who is able to create persistent derived tables on a schema by limiting access to the database connection that has write access. This will also prevent people who do not have access from editing the queries.
If you don't want to limit write-access to your database but you do want to restrict edit-access to the reports you're using to generate persistent derived tables, you can do so by moving those reports into a limited Space.
Here at Mode, we set up an additional schema specifically for scratch-work, which includes persistent derived tables that are not intended for wide use across our organization. Because all of these tables exist in one schema that's designated for scratch-work, we can easily purge it in case people forget to drop stuff. It also enables us to grant everyone the ability to create this sort of work, without being overly restrictive.