Daniel Wakefield Zola 2024-10-05T00:00:00+00:00 https://danw.xyz/atom.xml Metabase tips and tricks 2024-10-05T00:00:00+00:00 2024-10-05T00:00:00+00:00 Unknown https://danw.xyz/posts/metabase-tricks/ <h2 id="flexible-date-bin-ed-dashboards">Flexible date-binned dashboards</h2> <p>You want to avoid duplicating questions when your data consumers want different timeframe views over the data.</p> <p>E.g. Total Payments by Day, Total Payments by Month, etc.</p> <p>We handle this with a few easy-to-apply tricks that our data consumers understand without issue.</p> <h3 id="helper-model">Helper Model</h3> <p>We defined a <a href="https://www.metabase.com/docs/latest/data-modeling/models">Helper Model</a> with the time periods we offer for analytics. We're lucky that we don't need super granular stats and can just offer down to <code>hour</code>, but you could use <code>date_bin</code> to support things like <code>15 minutes</code> if needed.</p> <pre data-lang="sql" style="background-color:#2b303b;color:#c0c5ce;" class="language-sql "><code class="language-sql" data-lang="sql"><span> </span><span style="color:#b48ead;">SELECT </span><span> time_period </span><span> </span><span style="color:#b48ead;">FROM</span><span> ( </span><span> </span><span style="color:#b48ead;">VALUES </span><span> (&#39;</span><span style="color:#a3be8c;">hour</span><span>&#39;), </span><span> (&#39;</span><span style="color:#a3be8c;">day</span><span>&#39;), </span><span> (&#39;</span><span style="color:#a3be8c;">week</span><span>&#39;), </span><span> (&#39;</span><span style="color:#a3be8c;">month</span><span>&#39;), </span><span> (&#39;</span><span style="color:#a3be8c;">quarter</span><span>&#39;), </span><span> (&#39;</span><span style="color:#a3be8c;">year</span><span>&#39;) </span><span> ) t(time_period) </span></code></pre> <h3 id="filtering-the-result-set">Filtering the result set</h3> <p>In any query that requires the ability to date bin, we use one of the following as part of the 
query definition:</p> <h4 id="use-alongside-date-field-filter">Use alongside date field filter</h4> <p>This works great for simple queries where you are filtering on a single date field, as the date field filter has a bunch of useful settings that our consumers often request, such as an inclusive/exclusive end and the ability to exclude days of the week.</p> <pre data-lang="sql" style="background-color:#2b303b;color:#c0c5ce;" class="language-sql "><code class="language-sql" data-lang="sql"><span style="color:#b48ead;">SELECT </span><span> date_trunc({{time_period}}, created_at)::</span><span style="color:#b48ead;">date </span><span>AS time_period, </span><span> </span><span style="color:#96b5b4;">SUM</span><span>(amount) </span><span style="color:#b48ead;">FROM</span><span> payments </span><span style="color:#b48ead;">WHERE </span><span> </span><span style="color:#d08770;">1</span><span>=</span><span style="color:#d08770;">1 </span><span> [[AND {{payment_created_at_field_filter}}]] </span><span style="color:#b48ead;">GROUP BY</span><span> time_period </span></code></pre> <h4 id="when-you-can-t-use-the-field-filter">When you can't use the field filter</h4> <p>Metabase doesn't support date field filters on columns inside CTEs, and there are times when you want to filter multiple tables to the same time range.</p> <p>In those cases we add a <code>number of periods</code> filter that defines how far back to look instead:</p> <pre data-lang="sql" style="background-color:#2b303b;color:#c0c5ce;" class="language-sql "><code class="language-sql" data-lang="sql"><span style="color:#b48ead;">SELECT </span><span> date_trunc({{time_period}}, payments.created_at)::</span><span style="color:#b48ead;">date </span><span>as time_period, </span><span> </span><span style="color:#96b5b4;">sum</span><span>(amount) </span><span style="color:#b48ead;">FROM</span><span> payments </span><span style="color:#b48ead;">JOIN</span><span> users on </span><span 
style="color:#d08770;">payments</span><span>.</span><span style="color:#d08770;">user_id </span><span>= </span><span style="color:#d08770;">users</span><span>.</span><span style="color:#d08770;">id </span><span style="color:#b48ead;">WHERE </span><span> </span><span style="color:#d08770;">payments</span><span>.</span><span style="color:#d08770;">created_at </span><span>&gt;= date_trunc({{time_period}}, NOW() - ({{num_periods}} || &#39; &#39; || {{time_period}})::interval) </span><span> AND </span><span> </span><span style="color:#d08770;">users</span><span>.</span><span style="color:#d08770;">last_sign_in_at </span><span>&gt;= date_trunc({{time_period}}, NOW() - ({{num_periods}} || &#39; &#39; || {{time_period}})::interval) </span><span style="color:#b48ead;">GROUP BY</span><span> time_period </span></code></pre> <h5 id="caveat-1-quarter">Caveat 1: Quarter</h5> <p><code>date_trunc</code> accepts <code>quarter</code>, but <code>interval</code> input doesn't. It would be a lovely addition<sup class="footnote-reference"><a href="#quarter">1</a></sup>, but for now we add a <code>CASE</code> expression, which we're happy with when the question actually requires a window of this size. 
This is also how you would handle <code>date_bin('15 minutes', ...)</code>: you have to do the math yourself to move back over the right number of periods.</p> <pre data-lang="sql" style="background-color:#2b303b;color:#c0c5ce;" class="language-sql "><code class="language-sql" data-lang="sql"><span style="color:#d08770;">payments</span><span>.</span><span style="color:#d08770;">created_at </span><span>&gt;= date_trunc( </span><span> {{time_period}}, </span><span> </span><span style="color:#b48ead;">CASE</span><span> {{time_period}} </span><span> </span><span style="color:#b48ead;">WHEN </span><span>&#39;</span><span style="color:#a3be8c;">quarter</span><span>&#39; </span><span style="color:#b48ead;">THEN </span><span> date_trunc(&#39;</span><span style="color:#a3be8c;">quarter</span><span>&#39;, NOW()) - ((({{num_periods}})::</span><span style="color:#b48ead;">integer </span><span style="color:#bf616a;">* </span><span style="color:#d08770;">3</span><span>) || &#39;</span><span style="color:#a3be8c;"> month</span><span>&#39;)::interval </span><span> </span><span style="color:#b48ead;">ELSE </span><span> NOW() - ({{num_periods}} || &#39; &#39; || {{time_period}})::interval </span><span> </span><span style="color:#b48ead;">END </span><span>) </span></code></pre> <p><img src="https://danw.xyz/posts/metabase-tricks/mbquarter.png" alt="Image" /></p> <h5 id="caveat-2-incomplete-periods">Caveat 2: Incomplete periods</h5> <p>We include data from the incomplete current period, but you can exclude it by complementing each filter with a less-than bound on the start of the current period:</p> <pre data-lang="sql" style="background-color:#2b303b;color:#c0c5ce;" class="language-sql "><code class="language-sql" data-lang="sql"><span style="color:#b48ead;">SELECT </span><span> date_trunc({{time_period}}, created_at)::</span><span style="color:#b48ead;">date </span><span>AS time_period, </span><span> </span><span style="color:#96b5b4;">SUM</span><span>(amount) </span><span 
style="color:#b48ead;">FROM</span><span> payments </span><span style="color:#b48ead;">WHERE </span><span> </span><span style="color:#d08770;">payments</span><span>.</span><span style="color:#d08770;">created_at </span><span>&gt;= date_trunc({{time_period}}, NOW() - ({{num_periods}} || &#39; &#39; || {{time_period}})::interval) </span><span> AND </span><span style="color:#d08770;">payments</span><span>.</span><span style="color:#d08770;">created_at </span><span>&lt; date_trunc({{time_period}}, NOW()) </span><span style="color:#b48ead;">GROUP BY</span><span> time_period </span></code></pre> <p>Most of our data analysis is done on historic data, but of course you can flip the signs, or combine both sets of bounds so the query covers periods both before and after <code>NOW()</code>, if your data has a more leading than lagging profile.</p> <h3 id="optional-targets">Optional targets</h3> <p>You'll probably have a daily or monthly tracking board that the leadership team loves to watch. They will also want to compare past periods to the current one without going to a new dashboard. By combining <code>COALESCE</code> with optional filters, you can inject SQL defaults.</p> <p>This gets around the fact that raw date filters can't be set to <code>today</code>.</p> <p>Something like this works:</p> <pre data-lang="sql" style="background-color:#2b303b;color:#c0c5ce;" class="language-sql "><code class="language-sql" data-lang="sql"><span style="color:#b48ead;">with</span><span> filtered_payments as (</span><span style="color:#65737e;">/* some query */</span><span>) </span><span> </span><span style="color:#b48ead;">SELECT </span><span> </span><span style="color:#96b5b4;">COUNT</span><span>(</span><span style="color:#bf616a;">*</span><span>) </span><span style="color:#b48ead;">FROM</span><span> filtered_payments </span><span style="color:#b48ead;">WHERE </span><span> date_trunc(&#39;</span><span style="color:#a3be8c;">day</span><span>&#39;, </span><span style="color:#d08770;">filtered_payments</span><span>.</span><span style="color:#d08770;">created_at</span><span>) = date_trunc(&#39;</span><span 
style="color:#a3be8c;">day</span><span>&#39;, COALESCE( [[{{date_to_look_at}} ,]] NOW() )) </span></code></pre> <h3 id="percent-function">Percent function</h3> <p>To remove some error-prone casting and coalescing when calculating percentages, especially over counts, we added a function that also makes queries more readable:</p> <pre data-lang="sql" style="background-color:#2b303b;color:#c0c5ce;" class="language-sql "><code class="language-sql" data-lang="sql"><span style="color:#b48ead;">CREATE OR REPLACE FUNCTION </span><span>public.</span><span style="color:#8fa1b3;">percent</span><span>(portion </span><span style="color:#b48ead;">numeric</span><span>, total </span><span style="color:#b48ead;">numeric</span><span>, round_level </span><span style="color:#b48ead;">integer DEFAULT </span><span style="color:#d08770;">1</span><span>) </span><span> RETURNS </span><span style="color:#b48ead;">numeric </span><span> LANGUAGE sql </span><span> IMMUTABLE PARALLEL SAFE STRICT </span><span>AS $function$ </span><span> </span><span style="color:#b48ead;">select </span><span> </span><span style="color:#b48ead;">CASE </span><span> </span><span style="color:#b48ead;">WHEN</span><span> total = </span><span style="color:#d08770;">0 </span><span style="color:#b48ead;">THEN </span><span style="color:#d08770;">0 </span><span> </span><span style="color:#b48ead;">ELSE</span><span> ROUND(</span><span style="color:#d08770;">100 </span><span style="color:#bf616a;">*</span><span> (portion / total), round_level) </span><span> </span><span style="color:#b48ead;">END </span><span>$function$ </span></code></pre> <p>It replaces the following, while also handling counts that return <code>0</code> (where the old version raises a division-by-zero error):</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code><span>SELECT </span><span> ROUND(100 * cast(count(*) FILTER (where amount &gt; 100) as float)/cast(count(*) as float)) AS old_big_payment_rate, </span><span> percent( </span><span> COUNT(*) FILTER (where amount &gt; 100), </span><span> COUNT(*) 
</span><span> ) AS new_big_payment_rate </span><span>FROM payments </span></code></pre> <h3 id="commonly-accessed-questions">Commonly accessed questions</h3> <p>We periodically clean up the available dashboards, archiving anything that hasn't been accessed in the last 6-12 months to keep the number of questions and dashboards down. The query below lists the most commonly accessed questions; flipping the sort order, or ordering by <code>last_execution</code> instead, will show you the ones that aren't accessed often. Use an anti-join to find the questions that haven't been run at all in the period you're checking.</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code><span>with card_data as ( </span><span> select </span><span> card_id, </span><span> COUNT(*) as execution_count, </span><span> MAX(started_at) as last_execution </span><span> from query_execution </span><span> where </span><span> started_at &gt; NOW() - &#39;12 months&#39;::interval </span><span> GROUP BY 1 </span><span>) </span><span> </span><span>select </span><span> report_card.id, </span><span> report_card.name, </span><span> CONCAT(&#39;https://data.example.com/question/&#39;, report_card.id), </span><span> card_data.execution_count, </span><span> card_data.last_execution </span><span>from report_card </span><span>join card_data on card_data.card_id = report_card.id </span><span>where </span><span> report_card.archived = false </span><span>ORDER BY card_data.execution_count DESC </span></code></pre> <div class="footnote-definition" id="quarter"><sup class="footnote-definition-label">1</sup> <p><a href="https://www.postgresql.org/message-id/6b2a4293-6f88-8114-aa0f-d7becdaafbdf%40cam.ac.uk">Wish: support "quarter" in Interval</a></p> </div> Using certificate based ssh authentication 2014-09-20T00:00:00+00:00 2014-09-20T00:00:00+00:00 Unknown https://danw.xyz/posts/cert-authentication/ <h2 id="what-is-a-certificate">What is a Certificate?</h2> <p>A certificate is an application of <a 
href="http://en.wikipedia.org/wiki/Public-key_cryptography">Public Key Cryptography</a> that allows you to trust someone.</p> <p>The certificate authority's (CA) private key is used to sign the public key of anyone or anything that is trusted by the CA.</p> <p>Anybody can then decide to trust the same people as the CA by adding the CA's public key to their keyring.</p> <h2 id="why-should-i-use-one">Why should I use one?</h2> <p>One of the main reasons is that CA signing works in both directions.</p> <p>A user can check that a machine is trusted instead of relying on adding individual hosts to <code>~/.ssh/known_hosts</code>.</p> <p>This is useful if you keep recreating a machine at the same IP address for testing purposes and don't want to be bugged by SSH host key warnings.</p> <p>A machine can check that a user is trusted instead of relying on adding users' public keys to <code>~/.ssh/authorized_keys</code> for each user that should be allowed to log in.</p> <p>Signing keys also allows very fine-grained control over many aspects that plain key-based authentication doesn't.</p> <p>The reason I have switched from key-based to cert-based authentication is that it is much easier to handle access control across your personal machines when you are regularly creating new VMs and VPSs.</p> <p>You only need to add the CA's public key to a server, rather than all of your users' public keys to the correct <code>~/.ssh/authorized_keys</code> files.</p> <h2 id="setting-it-up">Setting it up.</h2> <p>For this you will need:</p> <ul> <li>2+ machines running your preferred form of Linux (I used Debian 7)</li> <li>OpenSSH 5.4+</li> <li>Basic knowledge of the command line</li> </ul> <p>In this example I will be using the machine I am writing this on as both the CA and the user I want to trust. 
This isn't totally secure, but it will work perfectly fine with a small number of users.</p> <p>First we will create a user CA, which will allow users to sign in to servers without a prompt.</p> <p>The first step is to create your CA's keypair.</p> <pre data-lang="bash" style="background-color:#2b303b;color:#c0c5ce;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#bf616a;">$</span><span> mkdir</span><span style="color:#bf616a;"> -p ~</span><span>/.ssh/ca </span><span style="color:#bf616a;">$</span><span> cd </span><span style="color:#bf616a;">~</span><span>/.ssh/ca </span><span style="color:#bf616a;">$</span><span> ssh-keygen</span><span style="color:#bf616a;"> -f</span><span> ca</span><span style="color:#bf616a;"> -C </span><span>&quot;</span><span style="color:#a3be8c;">Comment on the keys</span><span>&quot; </span><span> </span><span style="color:#bf616a;">Generating</span><span> public/private rsa key pair. </span><span> </span><span style="color:#bf616a;">Enter</span><span> passphrase (empty for no passphrase)</span><span style="color:#96b5b4;">: </span><span> </span><span style="color:#bf616a;">Enter</span><span> same passphrase again: </span><span> </span><span style="color:#bf616a;">Your</span><span> identification has been saved in ca. </span><span> </span><span style="color:#bf616a;">Your</span><span> public key has been saved in ca.pub. 
</span><span style="color:#bf616a;">$</span><span> ls </span><span> </span><span style="color:#bf616a;">ca</span><span> ca.pub </span></code></pre> <p>You should provide a strong passphrase when asked, so that an attacker who gets hold of your key files still cannot use them without it.</p> <p>The next step is to place the public key on the machine(s) that should trust the users.</p> <pre data-lang="bash" style="background-color:#2b303b;color:#c0c5ce;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#bf616a;">$</span><span> scp ca.pub [email protected]:/home/user </span><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">sudo cp ~/ca.pub /etc/ssh</span><span>&quot; </span><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">sudo sed -i &#39;/TrustedUserCAKeys/d&#39; /etc/ssh/sshd_config</span><span>&quot; </span><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">echo &#39;TrustedUserCAKeys /etc/ssh/ca.pub&#39; | sudo tee -a /etc/ssh/sshd_config</span><span>&quot; </span><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">sudo /etc/init.d/ssh restart</span><span>&quot; </span><span> </span><span style="color:#65737e;"># The above will only work on a machine set up with passwordless sudo </span><span> </span><span style="color:#65737e;"># Otherwise ssh to the machine and run the commands manually </span></code></pre> <p>This sets up the server to trust keys signed by the CA, so the next step is to sign our user's public key.</p> <p>If you already have a user keypair, skip the first step.</p> <pre data-lang="bash" style="background-color:#2b303b;color:#c0c5ce;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#bf616a;">$</span><span> ssh-keygen 
</span><span> </span><span style="color:#bf616a;">Generating</span><span> public/private rsa key pair. </span><span> </span><span style="color:#bf616a;">Enter</span><span> passphrase (empty for no passphrase)</span><span style="color:#96b5b4;">: </span><span> </span><span style="color:#bf616a;">Enter</span><span> same passphrase again: </span><span> </span><span style="color:#bf616a;">Your</span><span> identification has been saved in </span><span style="color:#bf616a;">~</span><span>/.ssh/id_rsa. </span><span> </span><span style="color:#bf616a;">Your</span><span> public key has been saved in </span><span style="color:#bf616a;">~</span><span>/.ssh/id_rsa.pub </span><span> </span><span style="color:#65737e;"># This generates a keypair with default settings </span><span> </span><span style="color:#65737e;"># As with the CA keypair, you should use a strong passphrase for </span><span> </span><span style="color:#65737e;"># added security. </span><span style="color:#bf616a;">$</span><span> mkdir signed_users </span><span style="color:#bf616a;">$</span><span> cd signed_users </span><span style="color:#bf616a;">$</span><span> cp </span><span style="color:#bf616a;">~</span><span>/.ssh/id_rsa.pub user.pub </span><span style="color:#bf616a;">$</span><span> ssh-keygen</span><span style="color:#bf616a;"> -s ~</span><span>/.ssh/ca/ca</span><span style="color:#bf616a;"> -I </span><span>&quot;</span><span style="color:#a3be8c;">Comment</span><span>&quot;</span><span style="color:#bf616a;"> -n </span><span>&quot;</span><span style="color:#a3be8c;">user</span><span>&quot; user.pub </span><span> </span><span style="color:#bf616a;">Enter</span><span> passphrase: </span><span> </span><span style="color:#65737e;"># Enter the passphrase you used to create the CA keypair </span><span> </span><span style="color:#65737e;"># -I sets the certificate key ID, which sshd logs on each login </span><span> </span><span style="color:#65737e;"># You can add more than one user by separating them with commas </span><span> </span><span style="color:#65737e;"># You will only be allowed to log in as the users specified 
</span><span> </span><span style="color:#65737e;"># in this list, and the list is required </span><span style="color:#bf616a;">$</span><span> chmod 600 user-cert.pub </span><span> </span><span style="color:#65737e;"># Set the correct permissions on your signed key </span><span style="color:#bf616a;">$</span><span> ls </span><span> </span><span style="color:#bf616a;">user.pub</span><span> user-cert.pub </span><span style="color:#bf616a;">$</span><span> cp user-cert.pub </span><span style="color:#bf616a;">~</span><span>/.ssh/id_rsa-cert.pub </span></code></pre> <p>You should keep a copy of <em>user</em>.pub safe, because if you ever want to revoke a user's trusted status you need to know their public key.</p> <p>You can easily repeat this for each machine you use regularly.</p> <p>If you use an <a href="http://www.ansible.com">automation tool</a> to set up new machines, it is simple to add a task that creates and signs a keypair, allowing you to log in to any of your machines from the new machine straight away.</p> <p>There are many <a href="http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/ssh-keygen.1">other options</a> you can add to the signed key to restrict and secure the user further.</p> <p>Some examples are:</p> <ul> <li>Setting an expiration on the signed key</li> <li>Only allowing the key to be used from certain addresses</li> <li>Forcing a specific command to be run instead of a shell when using the certificate</li> </ul> <h2 id="revoking-users">Revoking users</h2> <p>If a set of user keys is lost or compromised, it is fairly easy to revoke their access to your machines.</p> <pre data-lang="bash" style="background-color:#2b303b;color:#c0c5ce;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">[ -f /etc/ssh/ssh_revoked_keys ] || sudo install -m 644 -o root -g root /dev/null /etc/ssh/ssh_revoked_keys</span><span>&quot; </span><span 
style="color:#bf616a;">$</span><span> cat user_to_revoke.pub | </span><span style="color:#bf616a;">ssh</span><span> [email protected] &quot;</span><span style="color:#a3be8c;">sudo tee -a /etc/ssh/ssh_revoked_keys</span><span>&quot; </span><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">sudo sed -i &#39;/RevokedKeys/d&#39; /etc/ssh/sshd_config</span><span>&quot; </span><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">echo &#39;RevokedKeys /etc/ssh/ssh_revoked_keys&#39; | sudo tee -a /etc/ssh/sshd_config</span><span>&quot; </span><span style="color:#bf616a;">$</span><span> ssh [email protected] &quot;</span><span style="color:#a3be8c;">sudo /etc/init.d/ssh restart</span><span>&quot; </span><span> </span><span style="color:#65737e;"># Again this requires passwordless sudo </span></code></pre> <p>This does have to be run on each of your machines, but it is still less work than combing through <code>authorized_keys</code> files for each user account on every machine you have.</p> <h2 id="final-notes">Final notes.</h2> <ul> <li>Keep your CA private key safe.<br /> It can be used to sign new keys that can then access your machines.</li> <li>Use passphrases and an SSH agent.<br /> Passphrases mean an attacker must both have a copy of a key and know its passphrase before they can use it. An SSH agent means you only have to enter your passphrase once per login session.</li> <li>You should try to follow the <a href="http://en.wikipedia.org/wiki/Principle_of_least_privilege">principle of least privilege</a> for user accounts and give each one their own signed key.</li> </ul>
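<p>The restriction options listed earlier (expiry, source addresses, forced commands) can be combined in a single signing step. As a rough sketch, the script below signs a throwaway key with all three and then prints the resulting certificate with <code>ssh-keygen -L</code>. It assumes OpenSSH 6.5 or newer (for ed25519 keys) and uses empty passphrases purely for the demo; the key ID, the principal, the address range <code>192.0.2.0/24</code> and the command <code>/usr/bin/uptime</code> are illustrative placeholders, not values from the setup above.</p>

```bash
#!/usr/bin/env bash
# Sketch: sign a throwaway user key with restrictions, then inspect it.
# The key ID, principal, address range and forced command are placeholders.
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Throwaway CA and user keypairs with empty passphrases (demo only;
# real CA and user keys should have strong passphrases).
ssh-keygen -q -t ed25519 -N "" -C "example CA" -f "$workdir/ca"
ssh-keygen -q -t ed25519 -N "" -C "example user" -f "$workdir/user"

# -I  key ID recorded in the certificate
# -n  principal (login name) the certificate is valid for
# -V  validity window: here, from now until 52 weeks from now
# -O source-address=...  only accept the certificate from this range
# -O force-command=...   run this command instead of a shell
ssh-keygen -s "$workdir/ca" -I "Comment" -n "user" \
    -V +52w \
    -O source-address=192.0.2.0/24 \
    -O force-command=/usr/bin/uptime \
    "$workdir/user.pub"

# Show the key ID, principals, validity window and critical options.
ssh-keygen -L -f "$workdir/user-cert.pub"
```

<p>Critical options such as <code>force-command</code> and <code>source-address</code> are enforced by sshd itself, so every server that trusts the CA applies them without any per-server configuration.</p>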